<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Richard Dillon</title>
    <description>The latest articles on DEV Community by Richard Dillon (@richard_dillon_b9c238186e).</description>
    <link>https://dev.to/richard_dillon_b9c238186e</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3849330%2F15d5a6d5-ef2a-430b-9760-3ac77ede5242.png</url>
      <title>DEV Community: Richard Dillon</title>
      <link>https://dev.to/richard_dillon_b9c238186e</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/richard_dillon_b9c238186e"/>
    <language>en</language>
    <item>
      <title>Primitive Shifts: The Harness-as-Primitive Shift</title>
      <dc:creator>Richard Dillon</dc:creator>
      <pubDate>Mon, 04 May 2026 12:04:08 +0000</pubDate>
      <link>https://dev.to/richard_dillon_b9c238186e/primitive-shifts-the-harness-as-primitive-shift-143j</link>
      <guid>https://dev.to/richard_dillon_b9c238186e/primitive-shifts-the-harness-as-primitive-shift-143j</guid>
      <description>&lt;h1&gt;
  
  
  Primitive Shifts: The Harness-as-Primitive Shift
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;External Verification Loops Are Becoming Non-Negotiable Infrastructure&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Every few months, the baseline of how AI systems work quietly moves. Engineers who noticed early weren't smarter — they were just paying attention to the right signals. The shift from "AI generates, humans review" to "AI generates within executable constraints" is one of those moves. If your mental model still treats verification as something that happens &lt;em&gt;after&lt;/em&gt; AI output, you're already behind.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is It?
&lt;/h2&gt;

&lt;p&gt;A &lt;strong&gt;harness&lt;/strong&gt; is an external verification layer that wraps LLM execution — not prompt engineering, not fine-tuning, but deterministic constraints enforced &lt;em&gt;outside&lt;/em&gt; the model's reasoning loop. The pattern is deceptively simple: the LLM generates output, the harness validates against executable specifications (tests, type checks, physics constraints, domain invariants), a feedback signal loops back, and the LLM iterates until convergence or rejection.&lt;/p&gt;

&lt;p&gt;This inverts the 2023-2024 paradigm where validation happened &lt;em&gt;after&lt;/em&gt; AI output reached humans. Now verification is a &lt;strong&gt;runtime primitive&lt;/strong&gt; that gates AI execution before it ever surfaces.&lt;/p&gt;

&lt;p&gt;The research driving adoption is unambiguous: LLMs cannot reliably self-correct intrinsic reasoning failures without external grounding. The &lt;a href="https://arxiv.org/html/2604.17025v2" rel="noopener noreferrer"&gt;Convergent AI Agent Framework (CAAF)&lt;/a&gt; makes this explicit — the "verification gap" is structural, not a capability limitation to be trained away. When an LLM hallucinates incorrect code, no amount of "think step by step" prompting fixes it; only external execution feedback does.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://arxiv.org/html/2604.16399v2" rel="noopener noreferrer"&gt;IACDM methodology&lt;/a&gt; formalizes this as "Interactive Adversarial Convergence" — treating the harness as an adversarial validator that pressure-tests outputs. Production systems like Claude Code's built-in safety checkpoints, detailed in &lt;a href="https://arxiv.org/pdf/2604.14228" rel="noopener noreferrer"&gt;recent architectural analyses&lt;/a&gt;, implement variants of this pattern with execution-based verification loops.&lt;/p&gt;

&lt;p&gt;Here's the mental model shift: the harness isn't scaffolding you remove later. It's the actual product, with the LLM as a component inside it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why It's Flying Under the Radar
&lt;/h2&gt;

&lt;p&gt;Engineers see "add tests" and think they already do this — but harness-as-primitive means tests run &lt;em&gt;during&lt;/em&gt; generation, not after merge. Your CI pipeline catches bugs post-commit; a harness catches them mid-generation, before the code ever exists in your repository. This distinction sounds subtle but changes everything about how AI integrates into development workflows.&lt;/p&gt;

&lt;p&gt;The pattern looks like "just good engineering" rather than a new AI primitive, so it doesn't get labeled or discussed as such. Framework marketing emphasizes agent autonomy and capability benchmarks; verification infrastructure is unglamorous plumbing that doesn't demo well.&lt;/p&gt;

&lt;p&gt;Early adopters discovered it through failure. A &lt;a href="https://arxiv.org/html/2604.17025v2" rel="noopener noreferrer"&gt;2025 study by METR&lt;/a&gt; showed experienced developers using frontier models were &lt;em&gt;measurably slower&lt;/em&gt; despite believing they were faster — the verification gap made them confident and wrong. They trusted model output, shipped bugs, and spent debugging time that exceeded any generation speedup.&lt;/p&gt;

&lt;p&gt;Multi-agent architectures get attention at conferences; single-agent-with-harness quietly outperforms in production. Both &lt;a href="https://cdn.openai.com/pdf/openai-ending-the-capability-overhang.pdf" rel="noopener noreferrer"&gt;OpenAI Codex&lt;/a&gt; and Claude Code run single ReAct loops with heavy external verification, not the multi-agent swarms that dominate research papers.&lt;/p&gt;

&lt;p&gt;The shift is happening inside build systems and CI pipelines, not in prompts or model configs. If you're not touching infrastructure, you're not seeing it happen.&lt;/p&gt;

&lt;h2&gt;
  
  
  Hands-On: Try It Today
&lt;/h2&gt;

&lt;p&gt;Here's a minimal harness implementation that wraps a code generation task with pytest verification:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# harness.py - Minimal verification harness for LLM code generation
# Requires: anthropic&amp;gt;=0.34.0, pytest&amp;gt;=8.0.0
# pip install anthropic pytest

import subprocess
import tempfile
from pathlib import Path

from anthropic import Anthropic

# Configuration
MAX_ITERATIONS = 5
MODEL = "claude-sonnet-4-20250514"

def run_tests(code: str, test_code: str, work_dir: Path) -&amp;gt; tuple[bool, str]:
    """Execute pytest against generated code, return (passed, output)."""
    # Write the implementation file
    impl_path = work_dir / "implementation.py"
    impl_path.write_text(code)

    # Write the test file
    test_path = work_dir / "test_implementation.py"
    test_path.write_text(test_code)

    # Run pytest with captured output
    result = subprocess.run(
        ["python", "-m", "pytest", str(test_path), "-v", "--tb=short"],
        capture_output=True,
        text=True,
        cwd=work_dir,
        timeout=30  # Hard timeout prevents infinite loops
    )

    passed = result.returncode == 0
    output = result.stdout + result.stderr
    return passed, output

def generate_with_harness(
    client: Anthropic,
    task_description: str,
    test_code: str,
    initial_code: str = ""
) -&amp;gt; tuple[str, int]:
    """
    Generate code that passes tests, iterating until success or budget exhaustion.
    Returns (final_code, iterations_used).
    """
    current_code = initial_code
    last_test_output = ""  # populated by the first harness run
    iteration = 0

    with tempfile.TemporaryDirectory() as temp_dir:
        work_dir = Path(temp_dir)

        while iteration &amp;lt; MAX_ITERATIONS:
            iteration += 1

            # First iteration: generate from scratch
            # Subsequent iterations: fix based on test failures
            if current_code == "":
                prompt = f"""Write Python code to solve this task:

{task_description}

The code will be tested against these tests:

```python
{test_code}
```

Output ONLY the implementation code, no markdown fencing."""
            else:
                prompt = f"""The following code failed tests:

```python
{current_code}
```

Test output:
{last_test_output}

Fix the code to pass all tests. Output ONLY the fixed implementation code, no markdown fencing."""

            # Generate candidate solution
            response = client.messages.create(
                model=MODEL,
                max_tokens=2048,
                messages=[{"role": "user", "content": prompt}]
            )

            current_code = response.content[0].text.strip()

            # Strip markdown code fences if the model included them anyway
            if current_code.startswith("```"):
                lines = current_code.split("\n")
                current_code = "\n".join(lines[1:-1])

            # Run harness validation
            passed, last_test_output = run_tests(current_code, test_code, work_dir)

            if passed:
                print(f"✓ Tests passed on iteration {iteration}")
                return current_code, iteration
            else:
                print(f"✗ Iteration {iteration} failed, retrying...")

    # Budget exhausted
    raise RuntimeError(f"Failed to generate passing code after {MAX_ITERATIONS} iterations")

# Example usage
if __name__ == "__main__":
    client = Anthropic()

    # The harness spec (your tests) IS the requirement
    test_code = """
from implementation import merge_sorted_lists

def test_basic_merge():
    assert merge_sorted_lists([1, 3, 5], [2, 4, 6]) == [1, 2, 3, 4, 5, 6]

def test_empty_lists():
    assert merge_sorted_lists([], [1, 2, 3]) == [1, 2, 3]
    assert merge_sorted_lists([1, 2, 3], []) == [1, 2, 3]

def test_duplicates():
    assert merge_sorted_lists([1, 2, 2], [2, 3]) == [1, 2, 2, 2, 3]

def test_single_elements():
    assert merge_sorted_lists([1], [2]) == [1, 2]
"""

    task = "Implement merge_sorted_lists(list1, list2) that merges two sorted lists into one sorted list."

    code, iterations = generate_with_harness(client, task, test_code)
    print(f"\nGenerated in {iterations} iteration(s):\n{code}")
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;For TypeScript projects, apply the same pattern with Zod schema validation as the harness:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// harness.ts - Schema validation harness for structured generation
// Requires: zod@3.23.0, @anthropic-ai/sdk@0.30.0
// npm install zod @anthropic-ai/sdk

import Anthropic from "@anthropic-ai/sdk";
import { z } from "zod";

// Define your domain schema - this IS your harness
const OrderSchema = z.object({
  orderId: z.string().uuid(),
  customerId: z.string().min(1),
  items: z.array(
    z.object({
      sku: z.string().regex(/^[A-Z]{3}-\d{4}$/),
      quantity: z.number().int().positive(),
      unitPrice: z.number().positive(),
    })
  ).min(1),
  // Domain invariant: total must equal sum of (quantity * unitPrice)
  total: z.number().positive(),
}).refine(
  (order) =&amp;gt; {
    const calculatedTotal = order.items.reduce(
      (sum, item) =&amp;gt; sum + item.quantity * item.unitPrice,
      0
    );
    return Math.abs(order.total - calculatedTotal) &amp;lt; 0.01;
  },
  { message: "Total must equal sum of item prices" }
);

type Order = z.infer&amp;lt;typeof OrderSchema&amp;gt;;

const MAX_ITERATIONS = 3;

async function generateWithSchemaHarness(
  client: Anthropic,
  prompt: string
): Promise&amp;lt;Order&amp;gt; {
  let lastError = "";

  for (let i = 0; i &amp;lt; MAX_ITERATIONS; i++) {
    const fullPrompt = lastError
      ? `${prompt}\n\nPrevious attempt failed validation: ${lastError}\n\nFix the JSON and try again.`
      : prompt;

    const response = await client.messages.create({
      model: "claude-sonnet-4-20250514",
      max_tokens: 1024,
      messages: [{ role: "user", content: fullPrompt }],
    });

    const text = response.content[0].type === "text"
      ? response.content[0].text
      : "";

    // Extract JSON from response (handle markdown fencing)
    const jsonMatch = text.match(/```(?:json)?\s*([\s\S]*?)```/) ||
                      text.match(/(\{[\s\S]*\})/);

    if (!jsonMatch) {
      lastError = "No valid JSON found in response";
      continue;
    }

    try {
      const parsed = JSON.parse(jsonMatch[1]);
      // Harness validation - schema + domain invariants
      const validated = OrderSchema.parse(parsed);
      console.log(`✓ Validation passed on iteration ${i + 1}`);
      return validated;
    } catch (e) {
      if (e instanceof z.ZodError) {
        lastError = e.errors.map((err) =&amp;gt;
          `${err.path.join(".")}: ${err.message}`
        ).join("; ");
        console.log(`✗ Iteration ${i + 1}: ${lastError}`);
      } else {
        lastError = `JSON parse error: ${e}`;
      }
    }
  }

  throw new Error(`Failed after ${MAX_ITERATIONS} iterations: ${lastError}`);
}

// Usage
const client = new Anthropic();

generateWithSchemaHarness(
  client,
  `Generate a sample e-commerce order as JSON with:
   - A valid UUID for orderId
   - SKUs in format ABC-1234
   - At least 2 items
   - Correctly calculated total`
).then(console.log);
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;



&lt;p&gt;The key insight from both implementations: your test suite or schema &lt;strong&gt;is&lt;/strong&gt; the specification. &lt;a href="https://arxiv.org/html/2604.16399v2" rel="noopener noreferrer"&gt;BDD/TDD-first workflows&lt;/a&gt; write Gherkin specs or failing tests &lt;em&gt;before&lt;/em&gt; prompting, treating them as the harness signal rather than human review.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Means for Your Stack
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Test coverage becomes AI capability.&lt;/strong&gt; Teams with comprehensive test suites get dramatically better AI output; teams without them hit a ceiling no prompt engineering crosses. This isn't a metaphor — the harness literally cannot validate what you haven't specified. &lt;a href="https://arxiv.org/pdf/2601.15195" rel="noopener noreferrer"&gt;Research on AI agent failures&lt;/a&gt; shows specification completeness directly correlates with generation success rates.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CI/CD pipelines become AI infrastructure.&lt;/strong&gt; Your existing verification tooling (linters, type checkers, integration tests) is now part of your AI system's runtime, not just your human workflow. The &lt;a href="https://arxiv.org/html/2604.20436v1" rel="noopener noreferrer"&gt;Shift-Up framework&lt;/a&gt; explicitly positions software engineering guardrails as AI-native infrastructure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Vibe coding" produces technical debt faster.&lt;/strong&gt; A &lt;a href="https://arxiv.org/html/2603.28592v2" rel="noopener noreferrer"&gt;large-scale empirical study&lt;/a&gt; found AI-generated code without harness validation accumulated 484,366 distinct issues across 302.6k commits — code smells at 89.3%. The speed advantage of AI generation becomes negative if you're generating bugs faster than you fix them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Architecture decision: harness logic belongs in the orchestration layer.&lt;/strong&gt; Separate "what the AI does" from "what constraints it operates under." &lt;a href="https://arxiv.org/html/2604.10599v1" rel="noopener noreferrer"&gt;Recent analysis of agentic systems&lt;/a&gt; argues this separation is essential for maintainability and auditability.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Human review shifts from gatekeeping to harness design.&lt;/strong&gt; Engineers spend time writing better constraints, not reviewing more AI output. The &lt;a href="https://arxiv.org/html/2603.14805v1" rel="noopener noreferrer"&gt;Agent Skills specification&lt;/a&gt; assumes skills come with verification criteria — skills without validators are incomplete primitives.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Infrastructure Signal
&lt;/h2&gt;

&lt;p&gt;The convergent evolution tells the story. &lt;a href="https://arxiv.org/html/2604.17025v2" rel="noopener noreferrer"&gt;CAAF&lt;/a&gt;, &lt;a href="https://arxiv.org/html/2604.16399v2" rel="noopener noreferrer"&gt;IACDM&lt;/a&gt;, the &lt;a href="https://arxiv.org/html/2604.20436v1" rel="noopener noreferrer"&gt;Shift-Up framework&lt;/a&gt;, and Anthropic's internal practices all independently arrived at "external verification as first-class primitive." When multiple research groups solving different problems converge on the same pattern, it's usually load-bearing.&lt;/p&gt;

&lt;p&gt;The tooling investment pattern is revealing. &lt;a href="https://gist.github.com/spikelab/7551c6368e23caa06a4056350f6b2db3" rel="noopener noreferrer"&gt;Letta's 74% LoCoMo score&lt;/a&gt; came from filesystem-based memory with validation, not sophisticated retrieval — simple harnesses beat complex memory architectures. Platform engineering integration follows: &lt;a href="https://arxiv.org/html/2602.23397v1" rel="noopener noreferrer"&gt;IDPs projected to reach 80% adoption&lt;/a&gt; are natural homes for harness infrastructure, with "golden paths" essentially functioning as pre-validated execution corridors.&lt;/p&gt;

&lt;p&gt;Benchmark evolution provides another signal. &lt;a href="https://arxiv.org/html/2604.17025v2" rel="noopener noreferrer"&gt;Terminal-Bench, SWE-bench, and similar evaluations&lt;/a&gt; are &lt;em&gt;harness-native&lt;/em&gt; — they measure agent performance inside verification loops, not raw generation quality. When the benchmarks assume harnesses, the production systems will too.&lt;/p&gt;

&lt;p&gt;The quiet deprecation is already visible in the literature. Prompt-only approaches are being called "&lt;a href="https://arxiv.org/html/2604.17025v2" rel="noopener noreferrer"&gt;anti-patterns&lt;/a&gt;" in 2025-2026 publications; "unstructured vibe coding" is explicitly positioned as the thing harnesses fix.&lt;/p&gt;

&lt;h2&gt;
  
  
  Shift Rating
&lt;/h2&gt;

&lt;p&gt;🟢 &lt;strong&gt;Adopt Now&lt;/strong&gt; — Teams without harness infrastructure are already accumulating technical debt faster than they realize. The primitive is production-ready, framework-agnostic, and builds on existing testing/CI investments. The implementation cost is low (you likely have most of the pieces already), and the payoff compounds: better AI output today, less debt tomorrow, and infrastructure that scales as models improve.&lt;/p&gt;

&lt;p&gt;Engineers who internalize "verification is runtime infrastructure, not post-hoc review" will feel the gap close. Those who don't will wonder why their AI tooling plateaued while others kept accelerating.&lt;/p&gt;




&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://gist.github.com/spikelab/7551c6368e23caa06a4056350f6b2db3" rel="noopener noreferrer"&gt;A memory architecture for agentic system · GitHub&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/html/2604.16399v2" rel="noopener noreferrer"&gt;Technical Foundation Document IACDM: Interactive Adversarial Convergence Development Methodology&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/html/2603.28592v2" rel="noopener noreferrer"&gt;A Large-Scale Empirical Study of AI-Generated Code in the Wild&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/pdf/2604.14228" rel="noopener noreferrer"&gt;Dive into Claude Code: The Design Space of Today's and Future AI Coding Agents&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/html/2604.10599v1" rel="noopener noreferrer"&gt;Rethinking Software Engineering for Agentic AI Systems&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/html/2604.20436v1" rel="noopener noreferrer"&gt;Shift-Up: A Framework for Software Engineering Guardrails in AI-native Software Development&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://cdn.openai.com/pdf/openai-ending-the-capability-overhang.pdf" rel="noopener noreferrer"&gt;Ending the Capability Overhang - OpenAI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/pdf/2601.15195" rel="noopener noreferrer"&gt;Where Do AI Coding Agents Fail? An Empirical Study&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/html/2603.14805v1" rel="noopener noreferrer"&gt;Knowledge Activation: AI Skills as the Institutional Knowledge Primitive&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/html/2602.23397v1" rel="noopener noreferrer"&gt;Lifecycle-Integrated Security for AI-Cloud Convergence&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/html/2604.17025v2" rel="noopener noreferrer"&gt;Harness as an Asset: Enforcing Determinism via the Convergent AI Agent Framework&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;This is part of &lt;strong&gt;Primitive Shifts&lt;/strong&gt; — a monthly series tracking when new AI building blocks move from novel experiments to infrastructure you'll be expected to know.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Follow the Primitive Shifts series on Dev.to to catch every edition.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Spotted a shift happening in your stack? Drop it in the comments.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>agents</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Agentic Memory Systems — From Chaotic Context to Learned Control</title>
      <dc:creator>Richard Dillon</dc:creator>
      <pubDate>Mon, 04 May 2026 12:03:37 +0000</pubDate>
      <link>https://dev.to/richard_dillon_b9c238186e/agentic-memory-systems-from-chaotic-context-to-learned-control-183o</link>
      <guid>https://dev.to/richard_dillon_b9c238186e/agentic-memory-systems-from-chaotic-context-to-learned-control-183o</guid>
      <description>&lt;h1&gt;
  
  
  Agentic Memory Systems — From Chaotic Context to Learned Control
&lt;/h1&gt;

&lt;p&gt;Your agent just failed a customer support escalation because it couldn't remember that this same user had already explained their billing issue twice in previous sessions. The context window filled up with tool calls and intermediate reasoning, and the critical historical context got evicted. This isn't a rare edge case—it's the default failure mode for any agent that runs longer than a single conversation turn. The 2024-era solutions of naive RAG retrieval and sliding window compression treat memory as passive storage, but production agents need something fundamentally different: the ability to &lt;em&gt;decide&lt;/em&gt; what to remember.&lt;/p&gt;

&lt;p&gt;The research wave from early 2026 has crystallized around a compelling answer. Papers on &lt;a href="https://arxiv.org/html/2603.07670v1" rel="noopener noreferrer"&gt;agentic memory architectures&lt;/a&gt; and benchmarks like MemoryArena have demonstrated that treating memory operations as learnable actions—not hardcoded heuristics—recovers 15-25% accuracy on multi-session tasks where even the best models were failing. This shift from "memory as database" to "memory as learned skill" represents the most significant architectural evolution in agent design since tool use became standard.&lt;/p&gt;

&lt;p&gt;This article breaks down the four-memory-type architecture emerging as the production standard and shows you how to implement learned memory policies in LangGraph with the new &lt;a href="https://www.langchain.com/blog/announcing-the-langchain-mongodb-partnership-the-ai-agent-stack-that-runs-on-the-database-you-already-trust" rel="noopener noreferrer"&gt;LangChain + MongoDB integration&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Four-Memory-Type Architecture for Agents
&lt;/h2&gt;

&lt;p&gt;The cognitive science literature has long distinguished between different memory systems in humans, and this taxonomy turns out to be remarkably useful for agent design. The &lt;a href="https://arxiv.org/html/2603.07670v1" rel="noopener noreferrer"&gt;survey on memory mechanisms for autonomous agents&lt;/a&gt; identifies four distinct memory types that map directly to different operational needs in production systems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Working memory&lt;/strong&gt; is what the agent is thinking right now—the live reasoning context held in the current LLM call. It's bounded by your context window (128K tokens for Claude, up to 2M with Google's models), and everything flows through it. The critical insight is that working memory isn't just the user's message; it's the curated subset of all other memory types that's been loaded for this specific reasoning step.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Episodic memory&lt;/strong&gt; stores timestamped records of specific interactions and events. When a user asks "what did we discuss last week about the API migration?" the answer lives in episodic memory. Each episode captures not just what was said, but the outcome—did the user seem satisfied? Did the suggested solution work? This outcome tracking is what enables learning.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Semantic memory&lt;/strong&gt; contains consolidated facts and rules extracted from episodes. If a customer support agent handles fifty return requests, the episodes are individual conversations, but the semantic memory extracts "customers mentioning 'damaged in shipping' are eligible for express replacement without requiring photos." This generalization is what prevents agents from repeatedly discovering the same patterns.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Procedural memory&lt;/strong&gt; stores action sequences and workflows as reusable routines. When an agent learns that processing a refund requires checking order status, then verifying payment method, then initiating the return, this becomes procedural knowledge that can be invoked without re-reasoning from first principles.&lt;/p&gt;

&lt;p&gt;The interaction patterns matter as much as the types themselves. Episodic memory consolidates into semantic memory through generalization—after enough similar episodes, a pattern becomes a fact. Procedural and semantic memory load into working memory during task execution, providing the context needed for reasoning. The &lt;a href="https://arxiv.org/pdf/2601.12560" rel="noopener noreferrer"&gt;architectural taxonomies&lt;/a&gt; emerging in the literature consistently show this hierarchical flow: episodes → facts → working context.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Memory Type&lt;/th&gt;
&lt;th&gt;Persistence&lt;/th&gt;
&lt;th&gt;Update Frequency&lt;/th&gt;
&lt;th&gt;Typical Backend&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Working&lt;/td&gt;
&lt;td&gt;Single call&lt;/td&gt;
&lt;td&gt;Every token&lt;/td&gt;
&lt;td&gt;LLM context&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Episodic&lt;/td&gt;
&lt;td&gt;Long-term&lt;/td&gt;
&lt;td&gt;Per interaction&lt;/td&gt;
&lt;td&gt;Document store, MongoDB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Semantic&lt;/td&gt;
&lt;td&gt;Long-term&lt;/td&gt;
&lt;td&gt;Periodic consolidation&lt;/td&gt;
&lt;td&gt;Vector store, graph DB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Procedural&lt;/td&gt;
&lt;td&gt;Long-term&lt;/td&gt;
&lt;td&gt;Rare refinement&lt;/td&gt;
&lt;td&gt;Code/config, document store&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
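&lt;p&gt;To make the hierarchical flow concrete, here is a minimal, self-contained sketch of the consolidation step (illustrative only; the names &lt;code&gt;EpisodeRecord&lt;/code&gt; and &lt;code&gt;consolidate&lt;/code&gt; are ours, not from the survey): once enough episodes mention the same entity, a candidate semantic fact is emitted.&lt;/p&gt;

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class EpisodeRecord:
    """Minimal stand-in for an episodic memory entry."""
    summary: str
    entities: list[str]
    timestamp: datetime = field(default_factory=datetime.now)

def consolidate(episodes: list[EpisodeRecord], entity: str, threshold: int = 3):
    """Emit a candidate semantic fact once an entity recurs across episodes."""
    matching = [e for e in episodes if entity in e.entities]
    if len(matching) >= threshold:
        # A real system would ask an LLM to generalize the summaries;
        # here we only record the recurrence and a crude confidence score.
        return {
            "statement": f"Recurring topic for this user: {entity}",
            "confidence": min(1.0, len(matching) / 10),
            "source_count": len(matching),
        }
    return None

episodes = [EpisodeRecord("billing question", ["billing"]) for _ in range(4)]
fact = consolidate(episodes, "billing")
```

&lt;p&gt;A production system would hand the matching summaries to an LLM for generalization rather than templating the statement, but the trigger logic is the same shape.&lt;/p&gt;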

&lt;h2&gt;
  
  
  From Passive Storage to Learned Memory Policies
&lt;/h2&gt;

&lt;p&gt;The traditional approach to agent memory is entirely heuristic. Summarize every N turns. Retrieve the top-K similar chunks. Compress anything older than M messages. These rules are easy to implement and easy to reason about, but they fail at the edges where production systems actually live.&lt;/p&gt;

&lt;p&gt;Over-summarization loses critical detail. A summary that says "user discussed billing issues" isn't useful when the specific detail was that the user's card was charged twice on March 3rd for transaction ID 4829. Under-retrieval causes agents to repeat mistakes or ask users to re-explain problems they've already described. The heuristics don't know what matters for the current task.&lt;/p&gt;
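&lt;p&gt;The heuristic baseline is easy to pin down in code (an illustrative sketch, not any specific library's API): fixed-interval summarization and top-K retrieval, both blind to what the current task actually needs.&lt;/p&gt;

```python
SUMMARIZE_EVERY_N = 10   # compress every N turns, regardless of content
TOP_K = 5                # always fetch exactly K chunks, relevant or not

def should_summarize(turn_count: int) -> bool:
    """Fixed-interval trigger: fires on turns 10, 20, ... no matter what."""
    return turn_count > 0 and turn_count % SUMMARIZE_EVERY_N == 0

def retrieve(scored_chunks: list[tuple[float, str]]) -> list[str]:
    """Top-K by similarity score, blind to task-specific importance."""
    ranked = sorted(scored_chunks, key=lambda pair: pair[0], reverse=True)
    return [text for _, text in ranked[:TOP_K]]

chunks = [(0.2, "old tool logs"), (0.9, "duplicate charge on March 3rd"), (0.5, "greeting")]
print(retrieve(chunks))  # highest-similarity chunks win, even if stale
```

&lt;p&gt;Neither function has any notion of the current task, which is exactly the failure mode described above.&lt;/p&gt;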

&lt;p&gt;The breakthrough in the &lt;a href="https://arxiv.org/html/2603.07670v1" rel="noopener noreferrer"&gt;agentic memory research&lt;/a&gt; is treating memory operations—store, retrieve, consolidate, forget—as actions in a reinforcement learning framework. Instead of hardcoding "summarize every 10 turns," you train the agent to decide when summarization helps and when it hurts. The training signal comes from downstream task success: did remembering this detail lead to a correct answer? Did consolidating those episodes produce a useful generalization?&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://github.com/Shichun-Liu/Agent-Memory-Paper-List" rel="noopener noreferrer"&gt;agent memory paper list&lt;/a&gt; catalogs the rapid evolution of these techniques. Step-wise policy gradient methods like GRPO allow fine-grained credit assignment—which specific memory decision contributed to the final outcome? This is fundamentally different from end-to-end training because memory decisions have delayed effects; storing something now might only prove useful three sessions later.&lt;/p&gt;

&lt;p&gt;Benchmark results from MemoryArena illustrate the gap. Models that achieve near-perfect scores on single-session long-context tasks (LoCoMo-style benchmarks) drop to 40-60% accuracy on multi-session tasks with interdependencies. The context window is long enough, but the agent can't figure out what to load from history. Learned memory policies recover 15-25% of this accuracy gap—not by expanding context, but by making smarter decisions about what goes into it.&lt;/p&gt;

&lt;p&gt;The operational gotcha is that learned policies require task-specific fine-tuning. An off-the-shelf model won't magically know what to remember for your customer support workflow versus your code review assistant. Until you've collected enough trajectories to train on, you need explicit memory scaffolding—which brings us to implementation.&lt;/p&gt;
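&lt;p&gt;A hypothetical sketch of the framing (the names &lt;code&gt;MemoryAction&lt;/code&gt; and &lt;code&gt;assign_credit&lt;/code&gt; are ours, not from the papers): memory operations form a discrete action space, and the downstream task reward is distributed back over the trajectory. Uniform credit is the naive baseline that step-wise methods like GRPO improve on.&lt;/p&gt;

```python
from enum import Enum

class MemoryAction(Enum):
    STORE = "store"
    RETRIEVE = "retrieve"
    CONSOLIDATE = "consolidate"
    FORGET = "forget"
    NOOP = "noop"

def assign_credit(trajectory: list[MemoryAction], task_reward: float) -> dict:
    """Uniform credit assignment over the trajectory. Step-wise policy
    gradients replace this with per-decision advantages, so a store that
    only pays off three sessions later gets its own signal."""
    per_step = task_reward / max(1, len(trajectory))
    return {i: per_step for i, _ in enumerate(trajectory)}

traj = [MemoryAction.RETRIEVE, MemoryAction.STORE, MemoryAction.NOOP]
credit = assign_credit(traj, task_reward=3.0)
assert credit == {0: 1.0, 1: 1.0, 2: 1.0}
```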

&lt;h2&gt;
  
  
  Hands-On: Code Walkthrough
&lt;/h2&gt;

&lt;p&gt;We'll build a LangGraph agent that maintains episodic and semantic memory across sessions using MongoDB as the backend. This architecture leverages the &lt;a href="https://www.langchain.com/blog/announcing-the-langchain-mongodb-partnership-the-ai-agent-stack-that-runs-on-the-database-you-already-trust" rel="noopener noreferrer"&gt;LangChain + MongoDB integration&lt;/a&gt; announced for production agent deployments. The goal is a working memory system you can deploy today with heuristic policies, structured for easy upgrade to learned policies later.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Literal&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Field&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.graph&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;StateGraph&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;START&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;END&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.checkpoint.mongodb&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MongoDBSaver&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_anthropic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatAnthropic&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.messages&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;HumanMessage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;AIMessage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SystemMessage&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pymongo&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MongoClient&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt;

&lt;span class="c1"&gt;# Step 1: Define memory schemas with Pydantic
# These schemas determine what we track in each memory type
&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Episode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;A single interaction event with full context and outcome.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;default_factory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uuid4&lt;/span&gt;&lt;span class="p"&gt;()))&lt;/span&gt;
    &lt;span class="n"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;default_factory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;utcnow&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;  &lt;span class="c1"&gt;# Natural language summary of what happened
&lt;/span&gt;    &lt;span class="n"&gt;entities&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# Extracted entities (names, IDs, topics)
&lt;/span&gt;    &lt;span class="n"&gt;user_message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;  &lt;span class="c1"&gt;# The original user input
&lt;/span&gt;    &lt;span class="n"&gt;agent_response&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;  &lt;span class="c1"&gt;# What the agent said
&lt;/span&gt;    &lt;span class="n"&gt;outcome&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;  &lt;span class="c1"&gt;# success, failure, unknown
&lt;/span&gt;    &lt;span class="n"&gt;outcome_signal&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;  &lt;span class="c1"&gt;# Numeric reward for RL training
&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;SemanticFact&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;A consolidated fact extracted from one or more episodes.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;default_factory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uuid4&lt;/span&gt;&lt;span class="p"&gt;()))&lt;/span&gt;
    &lt;span class="n"&gt;statement&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;  &lt;span class="c1"&gt;# The actual fact, e.g., "User prefers email over phone"
&lt;/span&gt;    &lt;span class="n"&gt;confidence&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ge&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;le&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# How certain we are
&lt;/span&gt;    &lt;span class="n"&gt;source_episode_ids&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# Provenance for debugging
&lt;/span&gt;    &lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;default_factory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;utcnow&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;last_used&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;  &lt;span class="c1"&gt;# For LRU-style eviction
&lt;/span&gt;    &lt;span class="n"&gt;use_count&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;  &lt;span class="c1"&gt;# Track utility for learned policies
&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;MemoryState&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;The state object passed through the LangGraph nodes.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;  &lt;span class="c1"&gt;# Conversation history (working memory)
&lt;/span&gt;    &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;current_episode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Episode&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="n"&gt;retrieved_episodes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Episode&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="n"&gt;retrieved_facts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;SemanticFact&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="n"&gt;memory_action&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;  &lt;span class="c1"&gt;# What the controller decided
&lt;/span&gt;    &lt;span class="n"&gt;consolidation_pending&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;

&lt;span class="c1"&gt;# Step 2: Memory storage layer using MongoDB
# Separate collections for episodes and semantic facts
&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;MemoryStore&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mongo_uri&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;db_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent_memory&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MongoClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mongo_uri&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;db_name&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;episodes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;episodes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;semantic_facts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;semantic_facts&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="c1"&gt;# Create indexes for efficient queries
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;episodes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_index&lt;/span&gt;&lt;span class="p"&gt;([(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;timestamp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)])&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;episodes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_index&lt;/span&gt;&lt;span class="p"&gt;([(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;entities&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)])&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;semantic_facts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_index&lt;/span&gt;&lt;span class="p"&gt;([(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;confidence&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)])&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;store_episode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;episode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Episode&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Persist an episode to MongoDB.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;doc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;episode&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;model_dump&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;episodes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;insert_one&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;episode&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;query_episodes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
        &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
        &lt;span class="n"&gt;entities&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;days_back&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Episode&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Retrieve relevant episodes for a user.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;timestamp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;$gte&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;utcnow&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nf"&gt;timedelta&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;days&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;days_back&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;entities&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;entities&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;$in&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;entities&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="n"&gt;cursor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;episodes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;sort&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;timestamp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;Episode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;count_recent_episodes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;entity&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Count episodes mentioning an entity—used for consolidation trigger.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;episodes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;count_documents&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;entities&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;entity&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;timestamp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;$gte&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;utcnow&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nf"&gt;timedelta&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;days&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;store_semantic_fact&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fact&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;SemanticFact&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Persist a semantic fact to MongoDB.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;doc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;fact&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;model_dump&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;semantic_facts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;insert_one&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;fact&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;query_semantic_facts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
        &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
        &lt;span class="n"&gt;min_confidence&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;SemanticFact&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Retrieve high-confidence facts for context loading.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;cursor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;semantic_facts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;confidence&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;$gte&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;min_confidence&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="nf"&gt;sort&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;confidence&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;SemanticFact&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# Step 3: Define the memory controller node
# This is where the heuristic policy lives; swap in a learned policy later
&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;memory_controller&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;MemoryState&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;MemoryStore&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;MemoryState&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Decide which memory operation to perform based on current state.

    Heuristic policy (v1):
    - Always retrieve relevant episodes and facts before reasoning
    - Store episode after each user turn
    - Trigger consolidation when 5+ episodes share an entity
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="c1"&gt;# Extract entities from the last user message (simplified)
&lt;/span&gt;    &lt;span class="n"&gt;last_message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;
    &lt;span class="c1"&gt;# In production, use NER or LLM extraction here
&lt;/span&gt;    &lt;span class="n"&gt;entities&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;extract_entities_simple&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;last_message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Retrieve relevant context
&lt;/span&gt;    &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;retrieved_episodes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query_episodes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
        &lt;span class="n"&gt;entities&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;entities&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;retrieved_facts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query_semantic_facts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;min_confidence&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Check if consolidation should trigger
&lt;/span&gt;    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;entity&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;entities&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;count_recent_episodes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;entity&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;consolidation_pending&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;
            &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;memory_action&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;consolidate_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;entity&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="k"&gt;break&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;memory_action&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;retrieve_only&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;extract_entities_simple&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Placeholder entity extraction—use spaCy or LLM in production.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="c1"&gt;# Very simplified: extract capitalized words and common patterns
&lt;/span&gt;    &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;
    &lt;span class="n"&gt;words&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;findall&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;\b[A-Z][a-z]+\b&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;words&lt;/span&gt;&lt;span class="p"&gt;))[:&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# Step 4: Build the reasoning node that uses memory context
&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;reasoning_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;MemoryState&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ChatAnthropic&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;MemoryState&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Main reasoning with memory-augmented context.
    Loads retrieved episodes and facts into working memory.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="c1"&gt;# Build system prompt with memory context
&lt;/span&gt;    &lt;span class="n"&gt;memory_context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;build_memory_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;retrieved_episodes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;retrieved_facts&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;system_msg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SystemMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;You are a helpful assistant with access to memory of past interactions.

RELEVANT HISTORY:
&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;memory_context&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

Use this context to provide personalized, consistent responses. 
Reference past interactions when relevant.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Call the LLM with augmented context
&lt;/span&gt;    &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;system_msg&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Create episode record for this interaction
&lt;/span&gt;    &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;current_episode&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Episode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;User asked about: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;entities&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;extract_entities_simple&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;user_message&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;agent_response&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Add response to conversation
&lt;/span&gt;    &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;AIMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;build_memory_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;episodes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Episode&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;facts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;SemanticFact&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Format retrieved memories for inclusion in prompt.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;parts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;facts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;KNOWN FACTS ABOUT THIS USER:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;fact&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;facts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;- &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;fact&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;statement&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; (confidence: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;fact&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;confidence&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;episodes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;RECENT RELEVANT INTERACTIONS:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;ep&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;episodes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;date_str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ep&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strftime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;%Y-%m-%d&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;- [&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;date_str&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;] &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;ep&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;parts&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;No relevant history found.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# Step 5: Consolidation node—extracts semantic facts from episodes
&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;consolidation_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;MemoryState&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;MemoryStore&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ChatAnthropic&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;MemoryState&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Consolidate multiple episodes into semantic facts.
    This is where episodic → semantic generalization happens.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;consolidation_pending&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;

    &lt;span class="c1"&gt;# Get episodes to consolidate
&lt;/span&gt;    &lt;span class="n"&gt;entity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;memory_action&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;consolidate_&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;episodes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query_episodes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;entities&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;entity&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;episodes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;consolidation_pending&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;

    &lt;span class="c1"&gt;# Use LLM to extract generalizable facts
&lt;/span&gt;    &lt;span class="n"&gt;episode_summaries&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;- &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;ep&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strftime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;%Y-%m-%d&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;ep&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; 
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;ep&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;episodes&lt;/span&gt;
    &lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="n"&gt;extraction_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Based on these past interactions, extract 1-3 general facts about the user&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s preferences, patterns, or needs:

INTERACTIONS:
&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;episode_summaries&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

Output each fact on its own line, starting with &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;FACT: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;
Only include facts that appear consistently across multiple interactions.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="nc"&gt;HumanMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;extraction_prompt&lt;/span&gt;&lt;span class="p"&gt;)])&lt;/span&gt;

    &lt;span class="c1"&gt;# Parse and store extracted facts
&lt;/span&gt;    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;startswith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;FACT:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;fact_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;FACT:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="n"&gt;fact&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SemanticFact&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;statement&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;fact_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;confidence&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Adjust based on episode count
&lt;/span&gt;                &lt;span class="n"&gt;source_episode_ids&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ep&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;ep&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;episodes&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;store_semantic_fact&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fact&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;consolidation_pending&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;

&lt;span class="c1"&gt;# Step 6: Wire everything into a LangGraph StateGraph
&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;build_memory_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mongo_uri&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;anthropic_api_key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Construct the full memory-enabled agent graph.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="c1"&gt;# Initialize components
&lt;/span&gt;    &lt;span class="n"&gt;store&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MemoryStore&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mongo_uri&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ChatAnthropic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-sonnet-4-20250514&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;anthropic_api_key&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;checkpointer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;MongoDBSaver&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_conn_string&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mongo_uri&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Define the graph
&lt;/span&gt;    &lt;span class="n"&gt;graph&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;StateGraph&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;MemoryState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Add nodes with bound dependencies
&lt;/span&gt;    &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;memory_controller&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;memory_controller&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reasoning&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;reasoning_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;store_episode&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;store_and_return&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;consolidation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;consolidation_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="c1"&gt;# Define edges
&lt;/span&gt;    &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;START&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;memory_controller&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;memory_controller&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reasoning&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reasoning&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;store_episode&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Conditional edge: consolidate if pending, else end
&lt;/span&gt;    &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_conditional_edges&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;store_episode&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;consolidation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;consolidation_pending&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;end&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;consolidation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;consolidation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;end&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;END&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;consolidation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;END&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;checkpointer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;checkpointer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;store_and_return&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;MemoryState&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;MemoryStore&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;MemoryState&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Persist the current episode and return state.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;current_episode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;store_episode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;current_episode&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;

&lt;span class="c1"&gt;# Usage example with LangSmith tracing
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langsmith&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;traceable&lt;/span&gt;

&lt;span class="nd"&gt;@traceable&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;memory_agent_turn&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;memory_enabled&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_agent_turn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;thread_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Execute a single turn with full tracing.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;configurable&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;thread_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;thread_id&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;metadata&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;initial_state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MemoryState&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;HumanMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;)],&lt;/span&gt;
        &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;initial_state&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The extension point for learned policies is the &lt;code&gt;memory_controller&lt;/code&gt; function. Replace the heuristic rules with a fine-tuned classifier that takes the current state and predicts the optimal memory action. The &lt;a href="https://huggingface.co/blog/aufklarer/ai-trends-2026-test-time-reasoning-reflective-agen" rel="noopener noreferrer"&gt;GRPO training approach&lt;/a&gt; mentioned in the research uses trajectories where you label which memory decisions led to successful task completion.&lt;/p&gt;
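&lt;p&gt;A minimal sketch of that swap, assuming a &lt;code&gt;predict()&lt;/code&gt; interface on the trained classifier; &lt;code&gt;MemoryAction&lt;/code&gt;, &lt;code&gt;DummyPolicy&lt;/code&gt;, and the feature names are illustrative stand-ins, not part of the code above:&lt;/p&gt;

```python
# Hypothetical sketch: replace the heuristic memory_controller rules with
# a learned policy. MemoryAction, TurnState, and DummyPolicy are
# illustrative; a real system would plug in a GRPO/DPO fine-tuned model.
from dataclasses import dataclass, field
from enum import Enum


class MemoryAction(Enum):
    RETRIEVE_EPISODES = "retrieve_episodes"
    RETRIEVE_FACTS = "retrieve_facts"
    SKIP = "skip"


@dataclass
class TurnState:
    messages: list = field(default_factory=list)
    consolidation_pending: bool = False


class DummyPolicy:
    """Stand-in for a fine-tuned classifier trained on labeled trajectories."""

    def predict(self, features: dict) -> MemoryAction:
        # A trained model would score actions; this stub only mimics
        # the interface so the controller's shape is visible.
        if features["turn_count"] > 3:
            return MemoryAction.RETRIEVE_FACTS
        return MemoryAction.RETRIEVE_EPISODES


def learned_memory_controller(state: TurnState, policy: DummyPolicy) -> MemoryAction:
    """Route the memory decision through the policy instead of if/else rules."""
    features = {
        "turn_count": len(state.messages),
        "consolidation_pending": state.consolidation_pending,
    }
    return policy.predict(features)
```

&lt;p&gt;The controller's return value would then drive which retrieval branch the graph takes, exactly where the heuristic rules sit today.&lt;/p&gt;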

&lt;h2&gt;
  
  
  Benchmarking Your Agent's Memory: MemoryArena and MemBench
&lt;/h2&gt;

&lt;p&gt;Standard benchmarks for long-context models miss the critical challenge in production agents. LoCoMo, LongBench, and similar evaluations test single-session performance—can the model find a needle in a haystack within one context window? But your production agent runs across dozens of sessions over weeks or months. The &lt;a href="https://arxiv.org/html/2603.07670v1" rel="noopener noreferrer"&gt;survey on memory evaluation&lt;/a&gt; identifies this gap as a primary reason deployed agents underperform their benchmark scores.&lt;/p&gt;

&lt;p&gt;MemoryArena addresses this with four domains specifically designed for multi-session evaluation: customer support (returning users with ongoing issues), project management (tasks that span days with status updates), personal assistant (preference learning over time), and collaborative coding (incremental feature development). Tasks span 5-20 sessions with explicit interdependencies—session 7 might require information from session 2 that wasn't relevant in sessions 3-6.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://arxiv.org/pdf/2601.12560" rel="noopener noreferrer"&gt;agentic AI architectures survey&lt;/a&gt; highlights five dimensions for memory evaluation that you should track in your own systems:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Retention accuracy&lt;/strong&gt;: Does the agent remember critical facts after they leave the context window?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Retrieval precision&lt;/strong&gt;: When memory is loaded, is it actually relevant to the current query?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Consolidation quality&lt;/strong&gt;: Do extracted semantic facts accurately generalize from episodes?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Interference resistance&lt;/strong&gt;: Does learning new information corrupt existing memories?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Forgetting appropriateness&lt;/strong&gt;: Does the agent correctly discard outdated or superseded information?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For practical measurement in LangSmith, instrument your agent with these metrics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Memory hit rate&lt;/strong&gt;: Of retrieved memories, what percentage appeared in the final response or reasoning trace? Track this with metadata tags on your traces.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Consolidation ratio&lt;/strong&gt;: Episodes created vs. semantic facts extracted. A ratio of 5:1 (5 episodes per fact) suggests healthy generalization; 2:1 might indicate overfitting to specific instances.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory bloat&lt;/strong&gt;: Total storage growth per active user per week. Unbounded growth signals missing TTL policies or over-storage.&lt;/li&gt;
&lt;/ul&gt;
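&lt;p&gt;A rough sketch of computing the first two metrics from exported traces; the record shape (&lt;code&gt;retrieved_ids&lt;/code&gt;, &lt;code&gt;used_ids&lt;/code&gt;) is an assumed convention for your own trace metadata, not a LangSmith API:&lt;/p&gt;

```python
# Sketch: compute memory hit rate and consolidation ratio from logged
# traces. The dict keys below are assumed conventions for exported data.
def memory_hit_rate(traces: list[dict]) -> float:
    """Fraction of retrieved memories that surfaced in the final response."""
    retrieved = used = 0
    for t in traces:
        retrieved += len(t["retrieved_ids"])
        used += len(set(t["retrieved_ids"]) & set(t["used_ids"]))
    return used / retrieved if retrieved else 0.0


def consolidation_ratio(episode_count: int, fact_count: int) -> float:
    """Episodes per extracted fact; around 5.0 suggests healthy generalization."""
    return episode_count / fact_count if fact_count else float("inf")
```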

&lt;p&gt;To create your own MemoryArena-style evaluation, export multi-session conversation logs from your production system, annotate them with ground-truth "should remember" and "should retrieve" labels, and compare agent performance with memory enabled versus disabled. The &lt;a href="https://github.com/ARUNAGIRINATHAN-K/awesome-ai-agents" rel="noopener noreferrer"&gt;Awesome AI Agents&lt;/a&gt; collection includes several evaluation harnesses you can adapt.&lt;/p&gt;
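&lt;p&gt;Once turns carry "should retrieve" labels, scoring reduces to a micro-averaged precision/recall pass. A sketch, assuming each annotated turn lists &lt;code&gt;should_retrieve&lt;/code&gt; and &lt;code&gt;did_retrieve&lt;/code&gt; memory ids:&lt;/p&gt;

```python
# Sketch: score the agent's retrieval against hand-labeled ground truth.
# The per-turn dict keys are assumed annotation conventions. Precision
# maps to "retrieval precision" and recall to "retention accuracy".
def retrieval_scores(turns: list[dict]) -> tuple[float, float]:
    tp = fp = fn = 0
    for t in turns:
        should = set(t["should_retrieve"])
        did = set(t["did_retrieve"])
        tp += len(should & did)   # correctly retrieved
        fp += len(did - should)   # retrieved but irrelevant
        fn += len(should - did)   # needed but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

&lt;p&gt;Run the same pass twice, once with memory enabled and once disabled, and the delta gives you a concrete number to report.&lt;/p&gt;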

&lt;h2&gt;
  
  
  What This Means for Your Stack
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Immediate adoption (this week)&lt;/strong&gt;: Add structured episodic logging to your existing agents. Even without learned policies, queryable history improves debugging when something goes wrong and increases user trust when the agent demonstrates continuity. The code above gives you a working MongoDB-backed episodic store you can deploy today.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Medium-term investment (next quarter)&lt;/strong&gt;: Implement the four-memory-type separation. Use MongoDB or PostgreSQL with JSON columns for episodic storage—the &lt;a href="https://www.langchain.com/blog/announcing-the-langchain-mongodb-partnership-the-ai-agent-stack-that-runs-on-the-database-you-already-trust" rel="noopener noreferrer"&gt;LangChain + MongoDB partnership&lt;/a&gt; provides native integration. Add a vector store (Pinecone, Weaviate, or MongoDB Atlas Vector Search) for semantic retrieval. The investment pays off in personalization quality and reduced user friction.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Advanced path (6+ months)&lt;/strong&gt;: Fine-tune a memory controller on your domain using GRPO or DPO. This requires collecting trajectories with labeled outcomes—which memory decisions led to task success? The &lt;a href="https://arxiv.org/html/2602.23720v1" rel="noopener noreferrer"&gt;emerging frameworks like Auton&lt;/a&gt; provide scaffolding for this training loop, but expect to invest in custom data collection infrastructure.&lt;/p&gt;

&lt;p&gt;One critical architecture decision: should memory consolidation run inline (during the conversation) or as a background job? Inline consolidation adds latency—100-500ms for an LLM call to extract facts—but keeps memory fresh. Background batch processing adds staleness (facts extracted hours after the relevant episodes) but maintains conversation responsiveness. For most applications, background consolidation with aggressive episode retrieval is the right trade-off.&lt;/p&gt;
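&lt;p&gt;The background option can be as simple as a counter-gated buffer. In this hypothetical sketch (the class name and threshold are illustrative), the conversation path pays only an append, and the expensive LLM extraction runs when a worker or cron job calls &lt;code&gt;flush_if_due()&lt;/code&gt;:&lt;/p&gt;

```python
# Sketch of background consolidation: episodes queue up cheaply inline,
# and the costly extraction call runs off the request path. The class
# name and default threshold are illustrative assumptions.
class BackgroundConsolidator:
    def __init__(self, consolidate_fn, threshold: int = 5):
        self.consolidate_fn = consolidate_fn  # the 100-500ms LLM extraction
        self.threshold = threshold
        self.pending = []

    def record(self, episode) -> None:
        # Cheap inline step: O(1) append, no LLM call, no added latency.
        self.pending.append(episode)

    def flush_if_due(self) -> bool:
        # Expensive step, invoked by a worker or scheduled job.
        if len(self.pending) >= self.threshold:
            self.consolidate_fn(self.pending)
            self.pending = []
            return True
        return False
```

&lt;p&gt;The staleness trade-off is explicit here: facts lag by at most one flush interval, which is why pairing this with aggressive episode retrieval keeps answers accurate in between.&lt;/p&gt;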

&lt;p&gt;Operational considerations you'll hit in production: memory storage grows unboundedly without intervention. Implement TTL policies (archive episodes older than 90 days to cold storage), user-scoped isolation (critical for multi-tenant systems), and GDPR-compliant deletion hooks (when a user requests data deletion, you need to cascade through episodes, facts, and any derived embeddings).&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://www.langchain.com/blog/agentic-engineering-redefining-software-engineering" rel="noopener noreferrer"&gt;agentic engineering practices&lt;/a&gt; emerging in production teams emphasize that memory systems are infrastructure, not features. Budget for them accordingly—monitoring, alerting on memory bloat, and regular audits of consolidation quality.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to Build This Week
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Project: Memory-Enabled Support Agent with Consolidation Dashboard&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Build a customer support agent that remembers user preferences and issue history across sessions, with a Streamlit dashboard showing memory operations in real-time.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Deploy the MongoDB-backed memory system from the code walkthrough&lt;/li&gt;
&lt;li&gt;Create a simple support chat interface (Gradio or Streamlit)&lt;/li&gt;
&lt;li&gt;Simulate 10 multi-turn conversations with 3 different "users," each discussing a recurring topic (billing, technical issues, feature requests)&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Build a dashboard that displays:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Episode timeline per user&lt;/li&gt;
&lt;li&gt;Extracted semantic facts with source episode links&lt;/li&gt;
&lt;li&gt;Memory hit rate per conversation (did retrieved memories appear in responses?)&lt;/li&gt;
&lt;li&gt;Consolidation triggers (when and why facts were extracted)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Run a comparison: disable memory retrieval for half the conversations and measure how often the agent asks users to repeat information they've already provided&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The success metric is demonstrating that memory-enabled conversations require fewer clarifying questions and produce more personalized responses. Post your results with LangSmith trace links—the community needs more real-world data on memory system performance.&lt;/p&gt;




&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.langchain.com/blog/announcing-the-langchain-mongodb-partnership-the-ai-agent-stack-that-runs-on-the-database-you-already-trust" rel="noopener noreferrer"&gt;Announcing the LangChain + MongoDB Partnership: The AI Agent Stack That Runs On The Database You Already Trust&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/html/2603.07670v1" rel="noopener noreferrer"&gt;Memory for Autonomous LLM Agents: Mechanisms, Evaluation, and Integration&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/Shichun-Liu/Agent-Memory-Paper-List" rel="noopener noreferrer"&gt;The paper list of "Memory in the Age of AI Agents: A Survey"&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://huggingface.co/blog/aufklarer/ai-trends-2026-test-time-reasoning-reflective-agen" rel="noopener noreferrer"&gt;AI Trends 2026: Test-Time Reasoning and the Rise of Reflective Agents&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/pdf/2601.12560" rel="noopener noreferrer"&gt;Agentic Artificial Intelligence (AI): Architectures, Taxonomies, and Future Directions&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/html/2602.23720v1" rel="noopener noreferrer"&gt;The Auton Agentic AI Framework: A Declarative Architecture&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/ARUNAGIRINATHAN-K/awesome-ai-agents" rel="noopener noreferrer"&gt;Awesome AI Agents for 2026&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.langchain.com/blog/agentic-engineering-redefining-software-engineering" rel="noopener noreferrer"&gt;How Swarms of AI Agents Are Redefining Software Engineering&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;This is part of the &lt;strong&gt;Agentic Engineering Weekly&lt;/strong&gt; series — a deep-dive every Monday into the frameworks, patterns, and techniques shaping the next generation of AI systems.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Follow the Agentic Engineering Weekly series on Dev.to to catch every edition.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Building something agentic? Drop a comment — I'd love to feature reader projects.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>programming</category>
      <category>agents</category>
    </item>
    <item>
      <title>AI Infrastructure Strains Under Demand as OpenAI Ships GPT-5.5 and Multi-Agent Systems Go Mainstream</title>
      <dc:creator>Richard Dillon</dc:creator>
      <pubDate>Mon, 04 May 2026 12:03:05 +0000</pubDate>
      <link>https://dev.to/richard_dillon_b9c238186e/ai-infrastructure-strains-under-demand-as-openai-ships-gpt-55-and-multi-agent-systems-go-mainstream-3k35</link>
      <guid>https://dev.to/richard_dillon_b9c238186e/ai-infrastructure-strains-under-demand-as-openai-ships-gpt-55-and-multi-agent-systems-go-mainstream-3k35</guid>
      <description>&lt;h1&gt;
  
  
  AI Infrastructure Strains Under Demand as OpenAI Ships GPT-5.5 and Multi-Agent Systems Go Mainstream
&lt;/h1&gt;

&lt;p&gt;The AI industry is experiencing a fascinating inflection point this week: while chipmakers struggle to meet insatiable demand and Goldman Sachs sounds alarms about long-term market disruption, the technology itself continues its relentless march forward. OpenAI's GPT-5.5 brings enhanced agentic capabilities, interpretable architectures are emerging from stealth, and multi-agent systems are finally transitioning from research curiosity to production necessity. The infrastructure can barely keep up—and that tension is reshaping both the semiconductor industry and investment strategies.&lt;/p&gt;

&lt;h2&gt;
  
  
  Intel CPU Demand Surges as AI Boom Reaches Central Processors
&lt;/h2&gt;

&lt;p&gt;Intel is having a moment. The company's stock &lt;a href="https://www.reuters.com/business/intel-set-record-high-ai-driven-cpu-demand-powers-upbeat-forecast-2026-04-24/" rel="noopener noreferrer"&gt;hit record highs&lt;/a&gt; this week as AI service providers drove unprecedented demand for traditional CPUs, signaling a significant shift in how the industry thinks about AI infrastructure requirements.&lt;/p&gt;

&lt;p&gt;The numbers tell a compelling story: Q1 demand was so strong that Intel sold through chips originally reserved for other purposes, a remarkable turnaround for a company that spent years watching NVIDIA dominate AI compute headlines. This isn't Intel suddenly competing in the GPU space—it's the AI workload profile evolving to require more heterogeneous compute.&lt;/p&gt;

&lt;p&gt;The surge makes architectural sense. As AI deployments move from training-focused research environments to inference-heavy production systems, the computational mix changes. Retrieval-augmented generation pipelines, vector database queries, orchestration layers for multi-agent systems, and pre/post-processing stages all lean heavily on CPU performance. A single AI service might use GPUs for model inference while relying on dozens of CPU cores for everything surrounding that inference.&lt;/p&gt;

&lt;p&gt;This follows Intel's &lt;a href="https://www.reuters.com/business/google-puts-ai-agents-heart-its-enterprise-money-making-push-2026-04-22/" rel="noopener noreferrer"&gt;partnership with Google&lt;/a&gt; on AI-optimized CPUs announced earlier this year, suggesting the demand spike isn't purely organic but reflects strategic positioning that's now paying dividends. The question is whether Intel can sustain this momentum as competitors adapt.&lt;/p&gt;

&lt;h2&gt;
  
  
  Samsung Chip Profits Jump 50x Amid AI-Driven Supply Crunch
&lt;/h2&gt;

&lt;p&gt;If Intel's surge represents demand reaching CPUs, Samsung's numbers represent the raw magnitude of that demand across the entire semiconductor stack. The company's semiconductor profits jumped &lt;a href="https://www.reuters.com/technology/artificial-intelligence/" rel="noopener noreferrer"&gt;nearly 50-fold&lt;/a&gt; on AI chip demand—a staggering figure that underscores just how supply-constrained the industry remains.&lt;/p&gt;

&lt;p&gt;More concerning for AI builders: Samsung executives warned the supply shortage will &lt;a href="https://www.reuters.com/technology/artificial-intelligence/" rel="noopener noreferrer"&gt;worsen through 2027&lt;/a&gt;. That's not a quarter or two of tightness—it's a multi-year structural constraint that will force hard prioritization decisions about which AI projects get built and which wait for silicon.&lt;/p&gt;

&lt;p&gt;The bottleneck extends beyond any single manufacturer. Cerebras is also &lt;a href="https://techcrunch.com/category/artificial-intelligence/" rel="noopener noreferrer"&gt;targeting AI chip market expansion&lt;/a&gt;, and every major hyperscaler has custom silicon programs in various stages of deployment. Yet demand continues to outpace supply additions.&lt;/p&gt;

&lt;p&gt;For engineering teams, this has practical implications. Reserved capacity agreements, longer hardware procurement timelines, and more aggressive optimization to extract maximum utility from existing infrastructure are becoming standard practice. The companies that locked in capacity contracts 18 months ago are looking prescient; those assuming spot availability are scrambling. Cloud costs reflect this reality, with GPU instance prices remaining stubbornly high despite efficiency improvements in model inference.&lt;/p&gt;

&lt;h2&gt;
  
  
  Goldman Sachs Warns AI Disruption Threatens Long-Term US Equity Valuations
&lt;/h2&gt;

&lt;p&gt;While chipmakers celebrate demand, Goldman Sachs is &lt;a href="https://www.reuters.com/business/ai-fears-drive-us-stock-investors-rethink-long-term-growth-bets-says-goldman-2026-04-28/" rel="noopener noreferrer"&gt;raising concerns&lt;/a&gt; about what that AI adoption means for the broader market. The investment bank's analysis suggests AI's potential to disrupt existing business models creates unprecedented uncertainty in long-term equity valuations.&lt;/p&gt;

&lt;p&gt;The argument isn't that AI is bad for the economy—quite the opposite. It's that traditional valuation frameworks assume reasonable continuity in competitive dynamics, and AI capabilities are advancing fast enough to invalidate those assumptions. A company's moat today might be worthless if an AI system can replicate its core competency tomorrow.&lt;/p&gt;

&lt;p&gt;This creates a valuation puzzle. How do you price a professional services firm when GPT-5.5 can handle increasing portions of its workflows? What's the appropriate multiple for a software company whose product might be replaced by an AI agent? Goldman's analysts argue investors are &lt;a href="https://www.reuters.com/business/ai-fears-drive-us-stock-investors-rethink-long-term-growth-bets-says-goldman-2026-04-28/" rel="noopener noreferrer"&gt;rethinking traditional valuation approaches&lt;/a&gt; for companies with significant AI exposure—both positive and negative.&lt;/p&gt;

&lt;p&gt;Reuters' parallel &lt;a href="https://www.reuters.com/technology/artificial-intelligence/" rel="noopener noreferrer"&gt;analysis of AI business model reliability&lt;/a&gt; adds context: many AI-native companies themselves have unproven unit economics, making the disruption a two-way uncertainty. The market is effectively pricing both disruption risk for incumbents and execution risk for disruptors simultaneously.&lt;/p&gt;

&lt;h2&gt;
  
  
  OpenAI Unveils GPT-5.5 with Enhanced Cyber Capabilities and Expanded Access
&lt;/h2&gt;

&lt;p&gt;OpenAI's &lt;a href="https://openai.com/index/introducing-gpt-5-5/" rel="noopener noreferrer"&gt;GPT-5.5 release&lt;/a&gt; this week represents the company's most significant push into agentic and cyber-specific capabilities. The model scored &lt;a href="https://openai.com/index/introducing-gpt-5-5/" rel="noopener noreferrer"&gt;84.9% on the GDPval benchmark&lt;/a&gt;, which tests agent performance across 44 distinct occupations—a notable jump that positions it as the current leader in generalist agent capability.&lt;/p&gt;

&lt;p&gt;The cyber focus deserves particular attention. Building on the GPT-5.2 security framework, GPT-5.5 introduces &lt;a href="https://openai.com/index/introducing-gpt-5-5/" rel="noopener noreferrer"&gt;cyber-specific safeguards&lt;/a&gt; designed to prevent misuse while enabling legitimate security research and defense applications. This includes improved jailbreak resistance for security-adjacent prompts and better detection of social engineering attempts that try to extract offensive capabilities.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://openai.com/index/introducing-gpt-5-5/" rel="noopener noreferrer"&gt;Trusted Access for Cyber program&lt;/a&gt; expands access to advanced cybersecurity capabilities for vetted organizations. Critical infrastructure defenders can apply for what OpenAI calls "cyber-permissive model access"—essentially a less restricted version of the model for organizations that can demonstrate legitimate defensive needs and accept strict usage requirements.&lt;/p&gt;

&lt;p&gt;This tiered access approach represents OpenAI's attempt to thread the needle between capability and responsibility. The most powerful features are gated behind verification processes, while the broadly available model maintains stronger guardrails. Whether this satisfies critics who want either more restriction or more openness remains to be seen.&lt;/p&gt;

&lt;h2&gt;
  
  
  Guide Labs Introduces Interpretable LLM Architecture
&lt;/h2&gt;

&lt;p&gt;In a space dominated by scale-focused competition, &lt;a href="https://techcrunch.com/2026/02/23/guide-labs-debuts-a-new-kind-of-interpretable-llm/" rel="noopener noreferrer"&gt;Guide Labs debuted&lt;/a&gt; a fundamentally different approach: an LLM architecture built from the ground up for transparency and explainability. The startup's design prioritizes interpretability as a first-class architectural concern rather than a post-hoc analysis layer.&lt;/p&gt;

&lt;p&gt;The timing is strategic. Enterprise buyers increasingly demand AI systems they can audit, understand, and explain to regulators. The EU AI Act's requirements for high-risk applications are pushing organizations toward solutions that offer more than black-box predictions with confidence scores. Guide Labs is betting that some enterprises will accept capability tradeoffs for genuine interpretability.&lt;/p&gt;

&lt;p&gt;The architecture apparently uses &lt;a href="https://techcrunch.com/2026/02/23/guide-labs-debuts-a-new-kind-of-interpretable-llm/" rel="noopener noreferrer"&gt;hybrid approaches&lt;/a&gt; that combine neural components with more structured, inspectable reasoning modules. While specifics remain limited—the company is still in controlled access—early descriptions suggest something closer to neurosymbolic systems than pure transformer scaling.&lt;/p&gt;

&lt;p&gt;This represents an emerging trend toward &lt;a href="https://techcrunch.com/2026/02/23/guide-labs-debuts-a-new-kind-of-interpretable-llm/" rel="noopener noreferrer"&gt;architectures balancing capability with transparency&lt;/a&gt;. The massive foundation model players are unlikely to pivot away from scale, but a market segment is developing for interpretable alternatives in regulated industries. Healthcare, finance, and government applications where audit requirements are non-negotiable may find Guide Labs' approach compelling regardless of raw benchmark performance.&lt;/p&gt;

&lt;h2&gt;
  
  
  Agentic Programming Updates
&lt;/h2&gt;

&lt;p&gt;The shift to multi-agent systems has officially moved from experimental to expected. UiPath's 2026 report declares that &lt;a href="https://huggingface.co/blog/daya-shankar/agentic-ai-trends-2026" rel="noopener noreferrer"&gt;"solo agents are out"&lt;/a&gt;—a stark signal that enterprise automation is embracing coordination complexity as the default approach rather than an advanced option.&lt;/p&gt;

&lt;p&gt;New coordination patterns are crystallizing around practical problems. &lt;a href="https://huggingface.co/blog/Svngoku/agentic-coding-trends-2026" rel="noopener noreferrer"&gt;Task graphs, shared vs. isolated context management, and merge strategies&lt;/a&gt; for handling simultaneous agent commits are becoming standard architectural considerations. The parallel to distributed systems design is intentional and useful: many patterns from microservices and distributed databases translate surprisingly well to multi-agent orchestration.&lt;/p&gt;
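&lt;p&gt;The shared-versus-isolated distinction is easy to see in miniature. Everything below is an invented sketch, not any framework's API: with a shared context, later agents observe earlier agents' writes; with isolated contexts, each agent works on a private copy and a merge strategy reconciles the results afterward.&lt;/p&gt;

```python
# Toy illustration (all names invented): shared vs. isolated agent context.
from copy import deepcopy

class AgentContext:
    """A mutable key-value store standing in for an agent's working context."""
    def __init__(self, data=None):
        self.data = dict(data or {})

def run_agents_shared(ctx, agents):
    # Shared context: every agent reads and writes the same store,
    # so later agents see earlier agents' updates (and their conflicts).
    for name, update in agents:
        ctx.data[name] = update(ctx.data)
    return ctx

def run_agents_isolated(ctx, agents):
    # Isolated context: each agent works on a private copy; a merge
    # strategy (here: last-writer-wins per key) reconciles the results.
    results = []
    for name, update in agents:
        private = deepcopy(ctx.data)
        private[name] = update(private)
        results.append(private)
    merged = dict(ctx.data)
    for r in results:
        merged.update(r)   # naive merge; real systems diff and resolve conflicts
    return AgentContext(merged)

agents = [
    ("planner", lambda d: "plan with %d known facts" % len(d)),
    ("coder",   lambda d: "code against %d known facts" % len(d)),
]
shared = run_agents_shared(AgentContext({"repo": "x"}), agents)
isolated = run_agents_isolated(AgentContext({"repo": "x"}), agents)
print(shared.data["coder"])    # the coder saw the planner's entry (2 facts)
print(isolated.data["coder"])  # the coder saw only the base context (1 fact)
```

&lt;p&gt;The naive last-writer-wins merge is exactly where the "merge strategies for simultaneous agent commits" problem lives: real systems need diff-and-resolve logic, much like a version-control merge.&lt;/p&gt;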

&lt;p&gt;The framework landscape is consolidating around developer experience. &lt;a href="https://github.com/ARUNAGIRINATHAN-K/awesome-ai-agents" rel="noopener noreferrer"&gt;PydanticAI offers a FastAPI-style approach&lt;/a&gt; that will feel immediately familiar to Python developers—type hints, dependency injection, and minimal boilerplate. &lt;a href="https://github.com/ARUNAGIRINATHAN-K/awesome-ai-agents" rel="noopener noreferrer"&gt;Modus takes a different path&lt;/a&gt; with serverless WebAssembly agents that promise minimal cold starts, targeting use cases where latency sensitivity outweighs raw capability.&lt;/p&gt;

&lt;p&gt;The academic community is formalizing best practices. The &lt;a href="https://arxiv.org/html/2511.17332" rel="noopener noreferrer"&gt;AAAI 2026 Bridge Program&lt;/a&gt; highlighted the need for mechanism design principles in multi-agent systems—specifically around modeling preferences, incentives, and interaction rules. This matters because agents that work perfectly in isolation can produce adversarial or degenerate behavior when combined without careful incentive alignment.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://huggingface.co/blog/Svngoku/agentic-coding-trends-2026" rel="noopener noreferrer"&gt;Durable agent jobs&lt;/a&gt; enabling long-running workflows with state persistence across sessions are addressing one of the thorniest practical challenges. And &lt;a href="https://github.com/caramaschiHG/awesome-ai-agents-2026" rel="noopener noreferrer"&gt;Open-AutoGLM&lt;/a&gt; has emerged as a credible open-source option for mobile device automation, reducing dependency on proprietary mobile agent frameworks.&lt;/p&gt;

&lt;h2&gt;
  
  
  Education Sector Embraces Multi-Agent AI Architecture
&lt;/h2&gt;

&lt;p&gt;The education sector is providing an interesting case study in multi-agent deployment at scale. The &lt;a href="https://arxiv.org/html/2604.16566v1" rel="noopener noreferrer"&gt;Agentic Unified Student Support System (AUSS)&lt;/a&gt; demonstrates what happens when you apply multi-agent architecture to a traditionally fragmented problem space.&lt;/p&gt;

&lt;p&gt;AUSS integrates three tiers of specialized agents: student-level for personalized support, educator-level for teaching assistance, and institutional-level for administrative optimization. The reported metrics are impressive: &lt;a href="https://arxiv.org/html/2604.16566v1" rel="noopener noreferrer"&gt;92.4% recommendation accuracy, 94.1% grading efficiency, and 89.5% F1-score&lt;/a&gt; on dropout prediction. These aren't cherry-picked benchmarks—dropout prediction in particular is a notoriously noisy classification problem.&lt;/p&gt;

&lt;p&gt;The technical stack is notably heterogeneous. The system &lt;a href="https://arxiv.org/html/2604.16566v1" rel="noopener noreferrer"&gt;combines LLMs, reinforcement learning, predictive analytics, and rule-based reasoning&lt;/a&gt; rather than forcing everything through a single model architecture. This hybrid approach allows different agent types to use the most appropriate technique for their specific task while sharing information through unified interfaces.&lt;/p&gt;

&lt;p&gt;The design directly addresses what the researchers identify as &lt;a href="https://arxiv.org/html/2604.16566v1" rel="noopener noreferrer"&gt;fragmentation in existing AI educational tools&lt;/a&gt;. Previous approaches treated tutoring, assessment, and administration as separate AI problems with separate systems. AUSS demonstrates that meaningful improvements come from agents that share context—a student's learning patterns inform grading feedback which influences dropout risk assessment in a continuous loop.&lt;/p&gt;

&lt;h2&gt;
  
  
  DeepTest 2026 Competition Benchmarks LLM Safety in Automotive Systems
&lt;/h2&gt;

&lt;p&gt;As AI systems deploy in safety-critical domains, testing methodologies struggle to keep pace. The &lt;a href="https://arxiv.org/html/2604.12615v1" rel="noopener noreferrer"&gt;DeepTest 2026 competition&lt;/a&gt; tackled this directly, challenging four teams to test in-car voice assistant safety using LLM-based test generators.&lt;/p&gt;

&lt;p&gt;The competing tools—&lt;a href="https://arxiv.org/html/2604.12615v1" rel="noopener noreferrer"&gt;ATLAS, Exida Test Generator, Warnless, and CRISP&lt;/a&gt;—represent different approaches to generating adversarial inputs for automotive AI testing. The goal isn't breaking systems for its own sake but surfacing failure modes before they occur in production with real drivers.&lt;/p&gt;

&lt;p&gt;The competition used &lt;a href="https://arxiv.org/html/2604.12615v1" rel="noopener noreferrer"&gt;GPT-4o-Mini as an evaluation oracle&lt;/a&gt;, achieving an F1-score of 0.824 at a cost of $0.20 per 1000 requests. This pragmatic choice reflects the reality that human evaluation doesn't scale for automated testing pipelines, but current models can serve as reasonable proxies for detecting safety-relevant failures.&lt;/p&gt;
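&lt;p&gt;The economics are easy to model. In this back-of-envelope sketch, the F1 target and per-request price come from the competition report, but the confusion counts are hypothetical and chosen only to land near the reported score:&lt;/p&gt;

```python
# Back-of-envelope model of an LLM-as-judge oracle. Reported figures:
# F1 0.824, $0.20 per 1000 requests. Confusion counts below are invented.
def f1_score(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def oracle_cost(n_requests, usd_per_1000=0.20):
    return n_requests * usd_per_1000 / 1000

# Hypothetical campaign: the oracle flags 850 true failures,
# raises 240 false alarms, and misses 123 real failures.
score = f1_score(tp=850, fp=240, fn=123)
print(round(score, 3))                 # 0.824
print(round(oracle_cost(100_000), 2))  # 100k evaluations: 20.0 (USD)
```

&lt;p&gt;Twenty dollars for a hundred thousand safety judgments is what makes automated oracles viable where human review simply cannot scale.&lt;/p&gt;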

&lt;p&gt;The competition highlights a &lt;a href="https://arxiv.org/html/2604.12615v1" rel="noopener noreferrer"&gt;growing focus on safety testing methodologies&lt;/a&gt; for deployed AI systems. Automotive is just one domain—similar challenges exist in healthcare, finance, and any application where AI errors have serious consequences. The tools developed here will likely influence testing approaches across industries as regulatory requirements for AI safety assurance mature.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to Watch
&lt;/h2&gt;

&lt;p&gt;The infrastructure constraints won't resolve quickly, so expect continued pressure on AI project timelines and costs through 2027. OpenAI's tiered access model for GPT-5.5 may become the template for capability governance industry-wide. And as multi-agent systems hit production, the failure modes will get interesting—watch for the first major incident involving emergent multi-agent behavior that nobody explicitly designed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.reuters.com/business/ai-fears-drive-us-stock-investors-rethink-long-term-growth-bets-says-goldman-2026-04-28/" rel="noopener noreferrer"&gt;AI disruption puts focus on long-term value of US equities, Goldman ...&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.reuters.com/technology/artificial-intelligence/" rel="noopener noreferrer"&gt;AI News | Latest Headlines and Developments | Reuters&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.reuters.com/business/google-puts-ai-agents-heart-its-enterprise-money-making-push-2026-04-22/" rel="noopener noreferrer"&gt;Google puts AI agents at heart of its enterprise money-making push&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.reuters.com/business/intel-set-record-high-ai-driven-cpu-demand-powers-upbeat-forecast-2026-04-24/" rel="noopener noreferrer"&gt;Intel soars on signs AI boom for CPUs is here - Reuters&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://techcrunch.com/category/artificial-intelligence/" rel="noopener noreferrer"&gt;AI News &amp;amp; Artificial Intelligence | TechCrunch&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/ARUNAGIRINATHAN-K/awesome-ai-agents" rel="noopener noreferrer"&gt;Awesome AI Agents for 2026 - GitHub&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://huggingface.co/blog/daya-shankar/agentic-ai-trends-2026" rel="noopener noreferrer"&gt;Latest Agentic AI Trends to Watch in 2026: Market Shifts, Adoption ...&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/html/2511.17332" rel="noopener noreferrer"&gt;AAAI 2026 Bridge Program on Advancing LLM-Based Multi-Agent ...&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/html/2604.16566v1" rel="noopener noreferrer"&gt;Agentic AI for Education: A Unified Multi-Agent Framework for ... - arXiv&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://huggingface.co/blog/Svngoku/agentic-coding-trends-2026" rel="noopener noreferrer"&gt;2026 Agentic Coding Trends - Implementation Guide (Technical)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/html/2604.12615v1" rel="noopener noreferrer"&gt;DeepTest Tool Competition 2026: Benchmarking an LLM-Based ...&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://techcrunch.com/2026/02/23/guide-labs-debuts-a-new-kind-of-interpretable-llm/" rel="noopener noreferrer"&gt;Guide Labs debuts a new kind of interpretable LLM | TechCrunch&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://openai.com/index/introducing-gpt-5-5/" rel="noopener noreferrer"&gt;Introducing GPT-5.5 - OpenAI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/caramaschiHG/awesome-ai-agents-2026" rel="noopener noreferrer"&gt;caramaschiHG/awesome-ai-agents-2026: The most comprehensive ...&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Enjoyed this briefing? Follow this series for a fresh AI update every week, written for engineers who want to stay ahead.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Follow this publication on Dev.to to get notified of every new article.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Have a story tip or correction? Drop a comment below.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>programming</category>
      <category>technology</category>
    </item>
    <item>
      <title>Deep Agents: Building Long-Running Autonomous Agents with LangChain's New Framework</title>
      <dc:creator>Richard Dillon</dc:creator>
      <pubDate>Mon, 20 Apr 2026 12:03:14 +0000</pubDate>
      <link>https://dev.to/richard_dillon_b9c238186e/deep-agents-building-long-running-autonomous-agents-with-langchains-new-framework-1bpn</link>
      <guid>https://dev.to/richard_dillon_b9c238186e/deep-agents-building-long-running-autonomous-agents-with-langchains-new-framework-1bpn</guid>
      <description>&lt;h1&gt;
  
  
  Deep Agents: Building Long-Running Autonomous Agents with LangChain's New Framework
&lt;/h1&gt;

&lt;p&gt;The era of single-turn, reactive agents is ending. As engineering teams push toward workflows that span hours instead of seconds—internal coding pipelines, multi-stage research synthesis, iterative design systems—the limitations of flat tool-calling architectures become painfully obvious. You cannot orchestrate a code review that spawns test generation, runs CI, waits for human approval, and then deploys without a fundamentally different approach to agent architecture. LangChain's Deep Agents framework, announced alongside &lt;a href="https://www.langchain.com/blog/march-2026-langchain-newsletter" rel="noopener noreferrer"&gt;Deep Agents Deploy&lt;/a&gt; in March 2026, represents the clearest attempt yet to provide production-grade abstractions for this class of problem.&lt;/p&gt;

&lt;p&gt;This isn't just another wrapper. Deep Agents introduces a layered architecture where planning loops, persistent memory, and sub-agent delegation are first-class concerns—not afterthoughts bolted onto a ReAct loop. The framework sits deliberately above LangGraph (which gives you fine-grained state machine control) and the standard LangChain abstractions (which optimize for quick iteration). If you're building agents that need to survive restarts, coordinate specialized sub-agents, and maintain coherent long-term memory across sessions, this is the stack to understand.&lt;/p&gt;

&lt;p&gt;The use cases driving adoption are telling: &lt;a href="https://www.langchain.com/blog/march-2026-langchain-newsletter" rel="noopener noreferrer"&gt;Open SWE&lt;/a&gt; for internal coding agents that autonomously fix bugs across repositories, GTM workflow automation that coordinates across CRM systems and communication channels, and design iteration systems like the Moda case study where agents loop through revision cycles with human feedback gates. In this deep-dive, we'll dissect the architecture, walk through the key APIs, examine deployment considerations, and build a working research-then-draft agent from scratch.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Deep Agents Architecture: Planning, Memory, and Sub-Agents
&lt;/h2&gt;

&lt;p&gt;The Deep Agents framework rests on three architectural pillars that distinguish it from simpler agent implementations: planning loops that decompose goals into task graphs, persistent memory that survives across sessions and crashes, and sub-agent delegation that enables specialized capabilities to run in parallel.&lt;/p&gt;

&lt;h3&gt;
  
  
  Planning Mechanism
&lt;/h3&gt;

&lt;p&gt;Traditional agents operate on flat tool-calling sequences—the model decides the next action based on the current state, executes it, and repeats. This works for simple tasks but falls apart when you need hierarchical goal decomposition. Deep Agents introduces a planning layer that generates task graphs before execution begins. The planning model analyzes the high-level objective, identifies dependencies between subtasks, and produces a directed acyclic graph (DAG) of work items. Each node in this graph can represent a direct tool call, a sub-agent invocation, or a human approval checkpoint.&lt;/p&gt;

&lt;p&gt;This approach parallels findings from the &lt;a href="https://arxiv.org/pdf/2602.07359" rel="noopener noreferrer"&gt;W&amp;amp;D paper on parallel tool calling&lt;/a&gt;, which demonstrated that research agents achieve significant speedups when tool calls without mutual dependencies execute concurrently. Deep Agents extends this to sub-agent coordination: a parent agent can spawn multiple child agents in a single planning step, wait for their results in parallel, and then synthesize findings before proceeding.&lt;/p&gt;
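&lt;p&gt;The scheduling idea can be sketched in a few lines. This toy scheduler is an illustration only—its names and structure are invented, not the Deep Agents API—but it shows how a task DAG decomposes into "waves" of mutually independent work that can execute concurrently:&lt;/p&gt;

```python
# Illustrative sketch (invented names, not the Deep Agents API): group a
# task DAG into waves where everything in a wave can run in parallel.
def parallel_waves(dag):
    """dag maps task -&gt; set of prerequisite tasks. Returns execution waves."""
    remaining = {t: set(deps) for t, deps in dag.items()}
    waves = []
    done = set()
    while remaining:
        # A task is ready once all of its prerequisites have completed.
        ready = sorted(t for t, deps in remaining.items() if deps.issubset(done))
        if not ready:
            raise ValueError("cycle detected in task graph")
        waves.append(ready)
        done.update(ready)
        for t in ready:
            del remaining[t]
    return waves

# A research-agent plan: the two searches have no mutual dependency, so they
# run concurrently; synthesis waits on both; human review gates the end.
plan = {
    "search_papers": set(),
    "search_news":   set(),
    "synthesize":    {"search_papers", "search_news"},
    "human_review":  {"synthesize"},
}
print(parallel_waves(plan))
# [['search_news', 'search_papers'], ['synthesize'], ['human_review']]
```

&lt;p&gt;Each node here would be a tool call, a sub-agent invocation, or an approval checkpoint; the wave structure is what turns independent sub-tasks into concurrency.&lt;/p&gt;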

&lt;h3&gt;
  
  
  Memory Subsystem
&lt;/h3&gt;

&lt;p&gt;The memory architecture draws heavily from recent research on temporal reasoning in conversational AI. The &lt;a href="https://arxiv.org/html/2604.14362v1" rel="noopener noreferrer"&gt;APEX-MEM framework&lt;/a&gt; introduced property graphs with timestamp annotations for resolving when facts were learned and whether they remain valid. Deep Agents implements a similar approach: events are stored in append-only logs with temporal metadata, and the memory retrieval system can answer queries like "what did we learn about this repository before the last deployment?" rather than just "what do we know about this repository?"&lt;/p&gt;
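&lt;p&gt;A minimal sketch makes the temporal-query idea concrete. This toy assumes nothing about the real storage layer—it simply shows how an append-only log of timestamped facts can answer both "what do we know?" and "what did we know as of then?":&lt;/p&gt;

```python
# Toy temporal memory (illustrative only, not the Deep Agents memory API):
# facts are appended with logical timestamps and never edited in place.
from dataclasses import dataclass

@dataclass
class Fact:
    subject: str
    value: str
    learned_at: int   # monotonically increasing logical timestamp

class TemporalMemory:
    def __init__(self):
        self.log = []    # append-only event log
        self.clock = 0

    def record(self, subject, value):
        self.clock += 1
        self.log.append(Fact(subject, value, self.clock))
        return self.clock

    def known(self, subject, before=None):
        """Latest value for subject, optionally as of a past timestamp."""
        hits = [f for f in self.log
                if f.subject == subject
                and (before is None or before >= f.learned_at)]
        return hits[-1].value if hits else None

mem = TemporalMemory()
mem.record("repo.default_branch", "master")
deploy_ts = mem.record("deployment", "v1 shipped")
mem.record("repo.default_branch", "main")   # branch renamed after the deploy

print(mem.known("repo.default_branch"))                    # main
print(mem.known("repo.default_branch", before=deploy_ts))  # master
```

&lt;p&gt;The &lt;code&gt;before&lt;/code&gt; parameter is the whole trick: because nothing is overwritten, "what did we learn about this repository before the last deployment?" is just a filtered scan of the log.&lt;/p&gt;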

&lt;p&gt;The MongoDB integration announced in the &lt;a href="https://blog.langchain.com/announcing-the-langchain-mongodb-partnership-the-ai-agent-stack-that-runs-on-the-database-you-already-trust/" rel="noopener noreferrer"&gt;LangChain + MongoDB partnership&lt;/a&gt; provides durable checkpointing out of the box. Every state transition, planning decision, and sub-agent result gets persisted to MongoDB Atlas, enabling crash recovery and long-running workflows that span days.&lt;/p&gt;

&lt;h3&gt;
  
  
  Sub-Agent Delegation
&lt;/h3&gt;

&lt;p&gt;Sub-agents aren't just function calls with extra steps—they're fully autonomous agents with their own planning loops, memory contexts, and tool access. The parent agent maintains a registry of available sub-agents with capability descriptions, and the planning model can decide to delegate tasks to the most appropriate specialist. This mirrors the "harness" concept from LangChain's architecture documentation: the separation between orchestration logic (what needs to happen and when) and compute execution (the actual work). The parent agent is the orchestrator; sub-agents are the compute layer.&lt;/p&gt;
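&lt;p&gt;In miniature, the delegation pattern looks something like this. The registry shape and the keyword matching below are invented for illustration—in the real framework the planning model selects a specialist via LLM reasoning over capability descriptions, not keyword overlap:&lt;/p&gt;

```python
# Hedged sketch of sub-agent delegation (registry shape invented): the
# orchestrator matches a task against capability descriptions and routes
# it to the best specialist, or keeps it when no specialist fits.
registry = {
    "search_agent": {"keywords": {"search", "web", "sources"},
                     "description": "gathers primary sources from the web"},
    "summarizer":   {"keywords": {"summarize", "synthesize", "report"},
                     "description": "condenses findings into structured prose"},
}

def delegate(task_description):
    """Pick the sub-agent whose capability keywords best match the task."""
    words = set(task_description.lower().split())
    scores = {name: len(words.intersection(spec["keywords"]))
              for name, spec in registry.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None   # no match: parent handles it

print(delegate("search the web for recent sources"))   # search_agent
print(delegate("summarize the findings as a report"))  # summarizer
print(delegate("refactor the billing module"))         # None
```

&lt;p&gt;The &lt;code&gt;None&lt;/code&gt; branch matters: when no specialist qualifies, the orchestrator either executes the task itself or replans, which is exactly the orchestration-versus-compute split the harness concept describes.&lt;/p&gt;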

&lt;h2&gt;
  
  
  Hands-On: Code Walkthrough
&lt;/h2&gt;

&lt;p&gt;Let's build a research-then-draft agent that demonstrates the core Deep Agents patterns. This agent accepts a topic, spawns specialized sub-agents for search and summarization, uses long-term memory to avoid redundant work, and produces a structured Markdown report.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
research_agent.py
A Deep Agent that coordinates search and summarization sub-agents
to produce research reports with persistent memory.

Requires:
- langchain-core &amp;gt;= 0.3.42
- langchain-deepagents &amp;gt;= 0.1.8
- langchain-mongodb &amp;gt;= 0.2.1
- langchain-community &amp;gt;= 0.3.20 (for TavilySearchResults)
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_deepagents&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DeepAgent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SubAgent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;PlanningConfig&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_mongodb&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MongoDBCheckpointer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;MongoDBMemory&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_community.tools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;TavilySearchResults&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_anthropic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatAnthropic&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.tools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;

&lt;span class="c1"&gt;# Initialize the planning model - Claude 3.5 Sonnet works well for planning tasks
&lt;/span&gt;&lt;span class="n"&gt;planning_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ChatAnthropic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-sonnet-4-20250514&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4096&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.1&lt;/span&gt;  &lt;span class="c1"&gt;# Low temperature for more deterministic planning
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Configure MongoDB checkpointer for crash recovery
# This persists every state transition to Atlas
&lt;/span&gt;&lt;span class="n"&gt;checkpointer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MongoDBCheckpointer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;connection_string&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MONGODB_URI&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;database_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;deep_agents&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;collection_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;research_agent_checkpoints&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="c1"&gt;# Enable TTL for automatic cleanup of old checkpoints
&lt;/span&gt;    &lt;span class="n"&gt;ttl_seconds&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;86400&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;  &lt;span class="c1"&gt;# 30 days retention
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Long-term memory backend with temporal resolution
# Stores facts with timestamps for "when did we learn this?" queries
&lt;/span&gt;&lt;span class="n"&gt;memory_backend&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MongoDBMemory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;connection_string&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MONGODB_URI&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;database_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;deep_agents&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;collection_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;research_memory&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="c1"&gt;# Enable vector search for semantic retrieval
&lt;/span&gt;    &lt;span class="n"&gt;vector_index_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;memory_vector_index&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;embedding_dimensions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1536&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Define the search sub-agent's tool
&lt;/span&gt;&lt;span class="n"&gt;search_tool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;TavilySearchResults&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;max_results&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;search_depth&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;advanced&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;include_raw_content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;extract_key_claims&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]]:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Extract key claims and their supporting evidence from content.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="c1"&gt;# In production, this would call a specialized extraction model
&lt;/span&gt;    &lt;span class="c1"&gt;# Here we're showing the tool interface pattern
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claim&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;evidence&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;400&lt;/span&gt;&lt;span class="p"&gt;]}]&lt;/span&gt;

&lt;span class="c1"&gt;# Define the search sub-agent
&lt;/span&gt;&lt;span class="n"&gt;search_subagent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SubAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search_agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Searches the web for current information on a topic. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Use this for gathering primary sources and recent developments.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;search_tool&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;ChatAnthropic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-sonnet-4-20250514&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="c1"&gt;# Sub-agents can have their own planning constraints
&lt;/span&gt;    &lt;span class="n"&gt;max_iterations&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="c1"&gt;# Memory scope: this sub-agent shares memory with parent
&lt;/span&gt;    &lt;span class="n"&gt;memory_scope&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;shared&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Define the summarization sub-agent
&lt;/span&gt;&lt;span class="n"&gt;summarization_subagent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SubAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;summarization_agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Condenses and synthesizes information from multiple sources. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Use this after gathering raw sources to produce coherent summaries.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;extract_key_claims&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;ChatAnthropic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-sonnet-4-20250514&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;max_iterations&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;memory_scope&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;shared&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Configure planning behavior
&lt;/span&gt;&lt;span class="n"&gt;planning_config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;PlanningConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="c1"&gt;# Maximum depth of planning recursion
&lt;/span&gt;    &lt;span class="n"&gt;max_planning_depth&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="c1"&gt;# Timeout for individual sub-agent invocations (seconds)
&lt;/span&gt;    &lt;span class="n"&gt;sub_agent_timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="c1"&gt;# Enable parallel sub-agent execution when dependencies allow
&lt;/span&gt;    &lt;span class="n"&gt;parallel_execution&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="c1"&gt;# Replan if a sub-agent fails rather than failing the whole workflow
&lt;/span&gt;    &lt;span class="n"&gt;replan_on_failure&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Initialize the Deep Agent
&lt;/span&gt;&lt;span class="n"&gt;research_agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DeepAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;research_report_agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Produces comprehensive research reports by coordinating &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search and summarization specialists.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;planning_model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;planning_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;memory_backend&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;memory_backend&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;checkpointer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;checkpointer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;sub_agents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;search_subagent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;summarization_subagent&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;planning_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;planning_config&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="c1"&gt;# Path to custom instructions file
&lt;/span&gt;    &lt;span class="n"&gt;instructions_path&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;./AGENTS.md&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Example invocation
&lt;/span&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_report&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Generate a research report on the given topic.
    Uses session_id for memory continuity across invocations.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;research_agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ainvoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;objective&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Research the topic &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; and produce a &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;structured Markdown report with citations.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;output_format&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;markdown&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_sources&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;session_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="c1"&gt;# Enable LangSmith tracing for observability
&lt;/span&gt;            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;callbacks&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[],&lt;/span&gt;  &lt;span class="c1"&gt;# LangSmith callbacks auto-inject when configured
&lt;/span&gt;            &lt;span class="c1"&gt;# Memory retrieval configuration
&lt;/span&gt;            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;memory_config&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;retrieve_before_planning&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_memory_items&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="c1"&gt;# Skip sources we've already processed in this session
&lt;/span&gt;                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dedupe_by&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;

&lt;span class="c1"&gt;# Run the agent
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;

    &lt;span class="n"&gt;report&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;generate_report&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Recent advances in parallel tool calling for AI agents&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;research-session-001&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;report&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;AGENTS.md&lt;/code&gt; file referenced above provides custom instructions that guide agent behavior:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Research Report Agent Instructions&lt;/span&gt;

&lt;span class="gu"&gt;## Planning Guidelines&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Always check memory for previously researched sources before searching
&lt;span class="p"&gt;-&lt;/span&gt; Spawn search_agent first to gather sources, then summarization_agent to synthesize
&lt;span class="p"&gt;-&lt;/span&gt; If fewer than 3 high-quality sources found, replan with broader search terms

&lt;span class="gu"&gt;## Output Requirements&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Include inline citations linking to source URLs
&lt;span class="p"&gt;-&lt;/span&gt; Structure reports with: Executive Summary, Key Findings, Detailed Analysis, Sources
&lt;span class="p"&gt;-&lt;/span&gt; Flag any conflicting information between sources

&lt;span class="gu"&gt;## Safety Constraints&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Do not include speculation presented as fact
&lt;span class="p"&gt;-&lt;/span&gt; Mark any information older than 6 months as potentially outdated
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To view execution traces and debug planning decisions, access the LangSmith dashboard where each planning step, sub-agent invocation, and memory operation appears as a nested span with latency and token cost attribution.&lt;/p&gt;

&lt;h2&gt;
  
  
  Deployment Patterns: Deep Agents Deploy and Sandbox Execution
&lt;/h2&gt;

&lt;p&gt;Running long-running agents in production requires infrastructure that most teams don't want to build themselves. Deep Agents Deploy, &lt;a href="https://www.langchain.com/blog/march-2026-langchain-newsletter" rel="noopener noreferrer"&gt;announced as "an open alternative to Claude Managed Agents"&lt;/a&gt;, provides a hosted runtime specifically designed for agents that execute over minutes or hours rather than seconds.&lt;/p&gt;

&lt;h3&gt;
  
  
  Infrastructure Abstraction
&lt;/h3&gt;

&lt;p&gt;Deep Agents Deploy handles the unglamorous but critical concerns: process isolation, crash recovery, timeout management, and resource allocation. When your agent spawns ten sub-agents in parallel for a research task, the runtime distributes these across worker pools and manages backpressure. When a network partition causes a checkpoint failure, the runtime automatically retries from the last durable state.&lt;/p&gt;

&lt;p&gt;This separation mirrors the architecture pattern OpenAI introduced in their &lt;a href="https://openai.com/index/the-next-evolution-of-the-agents-sdk/" rel="noopener noreferrer"&gt;Agents SDK evolution&lt;/a&gt;: the harness (orchestration logic) runs on managed infrastructure while compute (the actual LLM calls and tool executions) can be distributed across different environments. Deep Agents Deploy implements this pattern with first-class support for LangGraph-based sub-agents.&lt;/p&gt;

&lt;h3&gt;
  
  
  Hardware Acceleration
&lt;/h3&gt;

&lt;p&gt;The &lt;a href="https://www.langchain.com/blog/nvidia-enterprise" rel="noopener noreferrer"&gt;NVIDIA enterprise integration&lt;/a&gt; enables hardware acceleration for compute-intensive sub-agents. When a Deep Agent spawns multiple sub-agents that each need to process large documents or run inference on specialized models, the &lt;code&gt;langchain-nvidia&lt;/code&gt; package routes these to NIM microservices running on GPU infrastructure. This becomes significant when your planning DAG has wide parallelism—ten sub-agents each running a 70B parameter model benefit substantially from NIM's optimized batch inference.&lt;/p&gt;
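&lt;p&gt;The routing decision reduces to size-based dispatch: large models go to the GPU-backed pool, small ones stay local. A minimal sketch of that idea (the threshold and pool names here are assumptions for illustration, not the &lt;code&gt;langchain-nvidia&lt;/code&gt; API):&lt;/p&gt;

```python
# Illustrative routing sketch: send sub-agent model calls to a GPU-backed
# NIM pool when the model is large enough to benefit from batched inference.
# The 30B threshold and the pool names are assumptions, not real API values.
def select_pool(model_size_b):
    # Large models benefit substantially from NIM's optimized batch inference.
    return "nim_gpu_pool" if model_size_b > 30 else "local_cpu_pool"

# Ten parallel 70B sub-agents would all land on the GPU pool; the small
# helper models stay on commodity compute.
assignments = {size: select_pool(size) for size in (7, 13, 70)}
print(assignments)
```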

&lt;h3&gt;
  
  
  Durability and Cost
&lt;/h3&gt;

&lt;p&gt;MongoDB-backed checkpoints provide durability guarantees that in-memory state cannot. If your agent is three hours into a complex workflow and the host process dies, resumption picks up from the last checkpoint rather than starting over. The &lt;a href="https://blog.langchain.com/announcing-the-langchain-mongodb-partnership-the-ai-agent-stack-that-runs-on-the-database-you-already-trust/" rel="noopener noreferrer"&gt;MongoDB partnership&lt;/a&gt; specifically targets this use case: vector search for memory retrieval and document checkpointing unified in a single database you're likely already running.&lt;/p&gt;
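&lt;p&gt;The resume semantics can be sketched in a few lines. This is a minimal illustration using an in-memory dict as a stand-in for the durable store; the real MongoDB checkpointer persists full LangGraph state rather than a step counter:&lt;/p&gt;

```python
# Minimal sketch of resume-from-checkpoint semantics. The dict stands in
# for a durable store like MongoDB; only the resume logic is illustrated.
durable_store = {}

def run_workflow(workflow_id, steps):
    # Resume from the last durable checkpoint instead of step zero.
    start = durable_store.get(workflow_id, 0)
    for i in range(start, len(steps)):
        steps[i]()                            # execute the step
        durable_store[workflow_id] = i + 1    # checkpoint after success

completed = []
steps = [lambda: completed.append("search"),
         lambda: completed.append("summarize"),
         lambda: completed.append("write_report")]

run_workflow("report-42", steps)
# Simulate a crash and re-run: already-checkpointed steps are skipped.
run_workflow("report-42", steps)
print(completed)  # each step ran exactly once
```

&lt;p&gt;The second invocation is a no-op because the checkpoint already records all three steps as complete — the property that makes three-hour workflows survivable.&lt;/p&gt;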

&lt;p&gt;Cost attribution in long-running agents requires tracking spend across sub-agent invocations. LangSmith Fleet provides identity-based cost tracking where each Deep Agent instance accumulates costs from all its sub-agent invocations, enabling accurate per-workflow billing and optimization.&lt;/p&gt;

&lt;h3&gt;
  
  
  Security Considerations
&lt;/h3&gt;

&lt;p&gt;For agents that execute code—and most production agents eventually do—sandboxing is non-negotiable. LangSmith Sandboxes provide isolated execution environments for code-generation sub-agents, preventing arbitrary code execution from compromising your infrastructure. For runtime policy enforcement on tool calls, the &lt;a href="https://opensource.microsoft.com/blog/2026/04/02/introducing-the-agent-governance-toolkit-open-source-runtime-security-for-ai-agents/" rel="noopener noreferrer"&gt;Agent Governance Toolkit&lt;/a&gt; integrates with Deep Agents to enforce constraints like "this agent cannot call external APIs after 6 PM" or "this agent cannot modify files outside the /workspace directory."&lt;/p&gt;

&lt;h2&gt;
  
  
  Evaluation and Observability for Long-Running Agents
&lt;/h2&gt;

&lt;p&gt;Traditional LLM evaluation—comparing model outputs to gold-standard responses—breaks down for autonomous agents. When an agent takes 47 steps to complete a task, the final output might be correct even if step 23 was wildly inefficient. Conversely, an incorrect final output might result from a single bad decision at step 12 that cascaded through the rest of the workflow.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Observability Gap
&lt;/h3&gt;

&lt;p&gt;The &lt;a href="https://www.langchain.com/state-of-agent-engineering" rel="noopener noreferrer"&gt;State of Agent Engineering survey&lt;/a&gt; found that 89% of production teams use observability for their agents, but evaluation remains the biggest blocker to deployment confidence. Teams can see what their agents are doing but struggle to systematically assess whether the agents are doing it well.&lt;/p&gt;

&lt;p&gt;LangSmith addresses this with trace-based evaluation: each planning step, sub-agent result, and memory operation gets captured as a span that can be individually scored. You define evaluators that examine not just the final output but the quality of intermediate decisions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langsmith.evaluation&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RunEvaluator&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;PlanningQualityEvaluator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;RunEvaluator&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Evaluates whether the agent&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s planning was efficient.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;evaluate_run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;run&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;example&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Extract planning steps from the trace
&lt;/span&gt;        &lt;span class="n"&gt;planning_spans&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;run&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;child_runs&lt;/span&gt; 
                         &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startswith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;planning_&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;

        &lt;span class="c1"&gt;# Score based on planning efficiency metrics
&lt;/span&gt;        &lt;span class="n"&gt;replan_count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;planning_spans&lt;/span&gt; 
                          &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;replan&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Penalize excessive replanning
&lt;/span&gt;        &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;replan_count&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;planning_efficiency&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;score&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Skills as Evaluated Capabilities
&lt;/h3&gt;

&lt;p&gt;Deep Agents introduces "Skills"—reusable, versioned capabilities that agents can attach. A "web_research" skill encapsulates the ability to search, filter, and synthesize web content. Each skill has associated evaluators that run during CI to ensure capability regressions don't ship to production. When you upgrade from &lt;code&gt;web_research@1.2&lt;/code&gt; to &lt;code&gt;web_research@1.3&lt;/code&gt;, the skill's evaluation suite must pass before deployment.&lt;/p&gt;
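&lt;p&gt;The eval-gating pattern can be sketched as follows; the &lt;code&gt;Skill&lt;/code&gt; class and evaluator signature here are assumptions for illustration, not the Deep Agents API:&lt;/p&gt;

```python
from dataclasses import dataclass, field

# Illustrative sketch of eval-gated skill deployment. The Skill class and
# evaluator shape are assumptions, not the actual Deep Agents interface.
@dataclass
class Skill:
    name: str
    version: str
    evaluators: list = field(default_factory=list)

    def passes_ci(self):
        # Every evaluator must pass before this version can deploy.
        return all(ev(self) for ev in self.evaluators)

def min_recall_evaluator(skill):
    # Stand-in for a real evaluation run against a benchmark dataset;
    # the recall numbers below are hypothetical.
    benchmark_recall = {"1.2": 0.91, "1.3": 0.88}
    return benchmark_recall.get(skill.version, 0.0) > 0.90

web_research_v13 = Skill("web_research", "1.3", [min_recall_evaluator])
print(web_research_v13.passes_ci())  # False: the regression blocks deployment
```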

&lt;h3&gt;
  
  
  Harness Optimization
&lt;/h3&gt;

&lt;p&gt;The hill-climbing approach to harness configuration uses evaluation feedback loops to optimize agent behavior. You define success metrics (task completion rate, average step count, cost per task), run the agent against a benchmark suite, and systematically adjust harness parameters—&lt;code&gt;max_planning_depth&lt;/code&gt;, &lt;code&gt;sub_agent_timeout&lt;/code&gt;, memory retrieval limits—to improve metrics. This transforms agent tuning from intuition-driven to data-driven.&lt;/p&gt;
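&lt;p&gt;A minimal sketch of that hill-climbing loop, with a hypothetical stand-in scoring function in place of a real benchmark run:&lt;/p&gt;

```python
import itertools

# Hypothetical benchmark: in practice the score would come from running
# the agent against a benchmark suite with LangSmith tracing enabled and
# combining completion rate with cost. This stand-in only shapes the idea:
# deeper planning helps up to a point, then cost penalties dominate.
def run_benchmark(config):
    depth, timeout, memory_items = config
    completion = min(1.0, 0.5 + 0.12 * depth)
    cost_penalty = 0.05 * depth + 0.0001 * timeout + 0.001 * memory_items
    return completion - cost_penalty

# Grid of harness parameters to climb over.
search_space = itertools.product(
    [1, 2, 3, 4],        # max_planning_depth
    [120, 300, 600],     # sub_agent_timeout (seconds)
    [25, 50, 100],       # max_memory_items
)
best = max(search_space, key=run_benchmark)
print(best)  # (4, 120, 25) under this stand-in scoring
```

&lt;p&gt;Swapping the stand-in for real benchmark runs turns the same loop into a data-driven tuning harness.&lt;/p&gt;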

&lt;h2&gt;
  
  
  What This Means for Your Stack
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;When to choose Deep Agents over LangGraph&lt;/strong&gt;: Deep Agents adds value when your workflow requires persistent memory across sessions, autonomous goal decomposition, or sub-agent coordination. If you're building a customer support bot that handles single queries, LangGraph's lower-level abstractions are simpler and faster to iterate on. If you're building an internal coding agent that maintains context across PRs, learns from past reviews, and coordinates linter, test-runner, and documentation sub-agents, Deep Agents provides the right level of abstraction.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Memory backend selection&lt;/strong&gt;: The &lt;a href="https://blog.langchain.com/announcing-the-langchain-mongodb-partnership-the-ai-agent-stack-that-runs-on-the-database-you-already-trust/" rel="noopener noreferrer"&gt;MongoDB Checkpointer&lt;/a&gt; makes sense for teams already running Atlas—you get vector search for memory retrieval and checkpointing in a single managed database. For teams with existing PostgreSQL infrastructure, PostgresSaver provides equivalent durability with pgvector for semantic retrieval. The temporal reasoning features from the &lt;a href="https://arxiv.org/html/2604.14362v1" rel="noopener noreferrer"&gt;APEX-MEM research&lt;/a&gt; are currently better supported in the MongoDB backend.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Migration path&lt;/strong&gt;: Existing LangGraph agents can be wrapped as sub-agents within a Deep Agent orchestrator. This enables incremental migration: start by wrapping your most complex workflow as a Deep Agent while keeping simpler agents on LangGraph. The shared memory scope ensures the orchestrator and sub-agents maintain consistent context.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost and latency tradeoffs&lt;/strong&gt;: Long-running agents accumulate token costs across planning iterations and sub-agent invocations. Set &lt;code&gt;max_planning_depth&lt;/code&gt; limits based on your cost tolerance. Monitor &lt;code&gt;speculative_waste_ratio&lt;/code&gt; in LangSmith—this metric shows how often the planning model generates plans that get abandoned due to execution failures. A high ratio indicates either overly ambitious planning or unreliable tools.&lt;/p&gt;
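&lt;p&gt;As a hypothetical illustration of how such a metric could be computed from plan outcomes (the formula below is an assumption, not LangSmith's definition):&lt;/p&gt;

```python
# Hypothetical computation: the fraction of generated plans that were
# abandoned before execution completed. The metric name comes from the
# article; this formula is an illustrative assumption.
def speculative_waste_ratio(plan_outcomes):
    abandoned = sum(1 for o in plan_outcomes if o == "abandoned")
    return abandoned / len(plan_outcomes)

outcomes = ["executed", "abandoned", "executed", "executed", "abandoned"]
print(speculative_waste_ratio(outcomes))  # 0.4
```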

&lt;p&gt;&lt;strong&gt;Team readiness&lt;/strong&gt;: Deep Agents requires observability maturity. If your team isn't already using LangSmith tracing for existing LLM workflows, start there before deploying autonomous agents. The &lt;a href="https://www.langchain.com/state-of-agent-engineering" rel="noopener noreferrer"&gt;State of Agent Engineering&lt;/a&gt; findings make this clear: teams without observability infrastructure struggle to debug and optimize agent behavior.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Security posture&lt;/strong&gt;: Enable Sandboxes for any code-execution sub-agents—this is not optional for production deployments. Integrate with the &lt;a href="https://opensource.microsoft.com/blog/2026/04/02/introducing-the-agent-governance-toolkit-open-source-runtime-security-for-ai-agents/" rel="noopener noreferrer"&gt;Agent Governance Toolkit&lt;/a&gt; for runtime policy enforcement. Define explicit constraints on what tools each sub-agent can call and under what conditions.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to Build This Week
&lt;/h2&gt;

&lt;p&gt;Build a &lt;strong&gt;code review coordination agent&lt;/strong&gt; that demonstrates the full Deep Agents architecture:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Parent Agent&lt;/strong&gt;: Accepts a PR URL, plans the review workflow, coordinates sub-agents, maintains memory of past reviews on this repository&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Static Analysis Sub-Agent&lt;/strong&gt;: Runs linters and type checkers, reports findings&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security Review Sub-Agent&lt;/strong&gt;: Scans for common vulnerabilities, checks dependency updates&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test Coverage Sub-Agent&lt;/strong&gt;: Identifies untested code paths, suggests test cases&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Documentation Sub-Agent&lt;/strong&gt;: Checks for missing docstrings, outdated README sections&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Configure MongoDB checkpointing so the review survives interruptions. Use the &lt;code&gt;AGENTS.md&lt;/code&gt; file to encode your team's code review standards. Set up LangSmith tracing and build a custom evaluator that scores review thoroughness against a benchmark of manually-reviewed PRs.&lt;/p&gt;

&lt;p&gt;The goal isn't a production-ready tool—it's hands-on experience with planning DAGs, sub-agent coordination, persistent memory, and trace-based evaluation. These are the primitives that every non-trivial agent will require as we move from demos to deployment.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.langchain.com/blog/march-2026-langchain-newsletter" rel="noopener noreferrer"&gt;March 2026: LangChain Newsletter&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.langchain.com/state-of-agent-engineering" rel="noopener noreferrer"&gt;State of Agent Engineering - LangChain&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.langchain.com/blog/nvidia-enterprise" rel="noopener noreferrer"&gt;LangChain Announces Enterprise Agentic AI Platform Built with NVIDIA&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://blog.langchain.com/announcing-the-langchain-mongodb-partnership-the-ai-agent-stack-that-runs-on-the-database-you-already-trust/" rel="noopener noreferrer"&gt;Announcing the LangChain + MongoDB Partnership: The AI Agent Stack That Runs On The Database You Already Trust&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://openai.com/index/the-next-evolution-of-the-agents-sdk/" rel="noopener noreferrer"&gt;The next evolution of the Agents SDK - OpenAI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/pdf/2602.07359" rel="noopener noreferrer"&gt;W&amp;amp;D: Scaling Parallel Tool Calling for Efficient Deep Research Agents&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/html/2604.14362v1" rel="noopener noreferrer"&gt;APEX-MEM: Agentic Semi-Structured Memory with Temporal Reasoning for Long-Term Conversational AI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://opensource.microsoft.com/blog/2026/04/02/introducing-the-agent-governance-toolkit-open-source-runtime-security-for-ai-agents/" rel="noopener noreferrer"&gt;Introducing the Agent Governance Toolkit - Microsoft Open Source&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;This is part of the &lt;strong&gt;Agentic Engineering Weekly&lt;/strong&gt; series — a deep-dive every Monday into the frameworks, patterns, and techniques shaping the next generation of AI systems.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Follow the Agentic Engineering Weekly series on Dev.to to catch every edition.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Building something agentic? Drop a comment — I'd love to feature reader projects.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>programming</category>
      <category>agents</category>
    </item>
    <item>
      <title>AI Weekly: Agent Wars Escalate as Anthropic Reclaims Benchmark Crown and Infrastructure Reality Bites</title>
      <dc:creator>Richard Dillon</dc:creator>
      <pubDate>Mon, 20 Apr 2026 12:02:37 +0000</pubDate>
      <link>https://dev.to/richard_dillon_b9c238186e/ai-weekly-agent-wars-escalate-as-anthropic-reclaims-benchmark-crown-and-infrastructure-reality-2fec</link>
      <guid>https://dev.to/richard_dillon_b9c238186e/ai-weekly-agent-wars-escalate-as-anthropic-reclaims-benchmark-crown-and-infrastructure-reality-2fec</guid>
      <description>&lt;h1&gt;
  
  
  AI Weekly: Agent Wars Escalate as Anthropic Reclaims Benchmark Crown and Infrastructure Reality Bites
&lt;/h1&gt;

&lt;p&gt;The battle for AI supremacy entered a new phase this week as Anthropic's Claude Opus 4.7 narrowly reclaimed the top spot on agentic coding benchmarks, while OpenAI responded by expanding Codex's desktop automation capabilities in a direct challenge to Anthropic's computer use features. Meanwhile, a sobering Reuters analysis put hard numbers on the gap between AI ambitions and physical reality—a $7 trillion gap, to be precise.&lt;/p&gt;

&lt;h2&gt;
  
  
  OpenAI Beefs Up Codex With Desktop Control, Taking Direct Aim at Anthropic
&lt;/h2&gt;

&lt;p&gt;OpenAI announced significant enhancements to its Codex agent this week, rolling out expanded desktop automation capabilities that position the tool as a direct competitor to Anthropic's computer use features. The update, detailed in OpenAI's &lt;a href="https://openai.com/index/next-phase-of-enterprise-ai/" rel="noopener noreferrer"&gt;enterprise AI roadmap&lt;/a&gt;, gives Codex substantially more power over user desktop environments, including the ability to navigate file systems, manipulate application windows, and execute multi-step workflows across different programs.&lt;/p&gt;

&lt;p&gt;The timing is no coincidence. Anthropic's computer use capabilities, introduced with Claude 3.5 Sonnet in late 2024, established an early lead in the "agents that control your screen" category. OpenAI's enhanced Codex represents a calculated move to close that gap before enterprise adoption patterns solidify. According to &lt;a href="https://venturebeat.com/" rel="noopener noreferrer"&gt;VentureBeat's coverage&lt;/a&gt; of the announcement, the new features include improved error recovery when desktop automation encounters unexpected UI states—a critical pain point in earlier agentic systems.&lt;/p&gt;

&lt;p&gt;Industry observers note this escalation reflects broader "agent wars" dynamics between major AI labs. As models converge on similar benchmark performance, the competitive differentiation increasingly comes from how effectively these systems can operate autonomously in real-world computing environments. OpenAI appears to be betting that enterprise customers will favor tighter integration with existing productivity workflows over raw model capability alone.&lt;/p&gt;

&lt;h2&gt;
  
  
  Anthropic CPO Departs Figma Board Amid Reports of Competing Product Launch
&lt;/h2&gt;

&lt;p&gt;In a move that sent ripples through both AI and design tool communities, Anthropic's Chief Product Officer departed from Figma's board of directors this week. &lt;a href="https://techcrunch.com/category/artificial-intelligence/" rel="noopener noreferrer"&gt;TechCrunch reported&lt;/a&gt; that the departure stemmed from plans to launch a product that would compete directly with Figma's collaborative design platform.&lt;/p&gt;

&lt;p&gt;The resignation highlights an increasingly awkward tension: AI companies that once positioned themselves as infrastructure providers are now eyeing the application layer—including tools built by their own board affiliates. For Figma, which has spent years building a collaborative design ecosystem, the prospect of an AI-native competitor backed by one of the leading foundation model companies represents an existential threat.&lt;/p&gt;

&lt;p&gt;The departure raises broader questions about AI companies' expansion strategies. Anthropic has historically emphasized its role as a model provider and safety research organization, but the design tool space represents lucrative territory where AI capabilities could fundamentally reshape workflows. Single-prompt generation of complex design assets, AI-driven prototyping, and intelligent design system management are all areas where foundation model capabilities could disrupt incumbent tools.&lt;/p&gt;

&lt;p&gt;Industry analysts suggest this move may accelerate consolidation in the design tool market, as traditional players race to integrate AI capabilities before AI-native alternatives mature.&lt;/p&gt;

&lt;h2&gt;
  
  
  Agentic Programming Updates
&lt;/h2&gt;

&lt;p&gt;The agentic AI landscape continues its rapid evolution, with new data underscoring both the opportunity and organizational challenges ahead. According to UiPath's &lt;a href="https://huggingface.co/blog/daya-shankar/agentic-ai-trends-2026" rel="noopener noreferrer"&gt;2026 report&lt;/a&gt;, 78% of executives say they need to reinvent their operating models to capture agentic AI value—a stark admission that current organizational structures weren't designed for autonomous AI systems.&lt;/p&gt;

&lt;p&gt;Architecture patterns are maturing quickly. The era of solo agents handling tasks in isolation is giving way to &lt;a href="https://arxiv.org/html/2511.17332v2" rel="noopener noreferrer"&gt;multi-agent systems&lt;/a&gt; featuring centralized control layers that coordinate specialized agents, handle inter-agent communication, and maintain coherent state across complex workflows. This "orchestration layer" approach mirrors microservices patterns from traditional software architecture.&lt;/p&gt;
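&lt;p&gt;Stripped to its essentials, that orchestration-layer pattern can be sketched in a few lines. All names here are hypothetical; real frameworks layer messaging, retries, and tracing on top.&lt;/p&gt;

```python
# Illustrative sketch of a centralized control layer: a controller
# routes a task through specialized agents (plain callables here) and
# maintains coherent shared state across the workflow.

class Orchestrator:
    def __init__(self):
        self.agents = {}  # name to callable taking and returning state

    def register(self, name, agent):
        self.agents[name] = agent

    def run(self, plan, state):
        # Execute agents in planned order, threading shared state through.
        for name in plan:
            state = self.agents[name](state)
            state.setdefault("log", []).append(name)  # coherent audit trail
        return state

orch = Orchestrator()
orch.register("research", lambda s: dict(s, facts=["fact-1", "fact-2"]))
orch.register("draft", lambda s: dict(s, draft="report covering %d facts" % len(s["facts"])))
result = orch.run(["research", "draft"], {})
print(result["draft"])  # report covering 2 facts
```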

&lt;p&gt;Quality control practices are adapting accordingly. Agentic QA—where AI agents review AI-generated code for security vulnerabilities, architectural consistency, and adherence to coding standards—is becoming &lt;a href="https://huggingface.co/blog/Svngoku/agentic-coding-trends-2026" rel="noopener noreferrer"&gt;standard practice&lt;/a&gt; in mature development organizations. The irony of AI checking AI isn't lost on practitioners, but the approach has proven more scalable than human review alone.&lt;/p&gt;

&lt;p&gt;For those tracking the space, the &lt;a href="https://github.com/caramaschiHG/awesome-ai-agents-2026" rel="noopener noreferrer"&gt;awesome-ai-agents-2026&lt;/a&gt; GitHub repository now catalogs over 340 resources across 20 categories, updated monthly. Key framework categories that have emerged include IDE-native agents, terminal/CLI agents, autonomous software engineers, and multi-agent orchestration platforms—reflecting the field's rapid specialization.&lt;/p&gt;

&lt;h2&gt;
  
  
  InsightFinder Raises $15M to Debug Where AI Agents Go Wrong
&lt;/h2&gt;

&lt;p&gt;InsightFinder announced a &lt;a href="https://techcrunch.com/category/artificial-intelligence/" rel="noopener noreferrer"&gt;$15 million funding round&lt;/a&gt; this week, targeting the growing observability gap as enterprises deploy increasingly autonomous AI agents. The company's platform focuses specifically on helping organizations diagnose why and where AI agent workflows fail—a problem that grows more critical as agents handle longer, more complex task chains.&lt;/p&gt;

&lt;p&gt;The funding reflects a maturing market reality: deploying AI agents is relatively straightforward; understanding what they're doing and why they fail is considerably harder. Traditional application performance monitoring tools weren't designed for systems where the "application logic" is an emergent property of model behavior rather than deterministic code paths.&lt;/p&gt;

&lt;p&gt;InsightFinder's approach involves capturing detailed traces of agent reasoning, tool calls, and environmental interactions, then using specialized models to identify failure patterns and root causes. Enterprise customers report that debugging time for agent failures has dropped by 60-70% compared to manual log analysis.&lt;/p&gt;
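&lt;p&gt;InsightFinder's internals aren't public, but the general shape of trace-based failure localization can be illustrated: record every reasoning step and tool call with its outcome, then scan for the first failing span.&lt;/p&gt;

```python
# General shape of agent trace capture (not any vendor's actual
# implementation): record spans for reasoning steps and tool calls,
# then find the earliest failure to localize where the workflow broke.
from dataclasses import dataclass, field

@dataclass
class Span:
    kind: str      # "reasoning" or "tool_call"
    name: str
    ok: bool
    detail: str = ""

@dataclass
class Trace:
    spans: list = field(default_factory=list)

    def record(self, kind, name, ok, detail=""):
        self.spans.append(Span(kind, name, ok, detail))

    def first_failure(self):
        return next((s for s in self.spans if not s.ok), None)

trace = Trace()
trace.record("reasoning", "plan_query", ok=True)
trace.record("tool_call", "search_api", ok=False, detail="HTTP 429: rate limited")
trace.record("tool_call", "summarize", ok=True)  # later success masks the fault

failure = trace.first_failure()
print("root cause candidate:", failure.name, failure.detail)
```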

&lt;p&gt;The round signals broader investor confidence in the AI operations tooling layer. As &lt;a href="https://venturebeat.com/" rel="noopener noreferrer"&gt;VentureBeat noted&lt;/a&gt;, the "AI observability" category has attracted over $200 million in venture funding in 2026 alone, suggesting that operational maturity—not just raw capability—is becoming the key enterprise adoption bottleneck.&lt;/p&gt;

&lt;h2&gt;
  
  
  Claude Opus 4.7 Reclaims Top Spot With 64.3% SWE-bench Pro Score
&lt;/h2&gt;

&lt;p&gt;Anthropic &lt;a href="https://venturebeat.com/technology/anthropic-releases-claude-opus-4-7-narrowly-retaking-lead-for-most-powerful-generally-available-llm" rel="noopener noreferrer"&gt;released Claude Opus 4.7&lt;/a&gt; this week, narrowly reclaiming the crown for most powerful generally available large language model. The new release achieved 64.3% on SWE-bench Pro agentic coding tasks—a significant jump from the 53.4% score posted by its predecessor and enough to edge past both GPT-5.4 and &lt;a href="https://techcrunch.com/2026/02/19/googles-new-gemini-pro-model-has-record-benchmark-scores-again/" rel="noopener noreferrer"&gt;Gemini 3.1 Pro&lt;/a&gt; on key benchmarks.&lt;/p&gt;

&lt;p&gt;The visual processing improvements are equally striking. Performance on XBOW tests—which measure fine-grained visual understanding—jumped from 54.5% to 98.5%, suggesting Anthropic has made substantial progress on multimodal capabilities that translate directly to computer use and GUI automation tasks.&lt;/p&gt;

&lt;p&gt;Notably absent from the public release: Mythos, Anthropic's more powerful successor model that reportedly surpasses Opus 4.7 by a significant margin. According to the &lt;a href="https://venturebeat.com/technology/anthropic-releases-claude-opus-4-7-narrowly-retaking-lead-for-most-powerful-generally-available-llm" rel="noopener noreferrer"&gt;VentureBeat report&lt;/a&gt;, Mythos remains restricted to enterprise security partners, suggesting Anthropic is taking a staged approach to releasing its most capable systems.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://github.com/salttechno/LLM-Model-Comparison-2026" rel="noopener noreferrer"&gt;LLM comparison data&lt;/a&gt; tracking these releases shows benchmark competition has intensified dramatically—margins between leading models are now measured in single percentage points rather than the double-digit gaps common in 2024.&lt;/p&gt;

&lt;h2&gt;
  
  
  $7 Trillion Reality Check: AI Infrastructure Dreams Hit Power Wall
&lt;/h2&gt;

&lt;p&gt;A sobering &lt;a href="https://www.reuters.com/commentary/breakingviews/ai-dreams-crash-into-stark-7-trln-reality-2026-04-07/" rel="noopener noreferrer"&gt;Reuters analysis&lt;/a&gt; published this week put hard numbers on the gap between AI ambitions and physical reality. Meta, xAI, and other major players are collectively targeting data centers consuming 110 gigawatts—roughly equivalent to Japan's entire electricity consumption. The price tag for this infrastructure build-out: approximately $7 trillion.&lt;/p&gt;

&lt;p&gt;NVIDIA CEO Jensen Huang's recent estimate that each major AI data center requires a minimum $60 billion investment underscores the scale challenge. These aren't software problems solvable with clever engineering; they're physical infrastructure constraints that require years of permitting, construction, and grid upgrades to address.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://www.reuters.com/technology/artificial-intelligence/" rel="noopener noreferrer"&gt;Reuters analysis&lt;/a&gt; questions whether current AI business models can sustain these capital requirements. Revenue from AI services would need to grow by orders of magnitude to justify infrastructure investments at this scale—a bet that assumes continued exponential capability improvements translating into proportional commercial value.&lt;/p&gt;

&lt;p&gt;Power availability has emerged as the critical bottleneck. Even companies with unlimited capital face multi-year timelines to secure sufficient electricity, leading to creative solutions like locating facilities near nuclear plants or acquiring entire utility companies. The irony isn't lost on observers: the most advanced AI systems humanity has built are ultimately constrained by our ability to move electrons.&lt;/p&gt;

&lt;h2&gt;
  
  
  Google AI Mode Now Enables Side-by-Side Web Exploration
&lt;/h2&gt;

&lt;p&gt;Google rolled out a significant update to its AI Mode this week, introducing side-by-side web exploration that lets users browse source materials while interacting with AI-generated responses. The feature, announced alongside the global rollout of &lt;a href="https://techcrunch.com/2026/02/19/googles-new-gemini-pro-model-has-record-benchmark-scores-again/" rel="noopener noreferrer"&gt;Gemini 3.1 Flash Live&lt;/a&gt;, addresses persistent user demand for source verification in AI-assisted search.&lt;/p&gt;

&lt;p&gt;The implementation reflects Google's attempt to thread a difficult needle: providing the convenience of AI-synthesized answers while preserving the web's role as a navigable information ecosystem. Users can now click on any cited source within an AI Mode response to open it in a split-screen view, examining the original context without losing their AI conversation thread.&lt;/p&gt;

&lt;p&gt;For publishers who have worried about AI reducing web traffic, the feature offers a partial olive branch—though whether it meaningfully increases click-through rates remains to be seen. Early data from Google suggests users do engage with source materials roughly 40% of the time when the side-by-side option is available.&lt;/p&gt;

&lt;p&gt;The broader trend points toward hybrid AI-augmented browsing becoming the default web experience. Rather than choosing between traditional search and AI chat interfaces, users increasingly expect both simultaneously—a pattern that &lt;a href="https://venturebeat.com/" rel="noopener noreferrer"&gt;VentureBeat suggests&lt;/a&gt; will reshape how websites are designed and optimized.&lt;/p&gt;

&lt;h2&gt;
  
  
  Adobe Firefly Assistant Aims to Unify Creative Suite With Single-Prompt Control
&lt;/h2&gt;

&lt;p&gt;Adobe launched its Firefly AI Assistant this week, a cross-application tool that spans Photoshop, Premiere Pro, Illustrator, and other Creative Cloud applications. The headline capability: a single natural language prompt can orchestrate actions across multiple apps simultaneously—edit an image in Photoshop, apply matching color grading in Premiere, and generate complementary vector assets in Illustrator, all from one command.&lt;/p&gt;

&lt;p&gt;The integration represents Adobe's most aggressive move yet to defend its creative tool dominance against AI-native competitors. As &lt;a href="https://techcrunch.com/category/artificial-intelligence/" rel="noopener noreferrer"&gt;TechCrunch coverage&lt;/a&gt; notes, startups have been chipping away at individual Creative Cloud use cases with AI-first approaches; Adobe's response is to leverage its multi-application ecosystem as a moat that single-purpose AI tools can't easily cross.&lt;/p&gt;

&lt;p&gt;Early user feedback suggests the workflow consolidation delivers genuine productivity gains for complex projects, though the learning curve for effective prompting remains steep. Power users report that crafting prompts that reliably produce desired cross-application results requires substantial experimentation.&lt;/p&gt;

&lt;p&gt;The assistant also introduces persistent project context—it remembers style decisions, brand guidelines, and user preferences across sessions and applications. For creative teams working on large-scale projects, this contextual memory could prove more valuable than the generation capabilities themselves.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to Watch
&lt;/h2&gt;

&lt;p&gt;The infrastructure reality check from Reuters deserves close attention in coming months—if power constraints force AI labs to slow capability scaling, the entire competitive dynamic could shift toward efficiency optimization rather than raw capability races. Meanwhile, the narrowing benchmark gaps between Claude, GPT, and Gemini suggest we're approaching a regime where model differentiation comes from tooling, ecosystem integration, and deployment flexibility rather than benchmark scores. Expect the agent wars to intensify as that new competitive landscape takes shape.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.reuters.com/technology/artificial-intelligence/" rel="noopener noreferrer"&gt;AI News | Latest Headlines and Developments | Reuters&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://techcrunch.com/category/artificial-intelligence/" rel="noopener noreferrer"&gt;AI News &amp;amp; Artificial Intelligence | TechCrunch&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://venturebeat.com/" rel="noopener noreferrer"&gt;VentureBeat | Transformative tech coverage that matters&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://openai.com/index/next-phase-of-enterprise-ai/" rel="noopener noreferrer"&gt;The next phase of enterprise AI | OpenAI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.reuters.com/commentary/breakingviews/ai-dreams-crash-into-stark-7-trln-reality-2026-04-07/" rel="noopener noreferrer"&gt;AI dreams crash into stark $7 trln reality | Reuters&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/html/2511.17332v2" rel="noopener noreferrer"&gt;Agentifying Agentic AI - AAAI 2026 Bridge Program&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://huggingface.co/blog/daya-shankar/agentic-ai-trends-2026" rel="noopener noreferrer"&gt;Latest Agentic AI Trends to Watch in 2026 | Hugging Face&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/caramaschiHG/awesome-ai-agents-2026" rel="noopener noreferrer"&gt;caramaschiHG/awesome-ai-agents-2026 | GitHub&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://huggingface.co/blog/Svngoku/agentic-coding-trends-2026" rel="noopener noreferrer"&gt;2026 Agentic Coding Trends - Implementation Guide | Hugging Face&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://techcrunch.com/2026/02/19/googles-new-gemini-pro-model-has-record-benchmark-scores-again/" rel="noopener noreferrer"&gt;Google's new Gemini Pro model has record benchmark scores | TechCrunch&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/salttechno/LLM-Model-Comparison-2026" rel="noopener noreferrer"&gt;salttechno/LLM-Model-Comparison-2026 | GitHub&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://venturebeat.com/technology/anthropic-releases-claude-opus-4-7-narrowly-retaking-lead-for-most-powerful-generally-available-llm" rel="noopener noreferrer"&gt;Anthropic releases Claude Opus 4.7 | VentureBeat&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Enjoyed this briefing? Follow this series for a fresh AI update every week, written for engineers who want to stay ahead.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Follow this publication on Dev.to to get notified of every new article.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Have a story tip or correction? Drop a comment below.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>programming</category>
      <category>technology</category>
    </item>
    <item>
      <title>LangSmith Fleet: Managing Agent Identity, Permissions, and Skills at Enterprise Scale</title>
      <dc:creator>Richard Dillon</dc:creator>
      <pubDate>Mon, 13 Apr 2026 12:05:36 +0000</pubDate>
      <link>https://dev.to/richard_dillon_b9c238186e/langsmith-fleet-managing-agent-identity-permissions-and-skills-at-enterprise-scale-19p7</link>
      <guid>https://dev.to/richard_dillon_b9c238186e/langsmith-fleet-managing-agent-identity-permissions-and-skills-at-enterprise-scale-19p7</guid>
      <description>&lt;h1&gt;
  
  
  LangSmith Fleet: Managing Agent Identity, Permissions, and Skills at Enterprise Scale
&lt;/h1&gt;

&lt;p&gt;The LangChain team quietly shipped one of the most significant architectural changes to LangSmith in March 2026, and it wasn't a new model integration or a flashy UI overhaul. They renamed Agent Builder to &lt;a href="https://blog.langchain.com/march-2026-langchain-newsletter/" rel="noopener noreferrer"&gt;Fleet&lt;/a&gt;—a seemingly cosmetic change that signals a fundamental shift in how enterprises should think about their AI agent portfolios. This isn't about building better individual agents anymore; it's about managing dozens or hundreds of agents as a coordinated organizational capability. If you're running more than a handful of agents in production, this is the infrastructure layer you didn't know you needed.&lt;/p&gt;

&lt;p&gt;The timing is deliberate. LangChain's &lt;a href="https://www.langchain.com/state-of-agent-engineering" rel="noopener noreferrer"&gt;State of Agent Engineering report&lt;/a&gt; revealed a troubling gap: 89% of organizations have observability in place, but most lack centralized governance for their agent portfolios. Teams are spinning up agents in silos, duplicating prompt engineering effort, and losing track of which agents have access to what resources. Fleet directly addresses this governance vacuum by introducing three primitives that enterprise deployments desperately need: agent identity, role-based permissions, and reusable Skills.&lt;/p&gt;

&lt;p&gt;With &lt;a href="https://www.langchain.com/blog/nvidia-enterprise" rel="noopener noreferrer"&gt;300+ enterprise customers now processing over 15 billion traces&lt;/a&gt;, LangSmith has accumulated hard-won lessons about what breaks at scale. Fleet codifies these patterns into infrastructure that prevents the debugging nightmare of "which agent caused this production incident?" before it happens.&lt;/p&gt;

&lt;h2&gt;
  
  
  Fleet Architecture: Identity and Permissions Model
&lt;/h2&gt;

&lt;p&gt;The core insight behind Fleet's design is deceptively simple: every agent in your organization needs a stable, auditable identity that persists across deployments, versions, and team ownership changes. This sounds obvious until you realize how most teams currently manage agents—as anonymous graph definitions deployed via CI/CD with no central registry.&lt;/p&gt;

&lt;p&gt;Fleet's identity system assigns each agent a unique organizational identifier that ties together its definition, deployment history, performance metrics, and access patterns. This identity travels with the agent across environments. When your compliance checking agent runs in staging versus production, it's the same identity with environment-specific configuration, not two unrelated deployments you have to mentally correlate.&lt;/p&gt;

&lt;p&gt;The permission model layers role-based access control on top of these identities. Fleet distinguishes between several permission levels: viewers can observe agent behavior and metrics, editors can modify prompts and tool configurations, deployers can push agents to production environments, and administrators can retire agents or transfer ownership. These map cleanly to organizational realities—your ML platform team shouldn't need to ask a Slack channel before deploying a new agent version, but they probably shouldn't be editing the legal team's contract review prompts without approval.&lt;/p&gt;
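&lt;p&gt;One way to picture this model is as explicit capability sets per role (a sketch, not Fleet's actual API; the action names are assumed). Set membership, rather than a strict hierarchy, captures cases like deployers who can ship to production but shouldn't edit another team's prompts.&lt;/p&gt;

```python
# Sketch of the four described permission levels as capability sets.
# Action names and exact set contents are illustrative assumptions.
ROLE_ACTIONS = {
    "viewer": {"view_metrics"},
    "editor": {"view_metrics", "edit_prompt", "edit_tools"},
    "deployer": {"view_metrics", "deploy"},
    "admin": {"view_metrics", "edit_prompt", "edit_tools", "deploy",
              "retire", "transfer_ownership"},
}

def can(role, action):
    return action in ROLE_ACTIONS[role]

assert can("deployer", "deploy")           # platform team ships without asking
assert not can("deployer", "edit_prompt")  # but cannot touch legal's prompts
assert not can("viewer", "deploy")         # compliance observes only
```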

&lt;p&gt;Sharing mechanisms enable cross-team agent reuse without the current anti-pattern of exporting agent definitions as JSON and importing them into another workspace. When a shared agent is updated, teams consuming it can see the update and choose to adopt it—version control semantics applied to agent definitions. This preserves provenance: you always know which team owns the canonical definition and what modifications downstream teams have made.&lt;/p&gt;

&lt;p&gt;Integration with enterprise identity providers happens at the organization level. Fleet supports SSO and SAML authentication, which means your existing Okta groups or Azure AD roles can map directly to Fleet permissions. The compliance team's AD group gets viewer access to all agents; the ML platform team's group gets deployer access. No separate permission system to maintain.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://blog.langchain.com/on-agent-frameworks-and-agent-observability/" rel="noopener noreferrer"&gt;audit trail functionality&lt;/a&gt; captures agent lifecycle events comprehensively: creation timestamp and creator identity, every modification with diff visibility, deployment events across environments, permission grants and revocations. This isn't just for compliance checkbox exercises—it's the forensic evidence you need when a production agent starts behaving unexpectedly six weeks after someone made a "minor prompt tweak."&lt;/p&gt;

&lt;h2&gt;
  
  
  Skills: Equipping Agents with Reusable Capabilities
&lt;/h2&gt;

&lt;p&gt;If identity and permissions form Fleet's governance layer, Skills represent its knowledge management layer. A Skill is a modular, shareable knowledge package that can be attached to any agent in your fleet. The &lt;a href="https://blog.langchain.com/march-2026-langchain-newsletter/" rel="noopener noreferrer"&gt;March 2026 newsletter announced the first open-source Skills&lt;/a&gt; alongside the Fleet rename, establishing an ecosystem pattern that will likely expand significantly.&lt;/p&gt;

&lt;p&gt;The architectural insight here addresses a real pain point: domain expertise shouldn't be trapped inside individual agent definitions. When your engineering team figures out the optimal way to interact with your internal APIs—the authentication dance, rate limiting patterns, error recovery logic—that knowledge currently lives in one agent's prompts. Every subsequent agent that needs API access must rediscover or copy-paste this expertise.&lt;/p&gt;

&lt;p&gt;Skills separate domain expertise from agent orchestration logic. A "Company API Integration" Skill encapsulates authentication patterns, retry strategies, and response parsing conventions. A "Legal Document Formatting" Skill knows your organization's citation styles, confidentiality markings, and section numbering conventions. These Skills attach to agents without modifying the agent's core orchestration graph.&lt;/p&gt;

&lt;p&gt;The skill attachment model supports both static and dynamic binding. Static attachment happens at agent definition time—you declare that your contract review agent always uses the Legal Document Formatting Skill. Dynamic attachment allows agents to discover and request Skills at runtime based on task requirements, though this requires more sophisticated capability negotiation that most teams won't need initially.&lt;/p&gt;

&lt;p&gt;Version management for Skills solves the knowledge distribution problem. When your platform team improves the API Integration Skill—say, adding support for a new authentication method—that improvement propagates to all agents using the Skill. Teams can pin to specific Skill versions for stability or track the latest version for continuous improvement. The semantics mirror dependency management in traditional software development, which makes the mental model accessible to engineering teams.&lt;/p&gt;
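&lt;p&gt;The pin-or-track semantics can be sketched directly. Registry contents and function names below are illustrative, not the Fleet API.&lt;/p&gt;

```python
# Pin-or-track version resolution for Skills, mirroring dependency
# management: pin a version for stability, or track the latest to
# pick up improvements automatically.
SKILL_REGISTRY = {
    "api-integration": ["1.0.0", "1.1.0", "2.0.0"],  # ordered oldest to newest
}

def resolve(skill, pin=None):
    """Return the pinned version if set; otherwise track the latest."""
    versions = SKILL_REGISTRY[skill]
    if pin is not None:
        if pin not in versions:
            raise ValueError("unknown version %s for %s" % (pin, skill))
        return pin
    return versions[-1]

# A stability-focused team pins; others pick up improvements automatically.
assert resolve("api-integration", pin="1.1.0") == "1.1.0"
assert resolve("api-integration") == "2.0.0"
```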

&lt;p&gt;The skill definition format establishes a standard structure that includes capability declarations (what the Skill enables), knowledge content (prompts, examples, patterns), tool bindings (if the Skill requires specific tool access), and configuration parameters (organization-specific customization points). This standardization means Skills can be shared not just within organizations but eventually across the emerging ecosystem of &lt;a href="https://arxiv.org/html/2604.05387v1" rel="noopener noreferrer"&gt;function calling improvements&lt;/a&gt; that the broader community is developing.&lt;/p&gt;
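&lt;p&gt;In code, that four-part structure might look roughly like this. Field names are hypothetical, not the published schema.&lt;/p&gt;

```python
# Hypothetical rendering of the four parts a Skill definition
# standardizes: capability declarations, knowledge content, tool
# bindings, and configuration parameters.
from dataclasses import dataclass, field

@dataclass
class SkillDefinition:
    name: str
    capabilities: list                       # what the Skill enables
    knowledge: dict                          # prompts, examples, patterns
    tool_bindings: list = field(default_factory=list)  # required tool access
    config: dict = field(default_factory=dict)         # org-specific knobs

legal_formatting = SkillDefinition(
    name="legal-document-formatting",
    capabilities=["citation-style", "confidentiality-markings"],
    knowledge={"citation_prompt": "Apply the organization's citation style."},
    config={"confidentiality_label": "INTERNAL"},
)
print(legal_formatting.capabilities)
```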

&lt;h2&gt;
  
  
  Hands-On: Code Walkthrough
&lt;/h2&gt;

&lt;p&gt;Let's build a practical Fleet setup with multiple agents, custom Skills, and proper permission configuration. This example creates a document processing fleet for a legal department with shared compliance capabilities.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# fleet_setup.py
# Requires: langsmith&amp;gt;=0.3.0, langchain&amp;gt;=0.4.0, langgraph&amp;gt;=0.3.0
&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langsmith&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Client&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langsmith.fleet&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;Fleet&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;AgentIdentity&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;Skill&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;PermissionSet&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;SkillAttachment&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.graph&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;StateGraph&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;TypedDict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Annotated&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;operator&lt;/span&gt;

&lt;span class="c1"&gt;# Initialize the LangSmith client with Fleet capabilities
&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;fleet&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Fleet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;organization_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your-org-id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Define agent identity with comprehensive metadata
# This identity persists across deployments and environment changes
&lt;/span&gt;&lt;span class="n"&gt;contract_reviewer_identity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AgentIdentity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;contract-reviewer-v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;display_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Contract Review Agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;owner_team&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;legal-ops&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;purpose&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Reviews vendor contracts for compliance issues and risk factors&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;deployment_environments&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;staging&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;production&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;tags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;legal&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;compliance&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;vendor-management&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="c1"&gt;# Metadata for organizational classification
&lt;/span&gt;    &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cost_center&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;LEGAL-001&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data_classification&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;confidential&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;review_frequency&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;quarterly&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Register the identity with Fleet - this creates the audit trail
&lt;/span&gt;&lt;span class="n"&gt;registered_identity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;fleet&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;register_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;contract_reviewer_identity&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Agent registered with ID: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;registered_identity&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fleet_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Define a reusable Skill for compliance checking
# Skills encapsulate domain expertise separate from orchestration
&lt;/span&gt;&lt;span class="n"&gt;compliance_checking_skill&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Skill&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;corporate-compliance-rules&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;version&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1.2.0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Encapsulates corporate compliance requirements for contract review&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

    &lt;span class="c1"&gt;# Knowledge content that will be injected into agent context
&lt;/span&gt;    &lt;span class="n"&gt;knowledge_content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    ## Corporate Compliance Requirements

    All vendor contracts must be reviewed against these criteria:

    1. DATA HANDLING: Contracts involving PII must include:
       - Data processing addendum (DPA)
       - SOC 2 Type II certification requirement
       - Data residency clauses for EU customers (GDPR)
       - Breach notification timeline (max 72 hours)

    2. FINANCIAL TERMS: Flag for legal review if:
       - Auto-renewal clauses exceed 1 year
       - Liability caps below $1M for critical services
       - Payment terms shorter than Net 30
       - Price escalation clauses above 5% annually

    3. TERMINATION RIGHTS: Require:
       - Termination for convenience with 90-day notice
       - Immediate termination for material breach
       - Data return/deletion obligations post-termination

    4. INDEMNIFICATION: Standard requirements:
       - Mutual indemnification for IP infringement
       - Vendor indemnification for data breaches caused by vendor
       - Carve-outs for gross negligence
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

    &lt;span class="c1"&gt;# Configuration parameters that can be customized per-organization
&lt;/span&gt;    &lt;span class="n"&gt;config_schema&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;liability_threshold&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;number&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;default&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1000000&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;auto_renewal_max_years&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;number&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;default&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;breach_notification_hours&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;number&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;default&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;72&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;

    &lt;span class="c1"&gt;# Tags for discoverability in the Skill registry
&lt;/span&gt;    &lt;span class="n"&gt;tags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;legal&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;compliance&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;contracts&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;vendor-management&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Register the Skill with Fleet
&lt;/span&gt;&lt;span class="n"&gt;registered_skill&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;fleet&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;register_skill&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;compliance_checking_skill&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Skill registered with ID: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;registered_skill&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;skill_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Define the agent's state structure
&lt;/span&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ContractReviewState&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TypedDict&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;contract_text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;compliance_issues&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Annotated&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;operator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;risk_score&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;
    &lt;span class="n"&gt;recommendation&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;skill_context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;  &lt;span class="c1"&gt;# Injected by Fleet at runtime
&lt;/span&gt;
&lt;span class="c1"&gt;# Build the LangGraph workflow for contract review
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;analyze_data_handling&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ContractReviewState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Check contract against data handling requirements from Skill.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="c1"&gt;# Skill context is automatically injected by Fleet
&lt;/span&gt;    &lt;span class="n"&gt;skill_rules&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;skill_context&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{}).&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;knowledge_content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Your LLM call here would use skill_rules in the prompt
&lt;/span&gt;    &lt;span class="c1"&gt;# This is where domain expertise from the Skill gets applied
&lt;/span&gt;    &lt;span class="n"&gt;issues&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="c1"&gt;# Simplified example - real implementation would use LLM
&lt;/span&gt;    &lt;span class="n"&gt;contract&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;contract_text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data processing addendum&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;contract&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dpa&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;contract&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;issues&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;category&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DATA_HANDLING&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;severity&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;HIGH&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;finding&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Missing Data Processing Addendum (DPA)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;remediation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Request DPA from vendor before signing&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;compliance_issues&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;issues&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;analyze_financial_terms&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ContractReviewState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Check financial terms against Skill-defined thresholds.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;skill_config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;skill_context&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{}).&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;config&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{})&lt;/span&gt;
    &lt;span class="n"&gt;liability_threshold&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;skill_config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;liability_threshold&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1000000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;issues&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="c1"&gt;# Implementation would parse contract and check against thresholds
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;compliance_issues&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;issues&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;calculate_risk_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ContractReviewState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Aggregate issues into overall risk score.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;issues&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;compliance_issues&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[])&lt;/span&gt;
    &lt;span class="n"&gt;high_severity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;issues&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;severity&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;HIGH&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;medium_severity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;issues&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;severity&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MEDIUM&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Risk score: 0-100, higher = more risk
&lt;/span&gt;    &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;high_severity&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;25&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;medium_severity&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;recommendation&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;APPROVE&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;25&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;REVIEW&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;REJECT&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;risk_score&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;recommendation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;recommendation&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Construct the graph
&lt;/span&gt;&lt;span class="n"&gt;workflow&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;StateGraph&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ContractReviewState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data_handling&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;analyze_data_handling&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;financial_terms&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;analyze_financial_terms&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;risk_calculation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;calculate_risk_score&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data_handling&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;financial_terms&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;financial_terms&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;risk_calculation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_entry_point&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data_handling&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_finish_point&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;risk_calculation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;graph&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compile&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Attach the Skill to the agent identity
# This creates the binding between agent and reusable knowledge
&lt;/span&gt;&lt;span class="n"&gt;skill_attachment&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SkillAttachment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;agent_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;registered_identity&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fleet_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;skill_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;registered_skill&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;skill_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;version_constraint&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;~1.2.0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Accept 1.2.x patches automatically
&lt;/span&gt;    &lt;span class="n"&gt;config_overrides&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;liability_threshold&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2000000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Org-specific override
&lt;/span&gt;    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;fleet&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;attach_skill&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;skill_attachment&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Skill attached to agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Configure permissions for the agent
# RBAC model controls who can view, edit, deploy, or retire
&lt;/span&gt;&lt;span class="n"&gt;permissions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;PermissionSet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;agent_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;registered_identity&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fleet_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;rules&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="c1"&gt;# Legal ops team has full control
&lt;/span&gt;        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;principal&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;group:legal-ops&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;admin&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="c1"&gt;# General counsel can view and deploy
&lt;/span&gt;        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;principal&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;group:general-counsel&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;deployer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="c1"&gt;# All legal staff can view metrics and outputs
&lt;/span&gt;        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;principal&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;group:legal-all&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;viewer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="c1"&gt;# ML platform team can deploy but not modify prompts
&lt;/span&gt;        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;principal&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;group:ml-platform&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;deployer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="c1"&gt;# Compliance team needs read access for audits
&lt;/span&gt;        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;principal&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;group:compliance&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;viewer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;fleet&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_permissions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;permissions&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Permissions configured&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Register the compiled graph with the agent identity
# This connects the LangGraph definition to the Fleet identity
&lt;/span&gt;&lt;span class="n"&gt;fleet&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;register_graph&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;agent_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;registered_identity&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fleet_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;environment&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;staging&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Query the Fleet to verify setup
&lt;/span&gt;&lt;span class="n"&gt;agents&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;fleet&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;list_agents&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;team&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;legal-ops&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;agents&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Agent: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;display_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  Fleet ID: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fleet_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  Skills: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;fleet&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_agent_skills&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fleet_id&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  Permissions: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;fleet&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_permissions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fleet_id&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Example: Run the agent with Fleet context injection
&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;fleet&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;agent_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;registered_identity&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fleet_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;contract_text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
        VENDOR SERVICES AGREEMENT
        This agreement between Acme Corp and BigCloud Inc...
        Payment terms: Net 15
        Auto-renewal: 3 years
        Liability cap: $500,000
        &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;environment&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;staging&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Review Result:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  Risk Score: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;risk_score&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  Recommendation: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;recommendation&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  Issues Found: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;compliance_issues&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This implementation demonstrates Fleet's key capabilities: persistent agent identity, Skills as reusable knowledge packages, and permission configuration that maps to organizational structure. The &lt;code&gt;skill_context&lt;/code&gt; injection pattern keeps domain expertise separate from orchestration logic while making it available where needed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Observability and Governance at Fleet Scale
&lt;/h2&gt;

&lt;p&gt;Fleet-level observability transforms how you monitor agent portfolios. Instead of drilling into individual agent dashboards and mentally aggregating patterns, Fleet provides organizational views that answer questions like "which agents had the highest error rates this week?" or "which teams are consuming the most LLM tokens?"&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://blog.langchain.com/on-agent-frameworks-and-agent-observability/" rel="noopener noreferrer"&gt;aggregated metrics dashboards&lt;/a&gt; span invocation counts, error rates, latency distributions, and token consumption across your entire agent fleet. You can slice these metrics by team, deployment environment, or custom tags. When leadership asks "how much are we spending on AI agents in the legal department?", you can answer with actual data rather than estimates.&lt;/p&gt;
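&lt;p&gt;Fleet's query API isn't public, so treat the following as a plain-Python sketch of the aggregation those dashboards perform: rolling raw invocation records up into per-team invocation counts, error rates, token totals, and a latency percentile. The record fields (team, error, latency_ms, tokens) are illustrative, not Fleet's actual schema.&lt;/p&gt;

```python
from collections import defaultdict

def summarize_by_team(invocations):
    """Aggregate per-team metrics from raw invocation records.

    Each record is a dict with keys: team, error (bool), latency_ms, tokens.
    """
    buckets = defaultdict(lambda: {"count": 0, "errors": 0, "tokens": 0, "latency": []})
    for inv in invocations:
        b = buckets[inv["team"]]
        b["count"] += 1
        b["errors"] += int(inv["error"])
        b["tokens"] += inv["tokens"]
        b["latency"].append(inv["latency_ms"])
    return {
        team: {
            "invocations": b["count"],
            "error_rate": b["errors"] / b["count"],
            "total_tokens": b["tokens"],
            # crude median; a real dashboard would use proper percentiles
            "p50_latency_ms": sorted(b["latency"])[len(b["latency"]) // 2],
        }
        for team, b in buckets.items()
    }

invocations = [
    {"team": "legal-ops", "error": False, "latency_ms": 420, "tokens": 1800},
    {"team": "legal-ops", "error": True, "latency_ms": 950, "tokens": 2100},
    {"team": "ml-platform", "error": False, "latency_ms": 180, "tokens": 600},
]
print(summarize_by_team(invocations)["legal-ops"]["total_tokens"])  # 3900
```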

&lt;p&gt;Polly, LangSmith's AI assistant that &lt;a href="https://blog.langchain.com/march-2026-langchain-newsletter/" rel="noopener noreferrer"&gt;reached general availability&lt;/a&gt; alongside Fleet, can analyze fleet-wide patterns and suggest optimizations. This isn't just a chatbot interface to your metrics—Polly can identify correlations across agents that humans miss: "Three agents in the legal-ops team show similar latency spikes every Monday morning, likely correlated with the weekly contract upload batch job."&lt;/p&gt;

&lt;p&gt;The correlation between agent identity and LangSmith traces enables per-agent performance drilling that was previously impossible. Every trace is automatically tagged with the Fleet identity, so you can see exactly how the contract-reviewer-v1 agent performed across all its invocations—not just one deployment, but its complete behavioral history. This longitudinal view reveals drift patterns: if an agent's error rate creeps up over weeks, you can correlate it with the Skill updates or permission changes that might explain the regression.&lt;/p&gt;
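&lt;p&gt;That drift check needs no Fleet API at all once you can export identity-tagged weekly error rates: flag any agent whose recent mean error rate exceeds its earlier baseline. The window and threshold values below are arbitrary illustrative choices, not recommendations.&lt;/p&gt;

```python
def detect_drift(weekly_error_rates, window=4, threshold=0.02):
    """Flag agents whose mean error rate over the most recent `window` weeks
    exceeds their earlier baseline mean by more than `threshold`."""
    drifting = []
    for agent_id, rates in weekly_error_rates.items():
        if len(rates) >= 2 * window:  # need enough history for a baseline
            baseline = sum(rates[:-window]) / len(rates[:-window])
            recent = sum(rates[-window:]) / window
            if recent - baseline > threshold:
                drifting.append(agent_id)
    return drifting

rates = {
    "contract-reviewer-v1": [0.01, 0.01, 0.02, 0.01, 0.04, 0.05, 0.06, 0.05],
    "invoice-triager-v2":   [0.03, 0.03, 0.02, 0.03, 0.03, 0.02, 0.03, 0.03],
}
print(detect_drift(rates))  # ['contract-reviewer-v1']
```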

&lt;p&gt;Compliance reporting leverages the audit trail to generate reports showing which agents accessed which tools and when. For regulated industries, this isn't optional—&lt;a href="https://arxiv.org/pdf/2603.27075" rel="noopener noreferrer"&gt;legal frameworks are struggling to keep pace with agentic AI&lt;/a&gt;, and organizations that can demonstrate clear governance have an advantage. Fleet's event export integrates with enterprise monitoring systems like Datadog and Splunk, feeding agent lifecycle events into existing security and compliance workflows.&lt;/p&gt;

&lt;p&gt;Cost attribution by agent identity enables chargeback and budget planning at the organizational level. When the CFO asks why the AI infrastructure budget tripled, you can show exactly which teams and which agents drove the increase—and more importantly, what business value they delivered.&lt;/p&gt;
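&lt;p&gt;A minimal chargeback calculation, assuming you can export per-team token counts from the traces; the per-1K-token prices here are placeholders, not any provider's actual rates.&lt;/p&gt;

```python
# Hypothetical per-1K-token prices for illustration only
PRICE_PER_1K = {"input": 0.003, "output": 0.015}

def monthly_chargeback(usage):
    """usage: list of {team, input_tokens, output_tokens} records."""
    bills = {}
    for u in usage:
        cost = (u["input_tokens"] / 1000) * PRICE_PER_1K["input"]
        cost += (u["output_tokens"] / 1000) * PRICE_PER_1K["output"]
        bills[u["team"]] = bills.get(u["team"], 0.0) + cost
    return {team: round(cost, 2) for team, cost in bills.items()}

usage = [
    {"team": "legal", "input_tokens": 4_000_000, "output_tokens": 500_000},
    {"team": "marketing", "input_tokens": 1_000_000, "output_tokens": 200_000},
]
print(monthly_chargeback(usage))  # {'legal': 19.5, 'marketing': 6.0}
```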

&lt;h2&gt;
  
  
  What This Means for Your Stack
&lt;/h2&gt;

&lt;p&gt;If you're operating five or more agents in production, Fleet's identity system addresses a concrete debugging problem: the "which agent did this?" nightmare. When a customer reports unexpected behavior, you need to trace from the symptom back to a specific agent, version, and invocation. Without stable identities, this requires manual correlation across deployment logs, trace IDs, and team knowledge about what's running where.&lt;/p&gt;

&lt;p&gt;Skills reduce duplicated prompt engineering effort across your organization. If you've ever watched multiple teams independently solve the same problem—formatting API responses correctly, handling authentication retries, structuring outputs for downstream systems—you've experienced the knowledge fragmentation that Skills address. The investment in creating a Skill pays dividends across every agent that attaches it.&lt;/p&gt;

&lt;p&gt;The permission model becomes essential before you give non-engineering teams access to modify agent behavior. The trend toward &lt;a href="https://www.microsoft.com/en-us/microsoft-copilot/blog/copilot-studio/6-core-capabilities-to-scale-agent-adoption-in-2026/" rel="noopener noreferrer"&gt;democratized agent creation&lt;/a&gt; is accelerating, and without permission controls, you face either a security nightmare or a bottleneck where engineers review every change. Fleet's RBAC model lets you give marketing the ability to tweak their content agent's prompts without giving them access to the production deployment pipeline.&lt;/p&gt;

&lt;p&gt;The migration path from Agent Builder is relatively smooth: existing agents automatically receive Fleet identities when you enable Fleet on your workspace. Skill extraction is manual—you'll need to identify reusable knowledge currently embedded in agent prompts and factor it into Skills. This is work worth doing regardless of Fleet, as it forces you to document tribal knowledge that currently exists only in specific prompts.&lt;/p&gt;

&lt;p&gt;Evaluate Fleet alongside whatever agent registry solution you currently have (or more likely, don't have). Many teams are using internal wikis, spreadsheets, or Notion pages to track agents—fine for two or three agents, increasingly untenable as portfolios grow. Fleet provides a programmatic registry with API access, which enables automation that informal registries can't support.&lt;/p&gt;

&lt;p&gt;Start with identity and permissions before investing heavily in Skills. Governance foundations enable safe skill sharing later; without them, Skills become another vector for uncontrolled changes propagating across your fleet. Get the audit trail and permission model in place first, then layer Skills on top once you've established who can modify what.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to Build This Week
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Project: Agent Fleet Inventory and Skill Extraction&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Before you can benefit from Fleet, you need to understand what you're managing. This week's project is a comprehensive inventory of your current agent portfolio with identification of skill extraction candidates.&lt;/p&gt;

&lt;p&gt;Start by cataloging every agent your organization runs: the ones in production, the ones in staging that "will ship soon," and the forgotten experiments still consuming compute in someone's sandbox. For each agent, document: owner team, purpose, tools it accesses, data it processes, current deployment status, and rough monthly cost.&lt;/p&gt;
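&lt;p&gt;A lightweight way to make that inventory machine-readable rather than a spreadsheet is a dataclass per agent. The field names below are suggestions, not a Fleet schema; the point is that a queryable structure lets you surface problems (like forgotten sandbox spend) immediately.&lt;/p&gt;

```python
from dataclasses import dataclass, field, asdict

@dataclass
class AgentRecord:
    name: str
    owner_team: str
    purpose: str
    status: str                     # "production" | "staging" | "sandbox"
    tools: list = field(default_factory=list)
    data_classes: list = field(default_factory=list)  # e.g. "contracts", "PII"
    est_monthly_cost_usd: float = 0.0

inventory = [
    AgentRecord("contract-reviewer-v1", "legal-ops", "contract risk review",
                "production", ["doc-parser", "clause-db"], ["contracts"], 1200.0),
    AgentRecord("summarizer-experiment", "ml-platform", "abandoned prototype",
                "sandbox", [], [], 85.0),
]

# Surface the forgotten sandbox spend first
stale = [a for a in inventory if a.status == "sandbox"]
print(sum(a.est_monthly_cost_usd for a in stale))  # 85.0
```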

&lt;p&gt;Next, analyze the prompts across these agents looking for duplicated expertise. Common patterns include: API interaction conventions, output formatting requirements, domain-specific terminology definitions, error handling approaches, and citation/attribution styles. These are your Skill extraction candidates.&lt;/p&gt;
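&lt;p&gt;Duplicate detection can start crude and still be useful. The sketch below just counts normalized prompt lines that recur across agents; anything appearing in two or more prompts is a rough Skill-extraction candidate worth a closer look. Real prompts will need fuzzier matching, but this surfaces the obvious overlaps.&lt;/p&gt;

```python
from collections import Counter

def shared_fragments(prompts, min_agents=2):
    """Return normalized prompt lines that appear in at least `min_agents`
    different agents' prompts."""
    counts = Counter()
    for text in prompts.values():
        # use a set per agent so repeats within one prompt count once
        lines = {line.strip().lower() for line in text.splitlines() if line.strip()}
        counts.update(lines)
    return sorted(line for line, n in counts.items() if n >= min_agents)

prompts = {
    "contract-reviewer": "Always cite the clause number.\nReturn JSON only.",
    "invoice-triager": "Return JSON only.\nFlag totals over $10k.",
    "faq-bot": "Answer in two sentences.",
}
print(shared_fragments(prompts))  # ['return json only.']
```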

&lt;p&gt;Create one Skill from the most duplicated pattern you find. Define its knowledge content, configuration parameters, and tags. Attach it to two existing agents and verify they behave consistently. This gives you hands-on experience with the Skill model before you scale.&lt;/p&gt;

&lt;p&gt;Finally, draft an RBAC policy for your fleet. Which teams should be viewers, editors, deployers, or administrators for which agents? Map this to your existing identity provider groups. You don't need Fleet to implement the policy—having it documented means you're ready when Fleet permissions become available in your workspace.&lt;/p&gt;
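&lt;p&gt;Even before Fleet permissions land in your workspace, the draft policy can be an executable artifact. The role names below mirror the roles discussed earlier (viewer, editor, deployer, admin), but the policy format and check logic are a local sketch, not Fleet's implementation.&lt;/p&gt;

```python
ROLE_ACTIONS = {
    "viewer":   {"view"},
    "editor":   {"view", "edit_prompts"},
    "deployer": {"view", "deploy"},
    "admin":    {"view", "edit_prompts", "deploy", "set_permissions"},
}

def is_allowed(policy, groups, agent, action):
    """policy maps agent name to a list of {"principal": "group:...", "role": ...}
    rules, matching the PermissionSet shape used in the example above."""
    for rule in policy.get(agent, []):
        group = rule["principal"].removeprefix("group:")
        if group in groups and action in ROLE_ACTIONS[rule["role"]]:
            return True
    return False

policy = {
    "content-agent": [
        {"principal": "group:marketing", "role": "editor"},
        {"principal": "group:ml-platform", "role": "deployer"},
    ]
}
print(is_allowed(policy, {"marketing"}, "content-agent", "edit_prompts"))  # True
print(is_allowed(policy, {"marketing"}, "content-agent", "deploy"))       # False
```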

&lt;p&gt;The deliverable is a Fleet readiness document: agent inventory, three skill extraction candidates with rough definitions, and an RBAC policy ready for implementation. This preparation work ensures you can adopt Fleet deliberately rather than reactively when the tooling matures.&lt;/p&gt;




&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://blog.langchain.com/march-2026-langchain-newsletter/" rel="noopener noreferrer"&gt;March 2026: LangChain Newsletter&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.langchain.com/blog/nvidia-enterprise" rel="noopener noreferrer"&gt;LangChain Announces Enterprise Agentic AI Platform Built with NVIDIA&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://blog.langchain.com/on-agent-frameworks-and-agent-observability/" rel="noopener noreferrer"&gt;On Agent Frameworks and Agent Observability&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.langchain.com/state-of-agent-engineering" rel="noopener noreferrer"&gt;State of Agent Engineering - LangChain&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.microsoft.com/en-us/microsoft-copilot/blog/copilot-studio/6-core-capabilities-to-scale-agent-adoption-in-2026/" rel="noopener noreferrer"&gt;6 core capabilities to scale agent adoption in 2026 - Microsoft&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/pdf/2603.27075" rel="noopener noreferrer"&gt;Mind The Gap: How The Technical Mechanism Of Agentic AI Outpace Global Legal Frameworks&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/html/2604.05387v1" rel="noopener noreferrer"&gt;Data-Driven Function Calling Improvements in Large Language Models&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;This is part of the &lt;strong&gt;Agentic Engineering Weekly&lt;/strong&gt; series — a deep-dive every Monday into the frameworks, patterns, and techniques shaping the next generation of AI systems.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Follow the Agentic Engineering Weekly series on Dev.to to catch every edition.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Building something agentic? Drop a comment — I'd love to feature reader projects.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>programming</category>
      <category>agents</category>
    </item>
    <item>
      <title>AI Weekly: Intel-Google CPU Alliance, Meta's Proprietary Pivot, and the $7 Trillion Infrastructure Reality Check</title>
      <dc:creator>Richard Dillon</dc:creator>
      <pubDate>Mon, 13 Apr 2026 12:04:55 +0000</pubDate>
      <link>https://dev.to/richard_dillon_b9c238186e/ai-weekly-intel-google-cpu-alliance-metas-proprietary-pivot-and-the-7-trillion-infrastructure-n4b</link>
      <guid>https://dev.to/richard_dillon_b9c238186e/ai-weekly-intel-google-cpu-alliance-metas-proprietary-pivot-and-the-7-trillion-infrastructure-n4b</guid>
      <description>&lt;h1&gt;
  
  
  AI Weekly: Intel-Google CPU Alliance, Meta's Proprietary Pivot, and the $7 Trillion Infrastructure Reality Check
&lt;/h1&gt;

&lt;p&gt;The battle lines in AI infrastructure are being redrawn this week. As GPU costs and power demands reach breaking points, we're seeing major players hedge their bets—Intel and Google doubling down on CPU-based inference, Big Tech signing nuclear power deals, and Meta surprising everyone by abandoning its open-source commitment for a proprietary model that crushes the competition. Meanwhile, Chinese AI continues its relentless advance with Zhipu AI's massive new open-source release.&lt;/p&gt;

&lt;h2&gt;
  
  
  Intel and Google Forge Expanded Partnership to Double Down on AI-Optimized CPUs
&lt;/h2&gt;

&lt;p&gt;Intel and Google &lt;a href="https://www.reuters.com/business/intel-google-double-down-ai-cpus-with-expanded-partnership-2026-04-09/" rel="noopener noreferrer"&gt;announced an expanded partnership&lt;/a&gt; on April 9, 2026, signaling a significant strategic bet that CPU-based AI inference can offer a viable alternative to the GPU-dominated landscape that NVIDIA currently controls.&lt;/p&gt;

&lt;p&gt;The partnership focuses on developing AI-optimized CPU architectures specifically designed for inference workloads—the computationally intensive task of running trained models in production. While GPUs have dominated AI training and increasingly inference, the collaboration suggests both companies see an opportunity in the growing cost pressures facing enterprise AI deployments.&lt;/p&gt;

&lt;p&gt;The timing is notable. Data center operators are grappling with skyrocketing power consumption and GPU procurement costs that have become untenable for many organizations. CPUs, while historically slower for AI workloads, offer advantages in power efficiency, existing infrastructure compatibility, and procurement flexibility that enterprise customers increasingly value.&lt;/p&gt;

&lt;p&gt;For Intel, the partnership represents a lifeline in the AI hardware race where the company has struggled to compete with NVIDIA's dominance. For Google, which operates one of the world's largest inference infrastructures for services like Search and Gemini, diversifying beyond GPUs and its own TPUs makes strategic sense as AI becomes core to every product.&lt;/p&gt;

&lt;p&gt;The real question is whether optimized CPU inference can close the performance gap enough to matter for latency-sensitive applications—or whether this remains primarily a cost optimization play for batch workloads.&lt;/p&gt;

&lt;h2&gt;
  
  
  Big Tech Pours Billions into Next-Gen Nuclear as AI Power Demands Explode
&lt;/h2&gt;

&lt;p&gt;The AI infrastructure buildout has collided head-on with physical reality: these systems need staggering amounts of electricity, and the tech industry is now &lt;a href="https://www.reuters.com/legal/litigation/big-tech-puts-financial-heft-behind-next-gen-nuclear-power-ai-demand-surges-2026-04-10/" rel="noopener noreferrer"&gt;putting financial heft behind next-gen nuclear power&lt;/a&gt; to secure it.&lt;/p&gt;

&lt;p&gt;Meta, xAI, and other hyperscalers are now targeting data center deployments requiring a combined 110 GW of power—roughly equivalent to the entire electricity consumption of Germany. NVIDIA CEO Jensen Huang has publicly estimated that each major AI infrastructure buildout requires a minimum $60 billion investment, a figure that doesn't even account for the long-term power generation infrastructure needed to sustain operations.&lt;/p&gt;

&lt;p&gt;Startups like Oklo are capitalizing on this demand, securing long-term power purchase agreements with tech customers desperate for clean, reliable baseload power. The attraction of nuclear is clear: unlike solar and wind, it provides consistent output regardless of weather, and next-generation small modular reactor designs promise faster deployment timelines than traditional nuclear plants.&lt;/p&gt;

&lt;p&gt;But the numbers remain daunting. As &lt;a href="https://www.reuters.com/commentary/breakingviews/ai-dreams-crash-into-stark-7-trln-reality-2026-04-07/" rel="noopener noreferrer"&gt;Reuters analysis notes&lt;/a&gt;, AI infrastructure ambitions are crashing into a $7 trillion reality check when accounting for sustainable power solutions. The gap between AI's appetite for compute and the planet's ability to power it sustainably is arguably the defining constraint of this technology era—one that neither algorithmic efficiency gains nor hardware improvements alone can solve.&lt;/p&gt;

&lt;h2&gt;
  
  
  Meta Breaks from Open-Source Playbook with Proprietary Muse Spark Model
&lt;/h2&gt;

&lt;p&gt;In a move that caught the AI community off guard, Meta &lt;a href="https://venturebeat.com/technology/goodbye-llama-meta-launches-new-proprietary-ai-model-muse-spark-first-since" rel="noopener noreferrer"&gt;launched Muse Spark&lt;/a&gt;, its first proprietary model since the company committed to its open-weight Llama strategy. The shift represents a significant philosophical reversal for a company that had positioned open-source AI as a competitive moat against OpenAI and Google.&lt;/p&gt;

&lt;p&gt;The performance justification is substantial. Muse Spark achieved an Artificial Analysis Intelligence Index score of 52, nearly tripling the 18 scored by Llama 4 Maverick. On the CharXiv Reasoning benchmark, Muse Spark posted an 86.4, outperforming both Claude Opus 4.6's 65.3 and GPT-5.4's 82.8. These aren't incremental improvements—they suggest Meta has been holding back capabilities in its public releases.&lt;/p&gt;

&lt;p&gt;The timing provides context for the decision. Chinese models from Alibaba and DeepSeek now account for 41% of downloads on Hugging Face, effectively commoditizing the open-weights space that Meta pioneered. When your open-source strategy mainly benefits competitors with lower labor costs and fewer regulatory constraints, the calculus changes.&lt;/p&gt;

&lt;p&gt;The community reaction has been mixed. Some view this as a betrayal of Meta's open-source commitments; others see it as inevitable market maturation. What's undeniable is that Meta can no longer credibly position itself as the champion of AI democratization—and that the open-source vs. proprietary debate in AI is far more nuanced than partisans on either side admit.&lt;/p&gt;

&lt;h2&gt;
  
  
  Zhipu AI's GLM-5.1 Sets New Open-Source Benchmark with 754B Parameters
&lt;/h2&gt;

&lt;p&gt;Chinese AI lab Zhipu AI released &lt;a href="https://venturebeat.com/technology/ai-joins-the-8-hour-work-day-as-glm-ships-5-1-open-source-llm-beating-opus-4" rel="noopener noreferrer"&gt;GLM-5.1&lt;/a&gt;, a 754 billion parameter model that establishes new benchmarks for open-source AI capabilities. The model includes a 202,752 token context window—large enough to process substantial codebases or document collections in a single pass.&lt;/p&gt;

&lt;p&gt;The benchmark results are striking. GLM-5.1 achieved 95.3 on AIME 2026, a rigorous mathematics assessment, and 68.7 on CyberGym, a cybersecurity evaluation spanning 1,507 tasks. Perhaps most impressively, the model passed what Zhipu calls "Scenario 3"—building a functional Linux-style desktop environment from scratch within an 8-hour timeframe, demonstrating sustained agentic capability over extended task horizons.&lt;/p&gt;

&lt;p&gt;For practitioners, the deployment story matters as much as the benchmarks. GLM-5.1 is available for local deployment via vLLM, SGLang, and xLLM frameworks, meaning organizations with sufficient hardware can run frontier-class capabilities entirely on-premises. This addresses data sovereignty and cost concerns that limit enterprise adoption of API-based models.&lt;/p&gt;

&lt;p&gt;The release comes as Zhipu AI IPO'd at a $52.83 billion valuation, reflecting investor confidence in Chinese AI development. For Western AI labs, GLM-5.1 represents yet another data point in an uncomfortable trend: the open-source lead they once held has definitively shifted east, with implications for talent flows, regulatory frameworks, and the geopolitics of AI development.&lt;/p&gt;

&lt;h2&gt;
  
  
  Stalking Victim's Lawsuit Against OpenAI Raises New Questions About AI Safety Guardrails
&lt;/h2&gt;

&lt;p&gt;A lawsuit filed against OpenAI by a stalking victim presents one of the most concrete tests yet of AI companies' liability when their products facilitate real-world harm. The victim claims ChatGPT fueled her abuser's delusional fixation and that the company ignored her direct warnings about the situation.&lt;/p&gt;

&lt;p&gt;The case highlights the gap between AI safety commitments—the red-teaming, the RLHF, the constitutional AI principles—and the actual mechanisms available to prevent harm when someone reports ongoing abuse. What processes exist for a victim to flag that an AI system is being weaponized against them? How do AI companies triage such reports against the millions of support tickets they receive? The lawsuit suggests these systems are either inadequate or non-existent.&lt;/p&gt;

&lt;p&gt;From a legal perspective, the case could establish precedent for when AI companies become liable for foreseeable misuse. Section 230 protections that shield platforms from user-generated content may not apply when a company has specific knowledge of harmful use and fails to act. The plaintiff's direct warnings to OpenAI—if documented—could prove pivotal.&lt;/p&gt;

&lt;p&gt;For AI companies, this lawsuit should prompt immediate review of their harm reporting mechanisms. The abstract safety research that dominates AI ethics discussions matters less than whether a stalking victim can effectively communicate that your product is being used to terrorize her—and whether anyone at your company is empowered to act on that information.&lt;/p&gt;

&lt;h2&gt;
  
  
  Agentic Programming Updates
&lt;/h2&gt;

&lt;p&gt;The theoretical foundations of agentic AI are rapidly solidifying into production-ready architectures. A comprehensive &lt;a href="https://arxiv.org/html/2602.10479" rel="noopener noreferrer"&gt;arXiv paper&lt;/a&gt; now formalizes the evolution toward orchestrated multi-agent systems, proposing a reference architecture that cleanly separates cognitive reasoning, hierarchical memory, typed tool invocation, and embedded governance layers.&lt;/p&gt;

&lt;p&gt;Multi-agent coordination patterns have reached standardization maturity. Analysis of frameworks including CAMEL, AutoGen, MetaGPT, LangGraph, Swarm, and MAKER reveals four dominant patterns: chain (sequential), star (hub-and-spoke), mesh (peer-to-peer), and explicit workflow graphs. The &lt;a href="https://huggingface.co/blog/Svngoku/agentic-coding-trends-2026" rel="noopener noreferrer"&gt;2026 Agentic Coding Trends guide&lt;/a&gt; provides implementation details for each, noting that production deployments increasingly favor DAG-based task graphs with content-addressed artifacts for agent collaboration and audit trails.&lt;/p&gt;
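&lt;p&gt;The explicit-workflow-graph pattern reduces to a familiar primitive: a DAG of tasks with dependency edges, executed in topological order. Python's standard library covers the scheduling part directly; the task names below are an invented example, not drawn from any of the frameworks listed.&lt;/p&gt;

```python
from graphlib import TopologicalSorter

# A minimal DAG-based task graph: each agent task maps to its prerequisites.
task_graph = {
    "plan":     set(),
    "research": {"plan"},
    "draft":    {"plan", "research"},
    "review":   {"draft"},
    "publish":  {"review"},
}

# static_order() yields a valid execution order respecting every dependency
order = list(TopologicalSorter(task_graph).static_order())
print(order)  # ['plan', 'research', 'draft', 'review', 'publish']
```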

&lt;p&gt;On the tooling front, AWS Bedrock AgentCore has emerged as the enterprise-grade option, offering managed infrastructure for agent deployment at scale. CrewAI and LangGraph continue gaining traction for teams preferring role-based agent orchestration with more granular control. OpenAI's &lt;a href="https://openai.com/index/new-tools-for-building-agents/" rel="noopener noreferrer"&gt;Agents SDK&lt;/a&gt; (available in Python and TypeScript) has evolved to become provider-agnostic, with &lt;a href="https://developers.openai.com/blog/openai-for-developers-2025" rel="noopener noreferrer"&gt;documented paths&lt;/a&gt; for integrating non-OpenAI models—a notable concession to the multi-model reality of production systems.&lt;/p&gt;

&lt;p&gt;Research into &lt;a href="https://arxiv.org/html/2603.27075v1" rel="noopener noreferrer"&gt;agentic AI governance&lt;/a&gt; frameworks is accelerating, with particular focus on audit mechanisms for autonomous decision chains and liability attribution when agents operate across organizational boundaries.&lt;/p&gt;

&lt;h2&gt;
  
  
  OpenAI Opens First Permanent London Office Amid UK Expansion Push
&lt;/h2&gt;

&lt;p&gt;OpenAI's decision to establish a permanent London office marks a meaningful expansion beyond its San Francisco headquarters and signals serious intent in the European market. The move comes as UK enterprise demand for ChatGPT and API services has grown substantially, driven by financial services, healthcare, and government adoption.&lt;/p&gt;

&lt;p&gt;The timing coincides with ongoing regulatory uncertainty in the US market under the Trump administration, making geographic diversification strategically prudent. London offers access to European AI talent—particularly from universities like Imperial, Oxford, and Cambridge—without the regulatory complexity of establishing operations in EU member states post-Brexit.&lt;/p&gt;

&lt;p&gt;For UK enterprise customers, a local presence should mean faster sales cycles, easier procurement processes, and the relationship-building that remains essential for high-value B2B deals. It also positions OpenAI for potential UK government contracts that often require local presence or data residency.&lt;/p&gt;

&lt;p&gt;The office joins an increasingly competitive London AI scene that includes Google DeepMind's headquarters, Anthropic's growing European team, and numerous well-funded startups. Whether this creates a talent war that benefits workers or simply redistributes the same limited pool of experienced AI engineers remains to be seen.&lt;/p&gt;

&lt;h2&gt;
  
  
  Anthropic Temporarily Bans OpenClaw Creator from Claude Access
&lt;/h2&gt;

&lt;p&gt;In an incident that underscores ongoing tensions between AI companies and the developer ecosystem building on their platforms, Anthropic temporarily blocked the creator of OpenClaw from accessing Claude. OpenClaw is a tool that programmatically interfaces with Claude's capabilities.&lt;/p&gt;

&lt;p&gt;The ban highlights the unclear boundaries of acceptable use for API customers. AI companies want developers building applications on their platforms—it's a significant revenue and ecosystem play—but grow concerned when those applications automate access in ways that stress infrastructure or circumvent rate limits. The challenge is that the line between "creative developer" and "problematic automation" often depends on scale and intent rather than technical implementation.&lt;/p&gt;

&lt;p&gt;This follows a broader pattern of AI companies tightening controls on programmatic access and wrapper applications. OpenAI has similarly cracked down on projects it views as competitive or abusive, creating uncertainty for developers investing in AI-dependent products.&lt;/p&gt;

&lt;p&gt;For the developer community, these incidents raise legitimate concerns about platform risk. Building a business on AI APIs means accepting that your access can be revoked with limited recourse or explanation. The OpenClaw creator's ban was temporary, but the precedent matters—and suggests developers should maintain fallback options across multiple providers wherever possible.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to Watch
&lt;/h2&gt;

&lt;p&gt;The Intel-Google partnership will face its first real test when benchmark results for AI-optimized CPU inference emerge—expect performance comparisons against NVIDIA's latest within the quarter. The nuclear power agreements signal a 3-5 year buildout cycle that will determine whether AI scaling continues or hits hard physical limits. And Meta's proprietary pivot suggests the next Llama release may be significantly more restrictive, potentially fragmenting the open-source AI community that coalesced around previous versions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.reuters.com/business/intel-google-double-down-ai-cpus-with-expanded-partnership-2026-04-09/" rel="noopener noreferrer"&gt;Intel and Google to double down on AI CPUs with expanded partnership&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.reuters.com/commentary/breakingviews/ai-dreams-crash-into-stark-7-trln-reality-2026-04-07/" rel="noopener noreferrer"&gt;AI dreams crash into stark $7 trln reality&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.reuters.com/legal/litigation/big-tech-puts-financial-heft-behind-next-gen-nuclear-power-ai-demand-surges-2026-04-10/" rel="noopener noreferrer"&gt;Big Tech puts financial heft behind next-gen nuclear power as AI demand surges&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://venturebeat.com/technology/goodbye-llama-meta-launches-new-proprietary-ai-model-muse-spark-first-since" rel="noopener noreferrer"&gt;Goodbye, Llama? Meta launches new proprietary AI model Muse Spark&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://venturebeat.com/technology/ai-joins-the-8-hour-work-day-as-glm-ships-5-1-open-source-llm-beating-opus-4" rel="noopener noreferrer"&gt;AI joins the 8-hour work day as GLM ships 5.1 open source LLM beating Opus 4&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/html/2602.10479" rel="noopener noreferrer"&gt;The Evolution of Agentic AI Software Architecture - arXiv&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://huggingface.co/blog/Svngoku/agentic-coding-trends-2026" rel="noopener noreferrer"&gt;2026 Agentic Coding Trends - Implementation Guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/html/2603.27075v1" rel="noopener noreferrer"&gt;How the Technical Mechanisms of Agentic AI Outpace Global Legal Frameworks&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://openai.com/index/new-tools-for-building-agents/" rel="noopener noreferrer"&gt;New tools for building agents - OpenAI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://developers.openai.com/blog/openai-for-developers-2025" rel="noopener noreferrer"&gt;OpenAI for Developers in 2025&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Enjoyed this briefing? Follow this series for a fresh AI update every week, written for engineers who want to stay ahead.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Follow this publication on Dev.to to get notified of every new article.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Have a story tip or correction? Drop a comment below.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>programming</category>
      <category>technology</category>
    </item>
    <item>
      <title>NVIDIA-Accelerated LangGraph — Parallel and Speculative Execution for Production Agents</title>
      <dc:creator>Richard Dillon</dc:creator>
      <pubDate>Mon, 06 Apr 2026 12:03:13 +0000</pubDate>
      <link>https://dev.to/richard_dillon_b9c238186e/nvidia-accelerated-langgraph-parallel-and-speculative-execution-for-production-agents-4mg6</link>
      <guid>https://dev.to/richard_dillon_b9c238186e/nvidia-accelerated-langgraph-parallel-and-speculative-execution-for-production-agents-4mg6</guid>
      <description>&lt;h1&gt;
  
  
  NVIDIA-Accelerated LangGraph — Parallel and Speculative Execution for Production Agents
&lt;/h1&gt;

&lt;p&gt;Your multi-step research agent takes 12 seconds to respond. Users are bouncing. You've optimized prompts, cached embeddings, and upgraded to faster models—yet the fundamental problem remains: sequential LLM calls compound latency in ways that no single-node optimization can fix. The &lt;a href="https://blog.langchain.com/nvidia-enterprise/" rel="noopener noreferrer"&gt;LangChain-NVIDIA enterprise partnership&lt;/a&gt; announced in March 2026 addresses this head-on with compile-time execution strategies that analyze your graph structure and automatically parallelize independent operations. This isn't about writing faster code—it's about declaring your intent and letting the compiler find the optimal execution path.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Latency Problem in Multi-Step Agent Workflows
&lt;/h2&gt;

&lt;p&gt;Production agent systems rarely accomplish meaningful work with a single LLM call. A typical research agent might search the web, retrieve relevant documents, synthesize findings, evaluate completeness, and either iterate or produce a final answer. Each node in this workflow adds 500ms to 2 seconds of latency, depending on model size, context length, and inference provider. A five-node graph with one conditional loop easily hits 8-15 seconds—an eternity for interactive applications.&lt;/p&gt;

&lt;p&gt;The frustrating reality is that many of these operations could run simultaneously. Your web search doesn't depend on your document retrieval. Your conditional branches represent alternative futures that could both be computed before you know which path is correct. Yet traditional LangGraph execution respects the topological ordering of your graph, running nodes one after another even when the dependency structure allows parallelism.&lt;/p&gt;

&lt;p&gt;You might reach for &lt;code&gt;asyncio.gather()&lt;/code&gt; to manually parallelize, but this creates its own problems. State management becomes your responsibility. Reducer conflicts when parallel nodes write to the same state key need explicit handling. Rollback semantics for failed branches require careful coordination. The &lt;a href="https://arxiv.org/html/2602.10479" rel="noopener noreferrer"&gt;evolution of agentic AI architectures&lt;/a&gt; has highlighted that these orchestration concerns consume significant engineering effort that should instead go toward domain logic.&lt;/p&gt;
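&lt;p&gt;A minimal sketch of that manual approach shows where the burden lands (the node functions here are stand-ins invented for this example): the &lt;code&gt;gather&lt;/code&gt; call is one line, but conflict detection and state merging are entirely on you:&lt;/p&gt;

```python
import asyncio

async def search_web(query: str) -> dict:
    await asyncio.sleep(0.01)  # stand-in for a real network call
    return {"search_results": [f"web hit for {query}"]}

async def retrieve_documents(query: str) -> dict:
    await asyncio.sleep(0.01)
    return {"retrieved_docs": [f"doc for {query}"]}

async def manual_parallel_step(state: dict) -> dict:
    # The parallelism itself is one line...
    partials = await asyncio.gather(
        search_web(state["query"]),
        retrieve_documents(state["query"]),
    )
    # ...but merging is now your responsibility: detect conflicting
    # writes by hand, since nothing else will.
    merged = dict(state)
    for partial in partials:
        for key, value in partial.items():
            if key in merged:
                raise RuntimeError(f"conflicting write to {key!r}")
            merged[key] = value
    return merged

state = asyncio.run(manual_parallel_step({"query": "agent latency"}))
```

&lt;p&gt;And this sketch handles only the happy path: rollback for a failed branch, partial retries, and ordered reducers would each add more hand-written coordination.&lt;/p&gt;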

&lt;p&gt;The &lt;code&gt;langchain-nvidia&lt;/code&gt; package addresses this systematically. Rather than requiring you to restructure your graph or manually manage concurrency, it analyzes your StateGraph at compile time and produces an optimized execution plan. The promise: 40-60% latency reduction on complex graphs without touching your node logic or edge definitions.&lt;/p&gt;

&lt;h2&gt;
  
  
  How the NVIDIA Execution Strategies Work Under the Hood
&lt;/h2&gt;

&lt;p&gt;When you compile a StateGraph with NVIDIA execution strategies enabled, the compiler performs a static analysis pass that builds a dependency DAG from your node and edge definitions. This analysis identifies which nodes read which state keys, which nodes write to which keys, and which edges create hard sequencing requirements.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Parallel execution&lt;/strong&gt; batches nodes that have no data dependencies between them. If your &lt;code&gt;search_web&lt;/code&gt; node only reads &lt;code&gt;query&lt;/code&gt; and writes &lt;code&gt;search_results&lt;/code&gt;, while your &lt;code&gt;retrieve_documents&lt;/code&gt; node reads &lt;code&gt;query&lt;/code&gt; and writes &lt;code&gt;retrieved_docs&lt;/code&gt;, these can execute concurrently—they touch disjoint portions of state. The compiler emits an execution plan that groups such nodes into parallel batches, using either asyncio coroutines or thread pools depending on your nodes' implementation.&lt;/p&gt;
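&lt;p&gt;The disjointness check itself is simple set algebra over declared reads and writes. A sketch of the idea (not the package's actual analysis, which also has to handle implicit and dynamic dependencies):&lt;/p&gt;

```python
def can_parallelize(a: dict, b: dict) -> bool:
    """Two nodes may run concurrently iff neither writes a key the other touches."""
    a_reads, a_writes = set(a["reads"]), set(a["writes"])
    b_reads, b_writes = set(b["reads"]), set(b["writes"])
    conflicts = (
        a_writes.intersection(b_reads.union(b_writes))
        or b_writes.intersection(a_reads.union(a_writes))
    )
    return not conflicts

search = {"reads": ["query"], "writes": ["search_results"]}
retrieve = {"reads": ["query"], "writes": ["retrieved_docs"]}
synthesize = {"reads": ["search_results", "retrieved_docs"], "writes": ["synthesis"]}

can_parallelize(search, retrieve)    # shared reads only, disjoint writes
can_parallelize(search, synthesize)  # search writes what synthesize reads
```

&lt;p&gt;Note that shared &lt;em&gt;reads&lt;/em&gt; are harmless; only a write into a key another node touches forces sequencing.&lt;/p&gt;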

&lt;p&gt;&lt;strong&gt;Speculative execution&lt;/strong&gt; goes further by running both branches of conditional edges before the routing function resolves. Consider a conditional edge that routes to either &lt;code&gt;continue_research&lt;/code&gt; or &lt;code&gt;generate_answer&lt;/code&gt; based on a quality check. Traditionally, you'd wait for the quality check, then invoke the selected branch. With speculation, both branches begin executing immediately. Once the routing function returns, the "wrong" branch is terminated and its state changes discarded.&lt;/p&gt;
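&lt;p&gt;Conceptually, speculation looks like the following sketch (a hand-rolled illustration using plain &lt;code&gt;asyncio&lt;/code&gt;, not the optimizer's actual machinery): each branch runs against its own state snapshot while the router resolves, and the loser is cancelled:&lt;/p&gt;

```python
import asyncio
import copy

async def speculate(state: dict, route, branches: dict) -> dict:
    """Start every branch on its own snapshot; keep only the winner."""
    snapshots = {name: copy.deepcopy(state) for name in branches}
    tasks = {
        name: asyncio.create_task(fn(snapshots[name]))
        for name, fn in branches.items()
    }
    winner = await route(state)  # routing runs while branches execute
    for name, task in tasks.items():
        if name != winner:
            task.cancel()        # discard the losing branch's work
    return await tasks[winner]

async def continue_research(state: dict) -> dict:
    await asyncio.sleep(0.01)
    return {**state, "action": "research_more"}

async def generate_answer(state: dict) -> dict:
    await asyncio.sleep(0.01)
    return {**state, "action": "final_answer"}

async def quality_check(state: dict) -> str:
    await asyncio.sleep(0.005)
    return "generate" if state["score"] > 0.8 else "continue"

result = asyncio.run(speculate(
    {"score": 0.9},
    quality_check,
    {"continue": continue_research, "generate": generate_answer},
))
```

&lt;p&gt;The per-branch &lt;code&gt;deepcopy&lt;/code&gt; is the essential cost of speculation: losing branches are free to mutate their snapshot precisely because that snapshot is thrown away, which is why large state objects make speculation expensive.&lt;/p&gt;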

&lt;p&gt;This differs fundamentally from naive &lt;code&gt;asyncio.gather()&lt;/code&gt; parallelization. The NVIDIA optimizer handles &lt;a href="https://blog.langchain.com/nvidia-enterprise/" rel="noopener noreferrer"&gt;state merging automatically&lt;/a&gt;, applying reducers in dependency order even when writes arrive out of sequence. For speculative branches, it maintains state snapshots that can be rolled back without corrupting your primary state. Failed branches don't leave partial state mutations behind.&lt;/p&gt;

&lt;p&gt;Memory overhead is the primary trade-off. Speculative branches duplicate the entire state snapshot at branch entry. For graphs with large state objects—say, a &lt;code&gt;messages&lt;/code&gt; list containing hundreds of conversation turns—this duplication can be expensive. The compiler provides heuristics, but you may need to annotate branches where speculation isn't worth the memory cost.&lt;/p&gt;

&lt;p&gt;For full GPU acceleration, the optimizer integrates with &lt;a href="https://blog.langchain.com/nvidia-enterprise/" rel="noopener noreferrer"&gt;NVIDIA NIM microservices&lt;/a&gt; to batch inference requests from parallel nodes. If you're running Nemotron models through NIM, multiple parallel LLM calls can be batched into a single GPU kernel launch, further reducing overhead. This is where the really dramatic speedups come from—not just concurrent execution, but fused inference at the hardware level.&lt;/p&gt;

&lt;h2&gt;
  
  
  Enabling NVIDIA Optimizations in Your Existing LangGraph
&lt;/h2&gt;

&lt;p&gt;Getting started requires installing the &lt;code&gt;langchain-nvidia&lt;/code&gt; package alongside your existing LangGraph setup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;langchain-nvidia&amp;gt;&lt;span class="o"&gt;=&lt;/span&gt;0.2.0 langgraph&amp;gt;&lt;span class="o"&gt;=&lt;/span&gt;0.3.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you're targeting full GPU acceleration (not just CPU-based parallelism), you'll also need CUDA 12.x and the NIM client libraries. For CPU-only environments, the optimizer falls back to thread pool parallelism—slower than GPU batching but still significantly faster than sequential execution.&lt;/p&gt;

&lt;p&gt;The integration surfaces through new keyword arguments on the &lt;code&gt;compile()&lt;/code&gt; method:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.graph&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;StateGraph&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_nvidia&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;NVIDIAExecutionStrategy&lt;/span&gt;

&lt;span class="c1"&gt;# Your existing graph definition
&lt;/span&gt;&lt;span class="n"&gt;graph&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;StateGraph&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;AgentState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;search_node&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;retrieve&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;retrieve_node&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;synthesize&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;synthesize_node&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_conditional_edges&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;synthesize&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;should_continue&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{...})&lt;/span&gt;

&lt;span class="c1"&gt;# Compile with NVIDIA optimizations
&lt;/span&gt;&lt;span class="n"&gt;compiled&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;execution_strategy&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;NVIDIAExecutionStrategy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;PARALLEL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;speculative_branches&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;speculation_depth&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;  &lt;span class="c1"&gt;# Max nested speculation levels
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The compiler's auto-detection works well for most graphs, but sometimes state dependencies are implicit or dynamic. You can hint parallelizability with the &lt;code&gt;@independent&lt;/code&gt; decorator:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_nvidia&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;independent&lt;/span&gt;

&lt;span class="nd"&gt;@independent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;reads&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;writes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search_results&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;search_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;AgentState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Explicitly declares state access pattern
&lt;/span&gt;    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;search_api&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search_results&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For conditional edges where one branch has side effects—database writes, external API calls with rate limits, or any non-idempotent operation—you can opt out of speculation per-edge:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_conditional_edges&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;quality_check&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;route_function&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;continue&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;research_more&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;complete&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;write_to_database&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# Has side effects
&lt;/span&gt;    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;speculative&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;continue&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;complete&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Debugging the execution plan is crucial for understanding what the optimizer actually produces. The &lt;code&gt;explain_execution_plan()&lt;/code&gt; method prints a human-readable DAG with timing estimates:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;plan&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;compiled&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;explain_execution_plan&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;plan&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Output:
# Batch 1 (parallel): [search_node, retrieve_node] ~800ms
# Batch 2 (sequential): [synthesize_node] ~1200ms
# Conditional (speculative): [research_more | final_answer] ~600ms (one discarded)
# Estimated total: 2600ms (vs 4800ms sequential)
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One critical gotcha: speculative execution and LangGraph's &lt;code&gt;interrupt()&lt;/code&gt; mechanism don't mix. If a node might raise an interrupt to request human input, that node and all nodes depending on its output must execute sequentially. The compiler enforces this, emitting a warning when it detects potential conflicts.&lt;/p&gt;

&lt;h2&gt;
  
  
  Hands-On: Code Walkthrough
&lt;/h2&gt;

&lt;p&gt;Let's build a multi-tool research agent that demonstrates both parallel and speculative execution. This agent searches the web, retrieves documents from a vector store, synthesizes findings, and conditionally loops back for more research or produces a final answer.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
Research Agent with NVIDIA-Optimized Execution
Demonstrates parallel node batching and speculative conditional execution
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;TypedDict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Literal&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Annotated&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;operator&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;add&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.graph&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;StateGraph&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;START&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;END&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.checkpoint.memory&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MemorySaver&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_nvidia&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;NVIDIAExecutionStrategy&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;independent&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_anthropic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatAnthropic&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_community.tools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;TavilySearchResults&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.documents&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Document&lt;/span&gt;


&lt;span class="c1"&gt;# Define agent state with explicit reducers for parallel safety
&lt;/span&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ResearchState&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TypedDict&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;search_results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Annotated&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;add&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# Reducer handles parallel writes
&lt;/span&gt;    &lt;span class="n"&gt;retrieved_docs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Annotated&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Document&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;add&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;synthesis&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;iteration_count&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;
    &lt;span class="n"&gt;final_answer&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;


&lt;span class="c1"&gt;# Initialize components
&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ChatAnthropic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-sonnet-4-20250514&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;search_tool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;TavilySearchResults&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_results&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="c1"&gt;# Node 1: Web Search (parallelizable with document retrieval)
&lt;/span&gt;&lt;span class="nd"&gt;@independent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;reads&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;writes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search_results&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;search_web&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ResearchState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Searches the web for relevant information.
    Marked @independent because it only reads &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; and writes &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;search_results&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;,
    allowing parallel execution with other nodes that don&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;t touch these keys.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;search_tool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ainvoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="c1"&gt;# Extract content strings from search results
&lt;/span&gt;    &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search_results&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;


&lt;span class="c1"&gt;# Node 2: Document Retrieval (parallelizable with web search)
&lt;/span&gt;&lt;span class="nd"&gt;@independent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;reads&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;writes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;retrieved_docs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;retrieve_documents&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ResearchState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Retrieves relevant documents from vector store.
    Runs in parallel with search_web since they access disjoint state keys.
    In production, this would query Pinecone/Chroma/etc.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="c1"&gt;# Simulated retrieval - replace with actual vector store call
&lt;/span&gt;    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Simulate retrieval latency
&lt;/span&gt;    &lt;span class="n"&gt;docs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="nc"&gt;Document&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;page_content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Retrieved context for: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
                 &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;source&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;vector_store&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;retrieved_docs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;


&lt;span class="c1"&gt;# Node 3: Synthesis (depends on search_results and retrieved_docs)
&lt;/span&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;synthesize_findings&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ResearchState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Synthesizes all gathered information into a coherent summary.
    Must run after parallel nodes complete since it reads their outputs.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="c1"&gt;# Combine all sources
&lt;/span&gt;    &lt;span class="n"&gt;all_context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search_results&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;doc_context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;page_content&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;retrieved_docs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]])&lt;/span&gt;

    &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Based on the following research for query &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;:

Web Search Results:
&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;all_context&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

Retrieved Documents:
&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;doc_context&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

Provide a synthesis of the key findings. Note any gaps that require further research.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ainvoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;synthesis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;iteration_count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;iteration_count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;


&lt;span class="c1"&gt;# Node 4: Continue Research (speculative branch - may be discarded)
&lt;/span&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;continue_research&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ResearchState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Generates a refined query for additional research.
    This node runs speculatively alongside generate_answer.
    If routing selects generate_answer, this branch is discarded.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;The current synthesis has gaps: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;synthesis&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

Generate a refined search query to fill these gaps.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ainvoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;  &lt;span class="c1"&gt;# Updates query for next iteration
&lt;/span&gt;

&lt;span class="c1"&gt;# Node 5: Generate Final Answer (speculative branch - may be discarded)
&lt;/span&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_answer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ResearchState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Produces the final answer from synthesized research.
    Runs speculatively - discarded if routing continues research instead.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Based on this research synthesis:
&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;synthesis&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

Provide a comprehensive final answer to: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ainvoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;final_answer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;


&lt;span class="c1"&gt;# Routing function for conditional edge
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;should_continue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ResearchState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Literal&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;continue&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;complete&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Determines whether research is sufficient or needs another iteration.
    Both branches execute speculatively before this function returns.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="c1"&gt;# Simple heuristic: max 3 iterations, or check synthesis quality
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;iteration_count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;complete&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="c1"&gt;# In production, use LLM to evaluate synthesis completeness
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;further research&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;synthesis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;continue&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;complete&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;


&lt;span class="c1"&gt;# Build the graph
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;build_research_graph&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;graph&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;StateGraph&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ResearchState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Add nodes
&lt;/span&gt;    &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search_web&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;search_web&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;retrieve_docs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;retrieve_documents&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;synthesize&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;synthesize_findings&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;continue_research&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;continue_research&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;generate_answer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;generate_answer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Add edges - search and retrieve can run in parallel
&lt;/span&gt;    &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;START&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search_web&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;START&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;retrieve_docs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Both start from START = parallel
&lt;/span&gt;
    &lt;span class="c1"&gt;# Both must complete before synthesis
&lt;/span&gt;    &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search_web&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;synthesize&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;retrieve_docs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;synthesize&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Conditional edge with speculative execution
&lt;/span&gt;    &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_conditional_edges&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;synthesize&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;should_continue&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;continue&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;continue_research&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;complete&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;generate_answer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Loop back or end
&lt;/span&gt;    &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;continue_research&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search_web&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;generate_answer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;END&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;graph&lt;/span&gt;


&lt;span class="c1"&gt;# Compile with NVIDIA optimizations
&lt;/span&gt;&lt;span class="n"&gt;graph&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;build_research_graph&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Baseline compilation (sequential execution)
&lt;/span&gt;&lt;span class="n"&gt;baseline_compiled&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;checkpointer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;MemorySaver&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

&lt;span class="c1"&gt;# Optimized compilation with parallel + speculative execution
&lt;/span&gt;&lt;span class="n"&gt;optimized_compiled&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;checkpointer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;MemorySaver&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="n"&gt;execution_strategy&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;NVIDIAExecutionStrategy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;PARALLEL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;speculative_branches&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;speculation_depth&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="c1"&gt;# Benchmarking harness
&lt;/span&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;benchmark&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;compiled_graph&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;runs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Measures end-to-end latency across multiple runs.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;latencies&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;runs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;perf_counter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;configurable&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;thread_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;-&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}}&lt;/span&gt;

        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;compiled_graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ainvoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What are the latest advances in quantum computing?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
             &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search_results&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[],&lt;/span&gt;
             &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;retrieved_docs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[],&lt;/span&gt;
             &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;iteration_count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="n"&gt;config&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;elapsed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;perf_counter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt;
        &lt;span class="n"&gt;latencies&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;elapsed&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;avg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;latencies&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;latencies&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;p50&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;latencies&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;latencies&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;//&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;p95&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;latencies&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;latencies&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;0.95&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;

    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: avg=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;avg&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;s, p50=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;p50&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;s, p95=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;p95&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;s&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;latencies&lt;/span&gt;


&lt;span class="c1"&gt;# Run comparison
&lt;/span&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="c1"&gt;# Print execution plan for optimized graph
&lt;/span&gt;    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;=== Optimized Execution Plan ===&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;optimized_compiled&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;explain_execution_plan&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c1"&gt;# Run benchmarks
&lt;/span&gt;    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;=== Benchmark Results (50 runs each) ===&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;baseline_latencies&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;benchmark&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;baseline_compiled&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Baseline&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;optimized_latencies&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;benchmark&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;optimized_compiled&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Optimized&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Calculate improvement
&lt;/span&gt;    &lt;span class="n"&gt;baseline_avg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;baseline_latencies&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;baseline_latencies&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;optimized_avg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;optimized_latencies&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;optimized_latencies&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;improvement&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;baseline_avg&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;optimized_avg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;baseline_avg&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;

    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Latency reduction: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;improvement&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;%&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When you run this benchmark, you'll see the execution plan clearly showing how &lt;code&gt;search_web&lt;/code&gt; and &lt;code&gt;retrieve_docs&lt;/code&gt; batch together in a parallel group, followed by &lt;code&gt;synthesize&lt;/code&gt;, then the speculative conditional where both &lt;code&gt;continue_research&lt;/code&gt; and &lt;code&gt;generate_answer&lt;/code&gt; start simultaneously. In &lt;a href="https://blog.langchain.com/on-agent-frameworks-and-agent-observability/" rel="noopener noreferrer"&gt;LangSmith traces&lt;/a&gt;, the parallel batch appears as overlapping spans rather than sequential blocks—a visual confirmation that optimization is working.&lt;/p&gt;

&lt;p&gt;Expected results on this 4-node graph with one conditional branch: approximately 45% latency reduction compared to baseline sequential execution. The exact improvement depends on your inference provider's concurrency support and network conditions.&lt;/p&gt;

&lt;h2&gt;
  
  
  When to Use (and When to Avoid) Speculative Execution
&lt;/h2&gt;

&lt;p&gt;Speculative execution is powerful but not free. Every speculative branch consumes compute resources—tokens, GPU cycles, API calls—for work that may be discarded. Understanding when speculation pays off is critical for production deployments.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ideal scenarios for speculation:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Read-only branches perform well because discarding them has no lasting effects. A branch that only runs inference and updates local state can be safely abandoned mid-execution. Similarly, idempotent operations—where running twice produces the same result as running once—are safe to speculate because even if partial work persists, it doesn't corrupt your system.&lt;/p&gt;

&lt;p&gt;Low-cost nodes are obvious candidates. If both branches of a conditional complete in under 500ms, speculating costs you at most 500ms of parallel compute. When routing takes 200ms to resolve, you've paid a small premium for potentially significant latency reduction.&lt;/p&gt;

&lt;p&gt;Cached or pre-computed results make speculation nearly free. If one branch just reads from a cache while the other runs full inference, speculating on the cached branch adds negligible overhead.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scenarios where speculation hurts:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;External writes are the biggest red flag. A branch that writes to a database, sends an email, or calls a billing API should never run speculatively. Even if you "discard" the branch result, the side effect has already occurred. The framework can't undo your Stripe charge.&lt;/p&gt;

&lt;p&gt;API rate limits compound the problem. If you're calling a third-party API with per-minute quotas, speculative branches double your request rate on conditional paths. As &lt;a href="https://arxiv.org/html/2602.10479" rel="noopener noreferrer"&gt;agentic AI architectures&lt;/a&gt; have evolved, practitioners have learned that rate-limit exhaustion often manifests as cascading failures rather than graceful degradation.&lt;/p&gt;

&lt;p&gt;Vastly asymmetric branch costs make speculation inefficient. If one branch takes 200ms and the other takes 5 seconds, speculating on the expensive branch when the cheap branch would have been selected wastes 5 seconds of compute. The framework can't predict routing outcomes, so it speculates on both.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost analysis template:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ROI calculation for speculative execution on a conditional edge
&lt;/span&gt;&lt;span class="n"&gt;cheap_branch_cost_tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;
&lt;span class="n"&gt;expensive_branch_cost_tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3000&lt;/span&gt;
&lt;span class="n"&gt;routing_probability_cheap&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.7&lt;/span&gt;  &lt;span class="c1"&gt;# 70% of traffic takes cheap branch
&lt;/span&gt;&lt;span class="n"&gt;routing_probability_expensive&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.3&lt;/span&gt;

&lt;span class="c1"&gt;# Without speculation: expected tokens per request
&lt;/span&gt;&lt;span class="n"&gt;baseline_expected_tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;cheap_branch_cost_tokens&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;routing_probability_cheap&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
    &lt;span class="n"&gt;expensive_branch_cost_tokens&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;routing_probability_expensive&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# = 500 * 0.7 + 3000 * 0.3 = 350 + 900 = 1250 tokens
&lt;/span&gt;
&lt;span class="c1"&gt;# With speculation: always pay for both branches
&lt;/span&gt;&lt;span class="n"&gt;speculative_tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cheap_branch_cost_tokens&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;expensive_branch_cost_tokens&lt;/span&gt;
&lt;span class="c1"&gt;# = 500 + 3000 = 3500 tokens
&lt;/span&gt;
&lt;span class="c1"&gt;# Speculation costs 2.8x more tokens
# Only worth it if latency reduction value exceeds 2.8x token cost increase
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Hybrid approaches work best.&lt;/strong&gt; Mark expensive or side-effect-producing branches as non-speculative while allowing cheap, read-only branches to speculate freely. The per-edge &lt;code&gt;speculative&lt;/code&gt; parameter gives you this granularity.&lt;/p&gt;
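
&lt;p&gt;The decision rule behind that hybrid policy can be sketched independently of any framework. This is a minimal, illustrative classifier — the &lt;code&gt;Branch&lt;/code&gt; fields and the token threshold are assumptions for the sketch, not LangGraph API:&lt;/p&gt;

```python
from dataclasses import dataclass

@dataclass
class Branch:
    name: str
    has_side_effects: bool   # writes to a DB, sends email, calls billing, etc.
    est_cost_tokens: int     # rough per-invocation cost estimate

def should_speculate(branch: Branch, max_tokens: int = 1000) -> bool:
    """Speculate only on cheap, side-effect-free branches."""
    return not branch.has_side_effects and branch.est_cost_tokens <= max_tokens

draft = Branch("draft_response", has_side_effects=False, est_cost_tokens=500)
escalate = Branch("escalate_to_human", has_side_effects=True, est_cost_tokens=200)

print(should_speculate(draft))     # cheap and pure: safe to speculate
print(should_speculate(escalate))  # triggers notifications: never speculate
```

&lt;p&gt;In practice you would feed the result of a rule like this into the per-edge &lt;code&gt;speculative&lt;/code&gt; flag rather than hard-coding it per edge.&lt;/p&gt;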

&lt;p&gt;One subtle interaction: &lt;a href="https://blog.langchain.com/nvidia-enterprise/" rel="noopener noreferrer"&gt;LangGraph's checkpointing mechanism&lt;/a&gt; doesn't persist speculative branch state until routing resolves. This is usually what you want—failed speculation shouldn't pollute your checkpoint history. However, it means you can't resume from a mid-speculation checkpoint if the process crashes. For long-running agents where crash recovery matters, consider whether speculation's latency benefits outweigh the recovery complexity.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Means for Your Stack
&lt;/h2&gt;

&lt;p&gt;The NVIDIA execution optimization represents a philosophical shift in how we build agent systems. Instead of meticulously hand-tuning async boundaries and managing concurrent state, you declare your node dependencies and let the compiler find parallelism. This is the same trajectory that took SQL from procedural cursor loops to declarative queries optimized by the database engine.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Immediate wins for existing deployments:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you have production LangGraph agents today, you can often get meaningful speedups without any refactoring. Install &lt;code&gt;langchain-nvidia&lt;/code&gt;, add the execution strategy flags to your &lt;code&gt;compile()&lt;/code&gt; call, and run &lt;code&gt;explain_execution_plan()&lt;/code&gt; to see what the optimizer finds. Many real-world graphs have latent parallelism—nodes that happen to be defined sequentially but don't actually depend on each other's outputs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://blog.langchain.com/customers-kensho/" rel="noopener noreferrer"&gt;Kensho's multi-agent framework&lt;/a&gt; is a good case study here. Their financial research agents had multiple data-gathering nodes that were functionally independent but executed sequentially due to how the graph was originally authored. Adding parallel execution dropped their median latency by 38% with zero changes to node implementations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Deployment considerations:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Full GPU acceleration requires &lt;a href="https://blog.langchain.com/nvidia-enterprise/" rel="noopener noreferrer"&gt;NVIDIA NIM microservices&lt;/a&gt; running somewhere your agents can reach them—either self-hosted on GPU instances or via NVIDIA's cloud offerings. This adds infrastructure complexity but enables inference batching that further compounds the parallel execution benefits.&lt;/p&gt;

&lt;p&gt;For teams not ready to operate NIM infrastructure, CPU-only fallback still provides parallel execution via thread pools. You lose the GPU batching speedups but retain the concurrent node execution benefits. This is a reasonable starting point for evaluation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost-benefit for cloud deployments:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Parallel execution increases your instantaneous compute footprint. Instead of one inference call at a time, you might have three or four. On cloud GPU instances priced by the hour, this doesn't increase cost—you're already paying for the GPU. On API-based providers priced per token, parallel execution is cost-neutral (same tokens, just concurrent). Speculative execution, however, genuinely increases token spend on conditional paths.&lt;/p&gt;

&lt;p&gt;Run the numbers for your specific traffic patterns. A 45% latency reduction might justify a 20% increase in token costs for latency-sensitive applications. For batch processing where latency doesn't matter, speculation may not make sense.&lt;/p&gt;
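
&lt;p&gt;A concrete worked version of that trade-off, with every number illustrative — substitute your own traffic and pricing:&lt;/p&gt;

```python
# Illustrative break-even check: latency saved vs. extra token spend per request.
baseline_latency_s = 4.0
optimized_latency_s = 2.2            # roughly a 45% reduction
baseline_tokens = 1250
speculative_tokens = 1500            # roughly a 20% increase
token_price_per_1k = 0.002           # USD, assumed pricing

extra_cost = (speculative_tokens - baseline_tokens) / 1000 * token_price_per_1k
latency_saved = baseline_latency_s - optimized_latency_s

print(f"Extra cost per request: ${extra_cost:.5f} to save {latency_saved:.1f}s")
```

&lt;p&gt;If a second of user-facing latency is worth more to you than that marginal cost, speculation pays; for offline batch work it rarely does.&lt;/p&gt;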

&lt;p&gt;&lt;strong&gt;Migration path:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Start with parallel execution only (&lt;code&gt;speculative_branches=False&lt;/code&gt;). This is nearly always safe and provides immediate benefits. Monitor your &lt;a href="https://blog.langchain.com/on-agent-frameworks-and-agent-observability/" rel="noopener noreferrer"&gt;LangSmith traces&lt;/a&gt; for the parallel execution patterns, validate that state merging works correctly for your reducers, and measure the actual latency improvement.&lt;/p&gt;

&lt;p&gt;Once comfortable, enable speculative execution on specific conditional edges where both branches are cheap and side-effect-free. Use the per-edge &lt;code&gt;speculative&lt;/code&gt; parameter rather than the global flag. This incremental approach lets you learn where speculation helps in your specific workloads.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;New observability metrics:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;LangSmith's integration with the NVIDIA execution strategies surfaces two new metrics worth monitoring:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Speculative waste ratio&lt;/strong&gt;: Percentage of speculative branch compute that gets discarded. High values (&amp;gt;50%) suggest your speculation targets are poorly chosen.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Parallel efficiency score&lt;/strong&gt;: Ratio of achieved parallelism to theoretical maximum. Low values indicate state dependencies you might be able to refactor away.&lt;/li&gt;
&lt;/ul&gt;
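
&lt;p&gt;Both metrics are easy to compute yourself from trace data while waiting on dashboard support. A minimal sketch, assuming span records with &lt;code&gt;speculative&lt;/code&gt;, &lt;code&gt;discarded&lt;/code&gt;, and &lt;code&gt;tokens&lt;/code&gt; fields (the record shape is an assumption, not the LangSmith export format):&lt;/p&gt;

```python
def speculative_waste_ratio(spans):
    """Fraction of speculative tokens that were discarded after routing resolved."""
    spec = [s for s in spans if s["speculative"]]
    total = sum(s["tokens"] for s in spec)
    wasted = sum(s["tokens"] for s in spec if s["discarded"])
    return wasted / total if total else 0.0

def parallel_efficiency(wall_clock_s, sum_node_s, critical_path_s):
    """Achieved parallelism relative to the theoretical max set by the critical path."""
    achieved = sum_node_s / wall_clock_s
    theoretical = sum_node_s / critical_path_s
    return achieved / theoretical

spans = [
    {"speculative": True, "discarded": True, "tokens": 3000},
    {"speculative": True, "discarded": False, "tokens": 500},
    {"speculative": False, "discarded": False, "tokens": 1200},
]
print(speculative_waste_ratio(spans))  # well above the 50% alarm threshold
```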

&lt;p&gt;Looking ahead, the &lt;a href="https://blog.langchain.com/march-2026-langchain-newsletter/" rel="noopener noreferrer"&gt;LangChain roadmap&lt;/a&gt; hints at auto-tuning capabilities that learn optimal speculation targets from production traces. The vision: your agent framework observes which branches win routing decisions, estimates branch costs, and automatically adjusts speculation settings to minimize expected latency given observed traffic patterns.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to Build This Week
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Project: Latency-Optimized Customer Support Agent&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Build a customer support agent that handles incoming tickets by: (1) classifying the ticket category, (2) searching knowledge base documentation, (3) retrieving similar past tickets, and (4) either generating a draft response or escalating to a human—with the escalation/response decision made via conditional routing.&lt;/p&gt;

&lt;p&gt;Implementation steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Define a StateGraph with four main nodes plus the conditional branch&lt;/li&gt;
&lt;li&gt;Annotate the knowledge base search and past ticket retrieval nodes as &lt;code&gt;@independent&lt;/code&gt;—they read the same input but write to different state keys&lt;/li&gt;
&lt;li&gt;Mark the "escalate to human" branch as &lt;code&gt;speculative=False&lt;/code&gt; since escalation triggers external notifications&lt;/li&gt;
&lt;li&gt;Allow the "draft response" branch to speculate since it's a pure LLM call&lt;/li&gt;
&lt;li&gt;Implement &lt;code&gt;explain_execution_plan()&lt;/code&gt; logging to verify optimization&lt;/li&gt;
&lt;li&gt;Build a test harness that submits 100 tickets and compares baseline vs. optimized latency distributions&lt;/li&gt;
&lt;li&gt;Calculate your actual ROI: latency reduction vs. token cost increase from speculation&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Target metrics: a 35-50% latency reduction on the happy path (response generation), with escalation paths unaffected by speculation. If your support tickets have an 80/20 response-to-escalation ratio, speculation should have positive ROI even with some wasted compute on the 20% that escalate.&lt;/p&gt;
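
&lt;p&gt;The expected waste from that 80/20 split is a one-line calculation (the draft token count is an assumed figure):&lt;/p&gt;

```python
# Speculating the draft always spends its tokens; the escalated 20% are discarded.
p_respond = 0.8
draft_tokens = 800   # assumed average cost of a speculated draft

wasted = (1 - p_respond) * draft_tokens
overhead_pct = wasted / draft_tokens * 100
print(f"Expected wasted tokens per ticket: {wasted:.0f} ({overhead_pct:.0f}% of draft cost)")
```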

&lt;p&gt;This project directly applies to production use cases and gives you hands-on experience with the optimization trade-offs before deploying to real traffic.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://blog.langchain.com/nvidia-enterprise/" rel="noopener noreferrer"&gt;LangChain Announces Enterprise Agentic AI Platform Built ...&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://blog.langchain.com/march-2026-langchain-newsletter/" rel="noopener noreferrer"&gt;March 2026: LangChain Newsletter&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://blog.langchain.com/on-agent-frameworks-and-agent-observability/" rel="noopener noreferrer"&gt;On Agent Frameworks and Agent Observability&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://blog.langchain.com/customers-kensho/" rel="noopener noreferrer"&gt;How Kensho built a multi-agent framework with LangGraph ...&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/html/2602.10479" rel="noopener noreferrer"&gt;The Evolution of Agentic AI Software Architecture&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;This is part of the &lt;strong&gt;Agentic Engineering Weekly&lt;/strong&gt; series — a deep-dive every Monday into the frameworks, patterns, and techniques shaping the next generation of AI systems.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Follow the Agentic Engineering Weekly series on Dev.to to catch every edition.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Building something agentic? Drop a comment — I'd love to feature reader projects.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>programming</category>
      <category>agents</category>
    </item>
    <item>
      <title>Primitive Shifts: The Async Task Primitive</title>
      <dc:creator>Richard Dillon</dc:creator>
      <pubDate>Mon, 06 Apr 2026 12:02:23 +0000</pubDate>
      <link>https://dev.to/richard_dillon_b9c238186e/primitive-shifts-the-async-task-primitive-1aih</link>
      <guid>https://dev.to/richard_dillon_b9c238186e/primitive-shifts-the-async-task-primitive-1aih</guid>
      <description>&lt;h1&gt;
  
  
  Primitive Shifts: The Async Task Primitive
&lt;/h1&gt;

&lt;p&gt;Every few months, the baseline of how AI systems work quietly moves. Engineers who noticed early weren't smarter — they were just paying attention to the right signals. The engineers who saw containerization coming didn't predict Docker's dominance; they noticed deployment friction patterns that VMs couldn't solve. The ones who caught the serverless shift weren't visionaries; they were tired of capacity planning for bursty workloads. Right now, the same kind of quiet shift is happening in agent orchestration — and if you're still building synchronous agent calls, you're about to feel the floor move.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is It?
&lt;/h2&gt;

&lt;p&gt;The shift is from synchronous agent invocations — where you call an agent, block, and wait for a response — to &lt;strong&gt;task-based asynchronous primitives&lt;/strong&gt; with first-class status tracking, timeout budgets, and structured resumption. The core pattern: agent calls return a task identifier immediately, the client polls or subscribes for status updates, and the agent can run for minutes to hours without blocking the caller.&lt;/p&gt;

&lt;p&gt;This isn't just "background jobs for AI." The &lt;a href="https://arxiv.org/html/2603.13417" rel="noopener noreferrer"&gt;design patterns research for deploying AI agents&lt;/a&gt; describes protocol-level primitives that are emerging specifically for agent orchestration: task TTLs (typically 15 minutes for orphaned tasks), adaptive timeout budgeting that propagates through multi-agent hierarchies, and structured error semantics that go far beyond HTTP status codes. When a task fails, you don't just get a 500 — you get typed failures like &lt;code&gt;budget_exhausted&lt;/code&gt;, &lt;code&gt;upstream_unavailable&lt;/code&gt;, or &lt;code&gt;partial_completion&lt;/code&gt; with resumable checkpoints.&lt;/p&gt;

&lt;p&gt;AWS Bedrock AgentCore already ships this pattern in production. The MCP specification is converging toward what's being called the "tasks pattern" as the standard approach for non-trivial agent interactions. But the key insight is architectural: in multi-agent systems where planner agents invoke specialist agents, synchronous blocking is &lt;a href="https://arxiv.org/html/2603.13417" rel="noopener noreferrer"&gt;guaranteed to fail against planner timeout budgets&lt;/a&gt;. The planner has 10 minutes to coordinate five specialists. If any specialist blocks for 8 minutes, the planner can't aggregate results before its own deadline. The math doesn't work.&lt;/p&gt;
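
&lt;p&gt;The budget arithmetic the planner has to do is worth seeing concretely. A minimal sketch of how a timeout budget might propagate down a task hierarchy — the function name and the 30-second safety margin are illustrative, not part of any protocol:&lt;/p&gt;

```python
def child_budget(parent_budget_s: float, elapsed_s: float,
                 safety_margin_s: float = 30.0) -> float:
    """A child task must fit inside what's left of the parent's budget,
    minus a margin for the parent to aggregate results afterward."""
    remaining = parent_budget_s - elapsed_s - safety_margin_s
    if remaining <= 0:
        raise TimeoutError("budget_exhausted before dispatch")
    return remaining

# Planner has 600s total; 120s already spent coordinating earlier specialists.
print(child_budget(600, 120))  # the next specialist inherits 450.0s, not 600s
```

&lt;p&gt;With blocking calls there is no equivalent mechanism: a specialist that runs for 8 of the planner's 10 minutes simply starves every sibling downstream.&lt;/p&gt;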

&lt;p&gt;The &lt;a href="https://arxiv.org/html/2602.10479" rel="noopener noreferrer"&gt;agentic AI architecture research&lt;/a&gt; frames this as moving from "orchestration through blocking calls" to "orchestration through task lifecycle management." It's the difference between a conductor waiting for each musician to finish before cueing the next, versus a conductor who starts all sections and monitors their progress in parallel.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why It's Flying Under the Radar
&lt;/h2&gt;

&lt;p&gt;Most agent tutorials and demos still show synchronous patterns that work perfectly for 30-second completions. The canonical example — "build an agent that searches the web and summarizes results" — completes in under a minute. The tutorials work. The YouTube videos work. The blog posts work. And engineers reasonably conclude that their production systems should look like the tutorials.&lt;/p&gt;

&lt;p&gt;But &lt;a href="https://www.anthropic.com/news/measuring-agent-autonomy" rel="noopener noreferrer"&gt;Anthropic's data on agent autonomy&lt;/a&gt; shows that Claude Code's 99th percentile turn durations have nearly doubled — from around 25 minutes to over 45 minutes — as the system handles increasingly complex tasks. The &lt;a href="https://resources.anthropic.com/hubfs/2026%20Agentic%20Coding%20Trends%20Report.pdf" rel="noopener noreferrer"&gt;2026 Agentic Coding Trends Report&lt;/a&gt; confirms this trajectory: as agents take on more autonomous work, session lengths stretch. The ceiling you're building against isn't the 45-second median; it's the 45-minute tail that's growing faster than most architectures can accommodate.&lt;/p&gt;

&lt;p&gt;Engineers familiar with background job patterns — Celery, Sidekiq, Bull — assume they can retrofit existing infrastructure. But agent tasks need &lt;strong&gt;identity propagation&lt;/strong&gt; (who initiated this multi-hop request?), &lt;strong&gt;context-scoped routing&lt;/strong&gt; (which agent instance has the conversation state?), and &lt;strong&gt;structured continuation&lt;/strong&gt; (how do we resume from a checkpoint with partial results?). Traditional job queues give you "run this function later." Agent tasks need "run this function later, as this user, with this context, reporting status to this callback, resuming from this state if interrupted, and inheriting this timeout budget."&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://arxiv.org/html/2603.25100v1" rel="noopener noreferrer"&gt;research on separation of execution power in AI systems&lt;/a&gt; describes this as a fundamental architectural distinction: agent tasks aren't background jobs with extra metadata — they're a different primitive with different lifecycle guarantees.&lt;/p&gt;

&lt;h2&gt;
  
  
  Hands-On: Try It Today
&lt;/h2&gt;

&lt;p&gt;The simplest starting point is implementing the MCP tasks pattern in your existing agent infrastructure. Here's a minimal Python implementation showing the core primitives — task creation, status polling, timeout budgeting, and structured errors:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# task_primitives.py
# A minimal implementation of async task primitives for agent orchestration
# Requires: pip install pydantic&amp;gt;=2.0 fastapi uvicorn
&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dataclasses&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;dataclass&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;field&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timedelta&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;enum&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Enum&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;uuid4&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;TaskStatus&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Enum&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;PENDING&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pending&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;           &lt;span class="c1"&gt;# Created, not yet started
&lt;/span&gt;    &lt;span class="n"&gt;RUNNING&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;running&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;           &lt;span class="c1"&gt;# Currently executing
&lt;/span&gt;    &lt;span class="n"&gt;COMPLETED&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;completed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;       &lt;span class="c1"&gt;# Finished successfully
&lt;/span&gt;    &lt;span class="n"&gt;FAILED&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;failed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;            &lt;span class="c1"&gt;# Finished with error
&lt;/span&gt;    &lt;span class="n"&gt;TIMEOUT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;timeout&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;          &lt;span class="c1"&gt;# Budget exhausted
&lt;/span&gt;    &lt;span class="n"&gt;PARTIAL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;partial&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;          &lt;span class="c1"&gt;# Partial completion with checkpoint
&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ErrorType&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Enum&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Structured error types beyond generic failures&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;BUDGET_EXHAUSTED&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;budget_exhausted&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;      &lt;span class="c1"&gt;# Timeout budget depleted
&lt;/span&gt;    &lt;span class="n"&gt;UPSTREAM_UNAVAILABLE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;upstream_unavailable&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# Dependency failed
&lt;/span&gt;    &lt;span class="n"&gt;PARTIAL_COMPLETION&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;partial_completion&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# Can resume from checkpoint
&lt;/span&gt;    &lt;span class="n"&gt;CONTEXT_LOST&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;context_lost&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;             &lt;span class="c1"&gt;# State invalidated
&lt;/span&gt;
&lt;span class="nd"&gt;@dataclass&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;TaskCheckpoint&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Resumable state for partial completions&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;step_completed&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;
    &lt;span class="n"&gt;intermediate_results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;remaining_work&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;default_factory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;utcnow&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@dataclass&lt;/span&gt; 
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;AgentTask&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Core task primitive with timeout budgeting&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;task_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;TaskStatus&lt;/span&gt;
    &lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;
    &lt;span class="n"&gt;timeout_budget_seconds&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;  &lt;span class="c1"&gt;# Time remaining for this task
&lt;/span&gt;    &lt;span class="n"&gt;parent_task_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;  &lt;span class="c1"&gt;# For hierarchical tracking
&lt;/span&gt;    &lt;span class="n"&gt;checkpoint&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;TaskCheckpoint&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="n"&gt;error_type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ErrorType&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="n"&gt;error_message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;

    &lt;span class="c1"&gt;# TTL for orphaned task cleanup (default 15 minutes per MCP pattern)
&lt;/span&gt;    &lt;span class="n"&gt;ttl_seconds&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;900.0&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;remaining_budget&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Calculate remaining timeout budget&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;elapsed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;utcnow&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;total_seconds&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;timeout_budget_seconds&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;elapsed&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;child_budget&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;reserve_seconds&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;30.0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Budget to allocate to child tasks, reserving time for aggregation&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;remaining_budget&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;reserve_seconds&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;TaskRegistry&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;In-memory task store (use Redis/Postgres in production)&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_tasks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;AgentTask&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_cleanup_task&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;create_task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
        &lt;span class="n"&gt;timeout_budget_seconds&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;parent_task_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;AgentTask&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Create a new task and return immediately (fire-and-forget pattern)&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;task&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AgentTask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;task_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;uuid4&lt;/span&gt;&lt;span class="p"&gt;()),&lt;/span&gt;
            &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;TaskStatus&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;PENDING&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;utcnow&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
            &lt;span class="n"&gt;timeout_budget_seconds&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;timeout_budget_seconds&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;parent_task_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;parent_task_id&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_tasks&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;task_id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;task_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;AgentTask&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_tasks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;update_status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
        &lt;span class="n"&gt;task_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
        &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;TaskStatus&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;error_type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ErrorType&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;error_message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;checkpoint&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;TaskCheckpoint&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Update task status with structured error semantics&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_tasks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task_id&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;
            &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;
            &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;error_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;error_type&lt;/span&gt;
            &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;error_message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;error_message&lt;/span&gt;
            &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;checkpoint&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;checkpoint&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;cleanup_expired&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Remove orphaned tasks past TTL&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;now&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;utcnow&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;expired&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="n"&gt;tid&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;tid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_tasks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="nf"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;now&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;total_seconds&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ttl_seconds&lt;/span&gt;
            &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TaskStatus&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;PENDING&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;TaskStatus&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RUNNING&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;tid&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;expired&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# In production: log, emit metric, notify observers
&lt;/span&gt;            &lt;span class="k"&gt;del&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_tasks&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;tid&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# Example: Multi-agent orchestration with budget propagation
&lt;/span&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;orchestrator_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;registry&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;TaskRegistry&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;user_request&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;total_budget_seconds&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;600.0&lt;/span&gt;  &lt;span class="c1"&gt;# 10 minutes
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Orchestrator that spawns specialist agents with propagated budgets.
    Demonstrates the &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;fire, track, resume&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; pattern.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="c1"&gt;# Create parent task immediately (fire)
&lt;/span&gt;    &lt;span class="n"&gt;parent_task&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;registry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;total_budget_seconds&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;registry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update_status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;parent_task&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;task_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;TaskStatus&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RUNNING&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Spawn child tasks with reduced budgets (propagate)
&lt;/span&gt;    &lt;span class="n"&gt;child_budget&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;parent_task&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;child_budget&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;reserve_seconds&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;60.0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;child_tasks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;specialist&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;research&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;analysis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;synthesis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="n"&gt;child&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;registry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;timeout_budget_seconds&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;child_budget&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Split among specialists
&lt;/span&gt;            &lt;span class="n"&gt;parent_task_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;parent_task&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;task_id&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;child_tasks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;specialist&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;child&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="c1"&gt;# In production: dispatch to actual agent workers
&lt;/span&gt;        &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="nf"&gt;simulate_specialist_work&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;registry&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;child&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;specialist&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Poll for completion (track)
&lt;/span&gt;    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="n"&gt;parent_task&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;remaining_budget&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mf"&gt;60.0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="c1"&gt;# Reserve aggregation time
&lt;/span&gt;        &lt;span class="n"&gt;all_done&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;registry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;task_id&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt; 
            &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TaskStatus&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;COMPLETED&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;TaskStatus&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FAILED&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;TaskStatus&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;PARTIAL&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;child_tasks&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;all_done&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;break&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Poll interval
&lt;/span&gt;
    &lt;span class="c1"&gt;# Aggregate results or handle partial completion (resume)
&lt;/span&gt;    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;specialist&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;child&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;child_tasks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;child_state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;registry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;child&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;task_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;child_state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;TaskStatus&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;COMPLETED&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;specialist&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;child_state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;
        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;child_state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;TaskStatus&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;PARTIAL&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# Use checkpoint for partial results
&lt;/span&gt;            &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;specialist&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;child_state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;checkpoint&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;intermediate_results&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;child_tasks&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;registry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update_status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;parent_task&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;task_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
            &lt;span class="n"&gt;TaskStatus&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;COMPLETED&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Aggregated: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Partial completion - save checkpoint for resumption
&lt;/span&gt;        &lt;span class="n"&gt;registry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update_status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;parent_task&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;task_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;TaskStatus&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;PARTIAL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;error_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ErrorType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;PARTIAL_COMPLETION&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;checkpoint&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;TaskCheckpoint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;step_completed&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="n"&gt;intermediate_results&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;remaining_work&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;child_tasks&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;parent_task&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;task_id&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;simulate_specialist_work&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;registry&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;TaskRegistry&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;AgentTask&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;specialist_type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Simulate specialist agent work with timeout awareness&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;registry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update_status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;task_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;TaskStatus&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RUNNING&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Check budget before starting work
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;remaining_budget&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mf"&gt;5.0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;registry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update_status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;task_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;TaskStatus&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;FAILED&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;error_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ErrorType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;BUDGET_EXHAUSTED&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;error_message&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Insufficient budget to start work&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt;

    &lt;span class="c1"&gt;# Simulate work with periodic budget checks
&lt;/span&gt;    &lt;span class="n"&gt;work_duration&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;remaining_budget&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;0.8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;30.0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;work_duration&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;remaining_budget&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mf"&gt;2.0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# Save checkpoint before timeout
&lt;/span&gt;            &lt;span class="n"&gt;registry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update_status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;task_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;TaskStatus&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;PARTIAL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;error_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ErrorType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;PARTIAL_COMPLETION&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;checkpoint&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;TaskCheckpoint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="n"&gt;step_completed&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;intermediate_results&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;partial&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;specialist_type&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; interim&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                    &lt;span class="n"&gt;remaining_work&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;finalize&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
                &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;registry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update_status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;task_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;TaskStatus&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;COMPLETED&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;specialist_type&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; completed successfully&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This implementation shows the three key shifts from synchronous patterns: immediate task ID return instead of blocking, explicit budget propagation to child tasks, and structured checkpoints for partial completions. The &lt;a href="https://arxiv.org/html/2603.13417" rel="noopener noreferrer"&gt;Context-Aware Broker Pattern&lt;/a&gt; described in deployment research extends this further with identity propagation and six-stage routing — but this foundation gets you 80% of the architectural benefit.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Means for Your Stack
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Synchronous agent wrappers become technical debt.&lt;/strong&gt; Every &lt;code&gt;await agent.complete(prompt)&lt;/code&gt; in your codebase is a blocking call that assumes sub-minute responses. The &lt;a href="https://arxiv.org/pdf/2601.15195" rel="noopener noreferrer"&gt;empirical study of AI coding agents&lt;/a&gt; documents failure modes that emerge specifically when synchronous assumptions break — cascading timeouts, lost context, and silent failures that only surface in logs hours later. The investment in task primitives pays off immediately for any agent interaction that might exceed your HTTP timeout.&lt;/p&gt;
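&lt;p&gt;A minimal sketch of that wrapper, assuming nothing beyond asyncio. The names &lt;code&gt;TaskHandle&lt;/code&gt;, &lt;code&gt;submit_agent_task&lt;/code&gt;, and &lt;code&gt;poll&lt;/code&gt; are illustrative, not from any framework; the point is the shape, where submission returns an ID immediately and the blocking call moves into the background.&lt;/p&gt;

```python
import asyncio
import uuid

# Hypothetical sketch: TaskHandle, submit_agent_task, and poll are
# illustrative names, not from any framework. The pattern is what matters:
# submission returns a task ID immediately instead of blocking.
class TaskHandle:
    """Minimal task record: an ID, a status, and a slot for the result."""
    def __init__(self):
        self.task_id = str(uuid.uuid4())
        self.status = "pending"
        self.result = None

_tasks = {}  # task_id -> TaskHandle

async def _run(handle, agent_call):
    # Background execution; failures land in the record, not a lost HTTP 500.
    handle.status = "running"
    try:
        handle.result = await agent_call()
        handle.status = "completed"
    except Exception as exc:
        handle.status = "failed"
        handle.result = str(exc)

def submit_agent_task(agent_call):
    """Return a task ID right away; must be called inside a running loop."""
    handle = TaskHandle()
    _tasks[handle.task_id] = handle
    asyncio.create_task(_run(handle, agent_call))
    return handle.task_id

def poll(task_id):
    handle = _tasks[task_id]
    return handle.status, handle.result
```

&lt;p&gt;A caller polls (or subscribes) with the ID instead of holding an HTTP connection open for the full duration of the work.&lt;/p&gt;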

&lt;p&gt;&lt;strong&gt;Observability requires task-native tracing.&lt;/strong&gt; Traditional request traces end when the HTTP connection closes. The &lt;a href="https://arxiv.org/html/2603.28735v1" rel="noopener noreferrer"&gt;research on architecture documentation for AI systems&lt;/a&gt; emphasizes that agent observability must span the full task lifecycle — creation, status transitions, checkpoint saves, and eventual completion or resumption. Your existing Datadog or Jaeger setup will show a request that returned quickly with a task ID, then nothing until the client polls. The actual work happens in a tracing blind spot.&lt;/p&gt;
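&lt;p&gt;One way to close that blind spot, sketched here without assuming any particular tracing vendor: emit structured lifecycle events keyed by task ID, so the trace spans creation through completion rather than ending at the HTTP response. The event names below are assumptions.&lt;/p&gt;

```python
import json
import time

# Illustrative sketch, not tied to any tracing vendor: a task-scoped event
# log correlated by task_id that outlives the HTTP response. Event names
# ("created", "status_change", ...) are assumptions.
class TaskTrace:
    def __init__(self, task_id):
        self.task_id = task_id
        self.events = []

    def emit(self, event, **fields):
        record = {"task_id": self.task_id, "event": event,
                  "ts": time.time(), **fields}
        self.events.append(record)
        # In practice this line would ship the record to a trace backend.
        return json.dumps(record)

trace = TaskTrace("task-42")
trace.emit("created", budget_s=120.0)
trace.emit("status_change", status="running")
trace.emit("checkpoint_saved", step=1)
trace.emit("status_change", status="completed")

names = [e["event"] for e in trace.events]
```

&lt;p&gt;Because every record carries the same &lt;code&gt;task_id&lt;/code&gt;, a backend can reassemble the full lifecycle even when the work spans hours and multiple processes.&lt;/p&gt;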

&lt;p&gt;&lt;strong&gt;Multi-agent orchestration needs explicit budget propagation.&lt;/strong&gt; Without passing remaining TTL to child agents, you get the scenario from our code example: a child agent consumes the entire timeout budget, and the parent fails to aggregate results before its own deadline. The &lt;a href="https://arxiv.org/html/2603.27296v1" rel="noopener noreferrer"&gt;multi-agent AI systems research&lt;/a&gt; shows this is the primary failure mode in hierarchical agent architectures — not model errors, but coordination timeouts.&lt;/p&gt;
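&lt;p&gt;The budget arithmetic is worth making explicit. A hedged sketch, with illustrative names: each child receives a slice of the parent's &lt;em&gt;remaining&lt;/em&gt; TTL, minus a reserve the parent keeps for aggregating results.&lt;/p&gt;

```python
import time

# Hedged sketch of the budget arithmetic; function and parameter names are
# illustrative. Each child gets a slice of the parent's remaining TTL, with
# a reserve held back so the parent can still aggregate results in time.
def child_budget(parent_deadline, n_children, aggregation_reserve=5.0,
                 now=None):
    """Per-child time budget in seconds; refuses to spawn when exhausted."""
    now = time.time() if now is None else now
    remaining = parent_deadline - now
    usable = remaining - aggregation_reserve
    if usable > 0:
        return usable / n_children
    raise TimeoutError("parent budget too low to spawn children")

# 65s left on the parent clock, 5s reserved, 3 children: 20s each.
budget = child_budget(parent_deadline=1065.0, n_children=3, now=1000.0)
```

&lt;p&gt;With 65 seconds remaining, a 5-second reserve, and three children, each child gets 20 seconds, and spawning is refused outright when the reserve alone would exhaust the clock.&lt;/p&gt;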

&lt;p&gt;&lt;strong&gt;Error handling shifts from "retry or fail" to "resume from checkpoint."&lt;/strong&gt; The structured error semantics in the code — &lt;code&gt;partial_completion&lt;/code&gt; with checkpoint state — reflect a fundamental change in how failures work. The &lt;a href="https://arxiv.org/html/2603.27249v1" rel="noopener noreferrer"&gt;growing burden of AI-assisted development&lt;/a&gt; notes that engineers spend increasing time recovering from partial completions rather than handling clean success/failure binaries.&lt;/p&gt;
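&lt;p&gt;Resumption against the checkpoint shape from the example above (&lt;code&gt;step_completed&lt;/code&gt;, &lt;code&gt;intermediate_results&lt;/code&gt;, &lt;code&gt;remaining_work&lt;/code&gt;) takes only a few lines; the &lt;code&gt;resume&lt;/code&gt; helper here is an illustrative sketch, not a library API.&lt;/p&gt;

```python
# The checkpoint shape matches the example above (step_completed,
# intermediate_results, remaining_work); the resume helper itself is an
# illustrative sketch, not a library API.
def resume(checkpoint, do_step):
    """Re-enter a partial task: keep prior results, run only what's left."""
    results = dict(checkpoint["intermediate_results"])
    for step in checkpoint["remaining_work"]:
        results[step] = do_step(step)
    return {
        "step_completed": checkpoint["step_completed"]
                          + len(checkpoint["remaining_work"]),
        "intermediate_results": results,
        "remaining_work": [],
    }

ckpt = {"step_completed": 1,
        "intermediate_results": {"partial": "research interim"},
        "remaining_work": ["finalize"]}
done = resume(ckpt, lambda step: step + " done")
```

&lt;p&gt;The prior interim results survive untouched; only the steps listed in &lt;code&gt;remaining_work&lt;/code&gt; re-run, which is the whole difference between "resume" and "retry from scratch."&lt;/p&gt;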

&lt;h2&gt;
  
  
  The Infrastructure Signal
&lt;/h2&gt;

&lt;p&gt;AWS Bedrock AgentCore shipping task patterns in production suggests AWS sees this as required infrastructure, not experimental nicety. When a major cloud provider builds timeout budgeting and structured resumption into their agent platform, they're responding to customer pain that's already widespread enough to justify platform investment.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://arxiv.org/html/2602.18764v2" rel="noopener noreferrer"&gt;schema-guided dialogue systems research&lt;/a&gt; describes enterprise deployments requiring what they call "CABP-style identity propagation" — context-aware broker patterns that synchronous architectures simply cannot support. The identity of who initiated a request must flow through every task hop; orphaned tasks with sensitive context need explicit cleanup policies. This is table-stakes for enterprise compliance, and it requires task primitives.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://github.com/ARUNAGIRINATHAN-K/awesome-ai-agents" rel="noopener noreferrer"&gt;AI agents ecosystem&lt;/a&gt; shows a clear pattern: agent frameworks are increasingly separating "tooling and infrastructure" from core agent logic. Task lifecycle is becoming a layer — like logging or metrics — that you import rather than build. The &lt;a href="https://arxiv.org/html/2603.14805v1" rel="noopener noreferrer"&gt;institutional knowledge primitives research&lt;/a&gt; frames this as inevitable: as agent capabilities grow, the operational complexity concentrates in lifecycle management rather than inference.&lt;/p&gt;

&lt;p&gt;Perhaps most telling: Anthropic's data shows the 99.9th percentile turn durations &lt;a href="https://www.anthropic.com/news/measuring-agent-autonomy" rel="noopener noreferrer"&gt;continue to climb&lt;/a&gt; as agents handle more complex autonomous work. Architectures built for minutes will hit hours. The question isn't whether you'll need task primitives — it's whether you'll adopt them proactively or reactively after a production incident.&lt;/p&gt;

&lt;h2&gt;
  
  
  Shift Rating
&lt;/h2&gt;

&lt;p&gt;🟢 &lt;strong&gt;Adopt Now&lt;/strong&gt; — Teams building multi-agent systems or expecting agent turn durations beyond 2-3 minutes should implement task primitives immediately. The synchronous patterns that work today will fail silently as capabilities expand — not with clean errors, but with timeout cascades that surface as missing data and confused users.&lt;/p&gt;

&lt;p&gt;The refactoring cost grows with codebase size. Early adopters implement the pattern once in their agent abstraction layer. Late adopters retrofit dozens of call sites after production incidents reveal the architectural gap. The &lt;a href="https://arxiv.org/html/2602.10122v1" rel="noopener noreferrer"&gt;practical guide to agentic AI transition&lt;/a&gt; explicitly recommends treating task primitives as foundational infrastructure rather than optimization — not because current workloads require it, but because the ceiling is rising faster than codebases can reactively adapt.&lt;/p&gt;

&lt;p&gt;If you're still synchronously awaiting agent completions, start with the highest-duration calls. Wrap them in task primitives this month. When the floor moves — and &lt;a href="https://resources.anthropic.com/hubfs/2026%20Agentic%20Coding%20Trends%20Report.pdf" rel="noopener noreferrer"&gt;the data suggests it's moving soon&lt;/a&gt; — you'll already be standing on solid ground.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/html/2603.13417" rel="noopener noreferrer"&gt;Design Patterns for Deploying AI Agents with Model Context Protocol&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/html/2603.25100v1" rel="noopener noreferrer"&gt;From Logic Monopoly to Social Contract: Separation of Execution Power in AI Systems&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/html/2603.14805v1" rel="noopener noreferrer"&gt;AI Skills as the Institutional Knowledge Primitive for Agentic Systems&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/html/2602.10122v1" rel="noopener noreferrer"&gt;A Practical Guide to Agentic AI Transition in Organizations&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/html/2602.10479" rel="noopener noreferrer"&gt;The Evolution of Agentic AI Software Architecture&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.anthropic.com/news/measuring-agent-autonomy" rel="noopener noreferrer"&gt;Measuring AI agent autonomy in practice&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/html/2603.27296v1" rel="noopener noreferrer"&gt;A Multi-agent AI System for Deep Learning Model Lifecycle&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://resources.anthropic.com/hubfs/2026%20Agentic%20Coding%20Trends%20Report.pdf" rel="noopener noreferrer"&gt;2026 Agentic Coding Trends Report&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/pdf/2601.15195" rel="noopener noreferrer"&gt;Where Do AI Coding Agents Fail? An Empirical Study&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/html/2603.27249v1" rel="noopener noreferrer"&gt;The Growing Burden of AI-Assisted Software Development&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/html/2602.18764v2" rel="noopener noreferrer"&gt;The Convergence of Schema-Guided Dialogue Systems&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/html/2603.28735v1" rel="noopener noreferrer"&gt;RAD-AI: Rethinking Architecture Documentation for AI Systems&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/ARUNAGIRINATHAN-K/awesome-ai-agents" rel="noopener noreferrer"&gt;Awesome AI Agents for 2026&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;This is part of &lt;strong&gt;Primitive Shifts&lt;/strong&gt; — a monthly series tracking when new AI building blocks move from novel experiments to infrastructure you'll be expected to know.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Follow the Primitive Shifts series on Dev.to to catch every edition.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Spotted a shift happening in your stack? Drop it in the comments.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>agents</category>
      <category>webdev</category>
    </item>
    <item>
      <title>AI Weekly Roundup: Microsoft's Price War, Google's Open Model Push, and the Reliability Question Looming Over Enterprise AI</title>
      <dc:creator>Richard Dillon</dc:creator>
      <pubDate>Mon, 06 Apr 2026 12:02:21 +0000</pubDate>
      <link>https://dev.to/richard_dillon_b9c238186e/ai-weekly-roundup-microsofts-price-war-googles-open-model-push-and-the-reliability-question-2a49</link>
      <guid>https://dev.to/richard_dillon_b9c238186e/ai-weekly-roundup-microsofts-price-war-googles-open-model-push-and-the-reliability-question-2a49</guid>
      <description>&lt;h1&gt;
  
  
  AI Weekly Roundup: Microsoft's Price War, Google's Open Model Push, and the Reliability Question Looming Over Enterprise AI
&lt;/h1&gt;

&lt;p&gt;This week marks a pivotal moment in the AI landscape as the two largest cloud providers made aggressive moves to capture developer mindshare—Microsoft through steep price cuts and Google by releasing its most capable open-weight models yet. But beneath the product announcements lies a more fundamental question gaining traction: can AI systems actually achieve the reliability that justifies the hundreds of billions being wagered on enterprise adoption?&lt;/p&gt;

&lt;h2&gt;
  
  
  Microsoft Unveils MAI Foundry Models to Challenge OpenAI and Google on Price
&lt;/h2&gt;

&lt;p&gt;Microsoft's MAI Superintelligence team, led by Mustafa Suleyman, &lt;a href="https://techcrunch.com/2026/04/02/microsoft-takes-on-ai-rivals-with-three-new-foundational-models/" rel="noopener noreferrer"&gt;released three new foundational models&lt;/a&gt; this week designed to undercut competitors on inference costs while maintaining competitive performance. The models—MAI-1, MAI-2, and MAI-3—span a range of parameter counts optimized for different use cases, from lightweight edge deployment to full-scale reasoning tasks.&lt;/p&gt;

&lt;p&gt;The release represents Microsoft's first major in-house foundation model push since acquiring Inflection AI talent in 2024. Suleyman's team developed the models under what they're calling a "Humanist AI" philosophy, emphasizing practical human-centered communication over raw benchmark performance. In practice, this translates to models that prioritize coherent multi-turn dialogue and task completion over flashy single-shot capabilities.&lt;/p&gt;

&lt;p&gt;The key selling point is price: Microsoft claims MAI-2 delivers comparable performance to GPT-4-class models at roughly 40% lower inference costs, while MAI-3 targets the premium reasoning tier at prices significantly below OpenAI's o1 and Google's Gemini Ultra. The models are available immediately through Microsoft Foundry and will be integrated across Microsoft 365 Copilot, Azure AI services, and GitHub Copilot in coming weeks.&lt;/p&gt;

&lt;p&gt;Whether these cost savings hold up under production workloads remains to be seen, but the pricing pressure on OpenAI—Microsoft's own portfolio company—signals just how fragmented the foundation model market has become.&lt;/p&gt;

&lt;h2&gt;
  
  
  Google Releases Gemma 4 as Most Capable Open Model Family
&lt;/h2&gt;

&lt;p&gt;Google DeepMind &lt;a href="https://deepmind.google/blog/gemma-4-byte-for-byte-the-most-capable-open-models/" rel="noopener noreferrer"&gt;announced Gemma 4&lt;/a&gt;, its first major update to the Gemma open model family in over a year, shipping four distinct model sizes targeting different deployment scenarios. The lineup includes E2B and E4B efficient variants for edge and mobile applications, a 26B Mixture-of-Experts model for cost-effective inference, and a 31B dense model positioned as the flagship for maximum capability.&lt;/p&gt;

&lt;p&gt;All models ship under the Apache 2.0 license, making them fully permissive for commercial use—a notable contrast to Meta's Llama licensing restrictions. The 31B dense model in particular targets complex logic and agentic workflows, with Google claiming competitive performance against closed-source alternatives on multi-step reasoning benchmarks.&lt;/p&gt;

&lt;p&gt;Perhaps most significant is the 1M token context window available across the model family, enabling document-scale processing without chunking. Early reports from the Hugging Face community, documented in their &lt;a href="https://huggingface.co/blog/huggingface/state-of-os-hf-spring-2026" rel="noopener noreferrer"&gt;Spring 2026 State of Open Source report&lt;/a&gt;, suggest the 26B MoE variant delivers particularly strong results on agentic coding tasks while requiring substantially less compute than the dense model.&lt;/p&gt;

&lt;p&gt;The timing positions Gemma 4 as a direct response to both Meta's Llama 4 release and the growing ecosystem of fine-tuned open models. For teams building production AI systems without deep pockets for API costs, this release significantly raises the bar for what's achievable with self-hosted inference.&lt;/p&gt;

&lt;h2&gt;
  
  
  Agentic Programming Updates
&lt;/h2&gt;

&lt;p&gt;The multi-agent orchestration landscape continues to mature, with &lt;a href="https://github.com/caramaschiHG/awesome-ai-agents-2026" rel="noopener noreferrer"&gt;LangChain, AutoGen, and CrewAI&lt;/a&gt; maintaining their positions as the dominant frameworks for building complex agent systems. However, the past quarter has seen significant movement in the tooling layer as teams push agents from demos into production environments.&lt;/p&gt;

&lt;p&gt;Several &lt;a href="https://github.com/ARUNAGIRINATHAN-K/awesome-ai-agents" rel="noopener noreferrer"&gt;new frameworks have emerged&lt;/a&gt; addressing specific pain points: VoltAgent brings a TypeScript-first approach with self-improving context management, while PraisonAI focuses on production multi-agent deployments with native Model Context Protocol (MCP) integration. The MCP standard, now approaching its first anniversary, has become the de facto protocol for tool integration across the ecosystem.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://huggingface.co/blog/RDTvlokip/the-9-trends-that-will-explode-this-year" rel="noopener noreferrer"&gt;Smolagents from Hugging Face&lt;/a&gt; continues gaining traction with its code-first philosophy where agents write and execute Python directly rather than emitting JSON tool calls. This approach trades some safety guarantees for dramatically improved flexibility in complex workflows.&lt;/p&gt;

&lt;p&gt;On the MLOps side, ZenML's integration of "LangGraph swarms" into standard ML pipelines represents the &lt;a href="https://arxiv.org/html/2602.10479" rel="noopener noreferrer"&gt;ongoing convergence&lt;/a&gt; between traditional machine learning infrastructure and agentic systems. The industry is clearly shifting toward formalized inter-agent protocols and supervisor agent patterns, moving away from the ad-hoc agent chains that characterized early implementations.&lt;/p&gt;

&lt;p&gt;Research from this quarter emphasizes that &lt;a href="https://arxiv.org/html/2601.02749v1" rel="noopener noreferrer"&gt;production-ready agent architectures&lt;/a&gt; require explicit failure handling, state persistence, and human-in-the-loop checkpoints—capabilities that separate serious frameworks from toy implementations.&lt;/p&gt;

&lt;h2&gt;
  
  
  Take-Two Shutters Internal AI Team as Gaming Industry Resets AI Strategy
&lt;/h2&gt;

&lt;p&gt;Luke Dicken, Take-Two Interactive's head of AI, &lt;a href="https://www.theverge.com/archives/ai-artificial-intelligence/2026/4/1" rel="noopener noreferrer"&gt;announced the dissolution&lt;/a&gt; of the company's internal AI team via LinkedIn this week, marking a notable retreat from the publisher's previous AI ambitions. Dicken previously served as senior director of applied AI at Zynga, the mobile gaming giant Take-Two acquired in 2022 for $12.7 billion.&lt;/p&gt;

&lt;p&gt;The move signals a potential industry-wide recalibration in how major gaming companies approach AI development. Rather than maintaining dedicated AI research teams, publishers may be shifting toward licensing external AI services or integrating AI capabilities through middleware providers. This approach trades potential competitive advantages for reduced overhead and access to rapidly evolving third-party capabilities.&lt;/p&gt;

&lt;p&gt;The dissolution contrasts sharply with continued heavy AI investment announcements from other sectors. While enterprise software companies race to embed AI into every product, gaming companies appear more cautious about where AI delivers genuine value versus hype-driven experimentation. NPCs powered by language models, procedural content generation, and AI-assisted development tools have all shown promise in demos but struggle to justify dedicated team costs at current capability levels.&lt;/p&gt;

&lt;h2&gt;
  
  
  German Publisher Sues OpenAI Over ChatGPT's Reproduction of Children's Book Series
&lt;/h2&gt;

&lt;p&gt;A German publisher &lt;a href="https://www.theverge.com/archives/ai-artificial-intelligence/2026/4/2" rel="noopener noreferrer"&gt;filed suit against OpenAI&lt;/a&gt; in Munich this week, alleging that ChatGPT violated copyright by generating content "virtually indistinguishable from the original" Coconut the Dragon children's book series. The case represents one of the most detailed copyright claims yet filed against a major AI provider in European courts.&lt;/p&gt;

&lt;p&gt;According to the filing, ChatGPT not only reproduced narrative text closely matching the original books but also generated cover art, back cover marketing blurbs, and even instructions for self-publishing the generated content—essentially providing a turnkey system for producing unauthorized derivative works. The publisher's legal team documented dozens of prompts that reliably triggered near-verbatim reproduction of copyrighted material.&lt;/p&gt;

&lt;p&gt;The lawsuit tests the boundaries of fair use doctrine under European copyright law, which generally provides narrower exceptions than U.S. law. OpenAI has previously argued that training on copyrighted material constitutes transformative use, but cases involving near-exact reproduction of specific works present a harder legal challenge than claims about training data in general.&lt;/p&gt;

&lt;p&gt;The outcome could have significant implications for how AI companies handle clearly copyrighted creative works in training data and whether post-training mitigations against reproduction are legally sufficient.&lt;/p&gt;

&lt;h2&gt;
  
  
  Musk Reportedly Requiring SpaceX IPO Advisers to Purchase Grok Subscriptions
&lt;/h2&gt;

&lt;p&gt;Banks, law firms, and auditors working on SpaceX's anticipated initial public offering have allegedly been told to purchase subscriptions to Grok, xAI's chatbot product, according to &lt;a href="https://www.theverge.com/archives/ai-artificial-intelligence/2026/4/1" rel="noopener noreferrer"&gt;reporting from The New York Times&lt;/a&gt;. The requirement follows recent corporate restructuring that placed xAI's Grok product technically under SpaceX's corporate umbrella.&lt;/p&gt;

&lt;p&gt;The arrangement raises immediate questions about conflicts of interest in major financial transactions. Advisory firms typically maintain strict independence from clients to preserve the integrity of their guidance, but mandatory product purchases create financial entanglement however small the individual subscription costs.&lt;/p&gt;

&lt;p&gt;Beyond the ethics questions, the move underscores Musk's continued efforts to drive Grok adoption through unconventional channels. The chatbot has struggled to gain market share against ChatGPT and Claude despite significant infrastructure investment in xAI's Memphis supercomputer cluster. Requiring professional services firms to use the product at least ensures some enterprise exposure, even if adoption is coerced rather than organic.&lt;/p&gt;

&lt;p&gt;SpaceX's IPO, if it proceeds, would be one of the largest technology offerings in years, making the advisory relationships particularly high-stakes for all parties involved.&lt;/p&gt;

&lt;h2&gt;
  
  
  Suno Faces Mounting Copyright Concerns as AI Music Generation Enables Streaming Fraud
&lt;/h2&gt;

&lt;p&gt;Suno, the AI music generation platform, faces &lt;a href="https://www.theverge.com/archives/ai-artificial-intelligence/2026/4/2" rel="noopener noreferrer"&gt;growing scrutiny&lt;/a&gt; as its technology enables increasingly sophisticated streaming fraud schemes. The platform's ability to generate music that closely mimics popular artists' styles has made it trivially easy to flood streaming services with AI-generated ripoffs designed to capture listener searches for established acts.&lt;/p&gt;

&lt;p&gt;The problem extends beyond simple copyright infringement. Fraudsters use AI-generated tracks to execute streaming manipulation schemes, uploading thousands of songs that algorithmically target popular search terms and playlist categories. When listeners search for well-known artists or genres, AI-generated content increasingly appears alongside legitimate recordings, siphoning royalty payments from actual creators.&lt;/p&gt;

&lt;p&gt;Streaming platforms have struggled to implement effective detection systems for AI-generated content. Unlike deepfakes of specific recordings, stylistic imitations exist in a legal gray zone—mimicking a musical style isn't clearly illegal, but doing so at industrial scale to deceive consumers raises different concerns.&lt;/p&gt;

&lt;p&gt;The situation highlights regulatory gaps in AI-generated content authentication and growing calls for platform accountability. Some industry groups are pushing for mandatory labeling of AI-generated audio, while others advocate for streaming services to implement more aggressive content moderation. Neither approach has gained sufficient traction to address the problem at its current scale.&lt;/p&gt;

&lt;h2&gt;
  
  
  AI Business Model Reliability Under Scrutiny as Billions Ride on Enterprise Adoption
&lt;/h2&gt;

&lt;p&gt;A &lt;a href="https://www.reuters.com/technology/does-ai-business-model-have-fatal-flaw-2026-04-01/" rel="noopener noreferrer"&gt;Reuters analysis published this week&lt;/a&gt; poses an uncomfortable question for the AI industry: can these systems actually achieve the reliability needed for high-stakes enterprise work? With hundreds of billions of dollars in investment predicated on the assumption that AI will handle critical business processes, the gap between demo-quality performance and production-grade consistency represents an existential risk for the current investment thesis.&lt;/p&gt;

&lt;p&gt;The analysis highlights a fundamental tension in AI deployment. Language models excel at generating plausible outputs but struggle with the kind of deterministic reliability that enterprise software typically requires. A system that's correct 95% of the time sounds impressive until you consider that a 5% error rate in financial transactions, medical records, or legal documents would be catastrophic.&lt;/p&gt;

&lt;p&gt;Current mitigation strategies—human review, confidence thresholds, restricted use cases—all work but dramatically limit the efficiency gains that justify AI investments. If every AI output requires human verification, the productivity benefits shrink considerably. Enterprise adoption ultimately hinges on solving reliability, not just demonstrating capability on cherry-picked benchmarks.&lt;/p&gt;

&lt;p&gt;The question looms particularly large as AI companies push toward agentic systems that take autonomous actions. A chatbot that occasionally hallucinates is annoying; an agent that occasionally executes the wrong transaction is dangerous.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to Watch
&lt;/h2&gt;

&lt;p&gt;The next few weeks will likely see competitive responses to both Microsoft's pricing moves and Google's Gemma 4 release, potentially forcing further price cuts across the inference market. The German copyright case bears watching as a potential template for European regulatory approaches to AI training data. And as enterprise reliability concerns gain mainstream attention, expect increased focus on evaluation frameworks and production monitoring tools designed to quantify—and hopefully improve—real-world AI system dependability.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://techcrunch.com/2026/04/02/microsoft-takes-on-ai-rivals-with-three-new-foundational-models/" rel="noopener noreferrer"&gt;Microsoft takes on AI rivals with three new foundational models&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://deepmind.google/blog/gemma-4-byte-for-byte-the-most-capable-open-models/" rel="noopener noreferrer"&gt;Gemma 4: Our most capable open models to date&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://huggingface.co/blog/huggingface/state-of-os-hf-spring-2026" rel="noopener noreferrer"&gt;State of Open Source on Hugging Face: Spring 2026&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/caramaschiHG/awesome-ai-agents-2026" rel="noopener noreferrer"&gt;caramaschiHG/awesome-ai-agents-2026&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/ARUNAGIRINATHAN-K/awesome-ai-agents" rel="noopener noreferrer"&gt;Awesome AI Agents for 2026&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://huggingface.co/blog/RDTvlokip/the-9-trends-that-will-explode-this-year" rel="noopener noreferrer"&gt;🎆 AI 2026 — The 9 trends that will EXPLODE this year!&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/html/2602.10479" rel="noopener noreferrer"&gt;The Evolution of Agentic AI Software Architecture&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/html/2601.02749v1" rel="noopener noreferrer"&gt;The Path Ahead for Agentic AI: Challenges and Opportunities&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.theverge.com/archives/ai-artificial-intelligence/2026/4/1" rel="noopener noreferrer"&gt;Ai Artificial Intelligence Archive for April 2026 - Page 1&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.theverge.com/archives/ai-artificial-intelligence/2026/4/2" rel="noopener noreferrer"&gt;Ai Artificial Intelligence Archive for April 2026 - Page 2&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.reuters.com/technology/does-ai-business-model-have-fatal-flaw-2026-04-01/" rel="noopener noreferrer"&gt;Does the AI business model have a fatal flaw?&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Enjoyed this briefing? Follow this series for a fresh AI update every week, written for engineers who want to stay ahead.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Follow this publication on Dev.to to get notified of every new article.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Have a story tip or correction? Drop a comment below.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>programming</category>
      <category>technology</category>
    </item>
    <item>
      <title>AI Weekly Digest: OpenAI Pivots Away from Sora, Agentic Frameworks Mature, and Open-Weight Models Gain Ground</title>
      <dc:creator>Richard Dillon</dc:creator>
      <pubDate>Tue, 31 Mar 2026 13:18:01 +0000</pubDate>
      <link>https://dev.to/richard_dillon_b9c238186e/ai-weekly-digest-openai-pivots-away-from-sora-agentic-frameworks-mature-and-open-weight-models-5ahh</link>
      <guid>https://dev.to/richard_dillon_b9c238186e/ai-weekly-digest-openai-pivots-away-from-sora-agentic-frameworks-mature-and-open-weight-models-5ahh</guid>
      <description>&lt;h1&gt;
  
  
  AI Weekly Digest: OpenAI Pivots Away from Sora, Agentic Frameworks Mature, and Open-Weight Models Gain Ground
&lt;/h1&gt;

&lt;p&gt;This week marked a strategic inflection point as OpenAI made the surprising decision to sunset its standalone Sora video application, signaling that even the most hyped consumer AI products face ruthless prioritization when compute resources run thin. Meanwhile, the agentic programming landscape continued its rapid maturation, with LangGraph cementing its position as the production-grade framework of choice while enterprises scramble to measure actual ROI from their AI agent deployments. The tension between building impressive demos and delivering reliable, cost-effective systems has never been more apparent.&lt;/p&gt;

&lt;h2&gt;
  
  
  OpenAI Discontinues Sora Video App to Refocus on Core AI Infrastructure
&lt;/h2&gt;

&lt;p&gt;In a move that caught many observers off guard, OpenAI has &lt;a href="https://coaio.com/news/2026/03/breaking-tech-news-on-march-28-2026-ai-revolution-hardware-upgrades-2koc/" rel="noopener noreferrer"&gt;discontinued its standalone Sora video generation application&lt;/a&gt; despite the tool's initial popularity following its public launch. The company cited the substantial compute demands of high-fidelity video generation as the primary driver behind the decision, with internal metrics reportedly showing declining active usage after the initial novelty wore off.&lt;/p&gt;

&lt;p&gt;The resources previously dedicated to Sora consumer operations are being &lt;a href="https://www.marketingprofs.com/opinions/2026/54473/ai-update-march-27-2026-ai-news-and-views-from-the-past-week/" rel="noopener noreferrer"&gt;redirected toward foundational model scaling, infrastructure improvements, and enterprise products&lt;/a&gt;—areas where OpenAI sees clearer paths to sustainable revenue. However, the company emphasized that video and world simulation research continues internally, with applications targeted toward future robotics systems rather than consumer content creation.&lt;/p&gt;

&lt;p&gt;This pivot reflects a broader industry reckoning with the economics of generative media. Video models require orders of magnitude more compute than text or image generation, and monetization has proven challenging when users expect free or near-free access. For developers who built workflows around Sora's API, the shutdown underscores the platform risk inherent in depending on consumer-facing AI services from companies still searching for product-market fit.&lt;/p&gt;

&lt;h2&gt;
  
  
  Revenium Launches AI Outcomes for Workflow-Level ROI Measurement
&lt;/h2&gt;

&lt;p&gt;As enterprises push deeper into agentic AI deployments, the question of actual return on investment has become increasingly urgent. Revenium's newly launched &lt;a href="https://www.crescendo.ai/news/latest-ai-news-and-updates" rel="noopener noreferrer"&gt;AI Outcomes tool addresses this challenge by measuring ROI at the granular workflow level&lt;/a&gt; for individual AI agent executions—a capability that's been notably absent from the market.&lt;/p&gt;

&lt;p&gt;The tool provides detailed cost attribution across multi-step agent workflows, tracking not just API spend but also infrastructure costs, human oversight time, and downstream business impact. For organizations running hundreds of distinct agent configurations, this visibility helps identify which deployments are generating value versus which are burning compute cycles without meaningful output.&lt;/p&gt;

&lt;p&gt;The timing is particularly relevant as companies move from experimental pilots to production-scale agent deployments. &lt;a href="https://www.ibm.com/think/news/ai-tech-trends-predictions-2026" rel="noopener noreferrer"&gt;Enterprises are increasingly focused on demonstrating concrete business value&lt;/a&gt; from their AI investments, and CFOs want more than anecdotal productivity gains. Revenium's approach of measuring outcomes at the workflow rather than model level acknowledges that AI value creation happens in the orchestration—how agents are chained, what tools they access, and how errors are handled—not just in raw model capability.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lucid Software Expands MCP Server with New Process Agent for Team Collaboration
&lt;/h2&gt;

&lt;p&gt;Lucid Software has &lt;a href="https://www.marketingprofs.com/opinions/2026/54473/ai-update-march-27-2026-ai-news-and-views-from-the-past-week/" rel="noopener noreferrer"&gt;rolled out significant enhancements to its Model Context Protocol server integration&lt;/a&gt; alongside a new Process Agent designed to streamline complex diagram and documentation workflows. The updates extend Lucid's existing AI capabilities with tighter integration into the MCP ecosystem that's become the de facto standard for tool connectivity in agentic systems.&lt;/p&gt;

&lt;p&gt;The Process Agent specifically targets the labor-intensive work of translating meeting notes, requirements documents, and scattered communications into structured visual artifacts like flowcharts, architecture diagrams, and project timelines. Early users report substantial time savings on the kind of documentation work that often falls through the cracks in fast-moving engineering teams.&lt;/p&gt;

&lt;p&gt;This release reflects a &lt;a href="https://news.microsoft.com/source/features/ai/whats-next-in-ai-7-trends-to-watch-in-2026/" rel="noopener noreferrer"&gt;broader trend toward AI-assisted collaborative tooling&lt;/a&gt; where AI agents operate as persistent team members rather than on-demand utilities. By integrating through MCP, Lucid's agents can be invoked from other AI tools and workflows, fitting into the increasingly complex orchestration layers that enterprises are building. The approach suggests that specialized domain agents—rather than general-purpose assistants—may be the path to production-ready AI augmentation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Claude Code Tool Suffers from "Amnesia" Bug in Extended Sessions
&lt;/h2&gt;

&lt;p&gt;Reports have surfaced of a frustrating bug in Anthropic's Claude Code tool where the assistant &lt;a href="https://blog.logrocket.com/ai-dev-tool-power-rankings/" rel="noopener noreferrer"&gt;forgets previous instructions during extended coding sessions&lt;/a&gt;, forcing users to repeatedly re-explain project context and coding conventions. The issue appears most pronounced in sessions exceeding several hours, with the AI seeming to "reset" its understanding of the codebase and prior decisions.&lt;/p&gt;

&lt;p&gt;Developers describe scenarios where Claude Code begins suggesting changes that directly contradict earlier agreed-upon patterns, or requests information that was provided earlier in the same session. The bug leads to significant productivity losses on complex, multi-day refactoring efforts where maintaining continuity is essential.&lt;/p&gt;

&lt;p&gt;The "amnesia" problem highlights ongoing &lt;a href="https://www.builder.io/blog/best-ai-tools-2026" rel="noopener noreferrer"&gt;reliability challenges in AI coding assistants&lt;/a&gt; when applied to substantial projects. While these tools excel at bounded tasks—writing a single function, explaining a code snippet, or generating boilerplate—maintaining coherent understanding across extended workflows remains technically difficult. The issue raises fundamental questions about memory management in agentic systems: how should context be persisted, summarized, and retrieved to enable the kind of sustained collaboration that mirrors human pair programming?&lt;/p&gt;

&lt;h2&gt;
  
  
  Agentic Programming Updates
&lt;/h2&gt;

&lt;p&gt;The agentic programming landscape has crystallized significantly this quarter, with &lt;a href="https://gurusup.com/blog/best-multi-agent-frameworks-2026" rel="noopener noreferrer"&gt;LangGraph emerging as the framework of choice&lt;/a&gt; for teams building production systems that require iterative, self-correcting reasoning loops. Its cyclic graph architecture enables the kind of retry logic and conditional branching that real-world agent deployments demand—agents can reconsider decisions, request human approval at key junctures, and gracefully handle the failures that inevitably occur when AI interfaces with external systems.&lt;/p&gt;
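&lt;p&gt;The retry-and-branch loop described above can be sketched framework-agnostically. None of the names below are LangGraph's actual API; the snippet only shows the shape of the cycle (act, check, then retry, escalate to a human, or finish):&lt;/p&gt;

```python
# Framework-agnostic sketch of a cyclic agent loop with retries and escalation.
def run_with_retries(step, check, max_attempts=3):
    for attempt in range(1, max_attempts + 1):
        result = step(attempt)
        verdict = check(result)
        if verdict == "ok":
            return result
        if verdict == "needs_human":
            # Approval gate: hand the intermediate result to a person.
            return "escalated: %r" % result
        # verdict == "retry": loop again, with the failure available as context.
    return "failed after %d attempts" % max_attempts

# Toy step that only produces an acceptable draft on the third try.
outcome = run_with_retries(
    step=lambda n: "draft-%d" % n,
    check=lambda r: "ok" if r == "draft-3" else "retry",
)
print(outcome)  # draft-3
```

&lt;p&gt;Production frameworks add state persistence and observability around this loop, but the control flow is the part that cyclic graph architectures make first-class.&lt;/p&gt;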

&lt;p&gt;&lt;a href="https://pub.towardsai.net/the-4-best-open-source-multi-agent-ai-frameworks-2026-9da389f9407a" rel="noopener noreferrer"&gt;CrewAI continues gaining traction&lt;/a&gt; for rapid multi-agent prototyping, particularly among teams exploring agent concepts before committing to production architecture. However, we're seeing a consistent migration pattern: teams that start with CrewAI's simpler abstractions often move to LangGraph when they need production-grade state management, persistence, and observability. The tradeoff between developer velocity and operational robustness remains a defining tension in framework selection.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.instaclustr.com/education/agentic-ai/agentic-ai-frameworks-top-10-options-in-2026/" rel="noopener noreferrer"&gt;AutoGen has continued its evolution past v0.4&lt;/a&gt;, now offering a layered architecture with the event-driven Core runtime, Studio for no-code prototyping, and AgentChat as the primary Python API surface. This stratification aims to serve both researchers exploring novel agent architectures and enterprise teams deploying standardized patterns. Meanwhile, &lt;a href="https://vellum.ai/blog/top-ai-agent-frameworks-for-developers" rel="noopener noreferrer"&gt;Google's Agent Development Kit (ADK) and Claude's Agent SDK&lt;/a&gt; are expanding enterprise multi-agent options, though adoption remains concentrated among customers already invested in those respective ecosystems.&lt;/p&gt;

&lt;p&gt;The emerging &lt;a href="https://www.exabeam.com/explainers/agentic-ai/agentic-ai-frameworks-key-components-top-8-options/" rel="noopener noreferrer"&gt;industry consensus is clear: orchestration architecture now matters more than individual agent intelligence&lt;/a&gt;. Teams are finding that how agents communicate, maintain state, handle errors, and coordinate tool usage often determines system success more than the underlying model's raw capabilities. This realization is driving increased investment in observability tools, agent tracing systems, and standardized evaluation frameworks for multi-agent workflows.&lt;/p&gt;

&lt;h2&gt;
  
  
  Gemini 3.1 Flash Live Rolls Out Globally with Real-Time Audio and Camera Integration
&lt;/h2&gt;

&lt;p&gt;Google has &lt;a href="https://www.crescendo.ai/news/latest-ai-news-and-updates" rel="noopener noreferrer"&gt;expanded its Gemini 3.1 Flash Live capabilities globally&lt;/a&gt;, bringing real-time audio conversations and visual context integration to users worldwide. The system allows continuous voice conversations with the AI while optionally sharing phone camera input, enabling scenarios where users can ask questions about what they're seeing in real-time.&lt;/p&gt;

&lt;p&gt;The technical implementation represents a significant advancement in multimodal streaming—the system processes audio and visual inputs simultaneously while maintaining conversational context and generating low-latency responses. Early demonstrations show the system identifying objects, reading text, providing navigation assistance, and answering questions about physical environments with reasonable accuracy.&lt;/p&gt;

&lt;p&gt;The global rollout has intensified discussions about &lt;a href="https://www.marketingprofs.com/opinions/2026/54473/ai-update-march-27-2026-ai-news-and-views-from-the-past-week/" rel="noopener noreferrer"&gt;distinguishing human from machine interactions&lt;/a&gt; as ambient AI assistants become more prevalent. The naturalness of real-time audio conversations—without the perceptible pauses of earlier systems—raises questions about disclosure and authenticity in communication. More broadly, the release signals a shift toward AI that observes and participates in the physical world rather than being confined to text boxes and chat interfaces.&lt;/p&gt;

&lt;h2&gt;
  
  
  Anthropic Secures Legal Victory Against Trump Administration Restrictions
&lt;/h2&gt;

&lt;p&gt;Anthropic has &lt;a href="https://www.crescendo.ai/news/latest-ai-news-and-updates" rel="noopener noreferrer"&gt;won an injunction against federal restrictions&lt;/a&gt; that had threatened to constrain its AI operations, providing the company with crucial operational clarity amid ongoing regulatory uncertainty. The legal victory addresses restrictions that would have impacted aspects of the company's enterprise AI services, though specific details of the contested regulations remain partially sealed.&lt;/p&gt;

&lt;p&gt;The ruling comes as Anthropic &lt;a href="https://www.ibm.com/think/news/ai-tech-trends-predictions-2026" rel="noopener noreferrer"&gt;continues an aggressive enterprise push&lt;/a&gt;, expanding its Claude model family and enterprise API offerings. The company has positioned itself as a safety-focused alternative to OpenAI and Google, and any operational restrictions would have significantly impacted that competitive positioning.&lt;/p&gt;

&lt;p&gt;The case signals ongoing tension between AI companies and government oversight efforts. While regulatory frameworks for AI remain unsettled, companies are increasingly turning to litigation to challenge restrictions they view as overreaching or technically uninformed. For enterprise customers evaluating AI partnerships, the regulatory environment has become a meaningful factor in vendor selection—platform stability depends not just on technical excellence but also on a company's ability to navigate the political landscape.&lt;/p&gt;

&lt;h2&gt;
  
  
  Kimi K2.5 and Qwen 3 Coder Join Growing Open-Weight Model Ecosystem
&lt;/h2&gt;

&lt;p&gt;The open-weight model ecosystem continues expanding, with &lt;a href="https://llm-stats.com/ai-news" rel="noopener noreferrer"&gt;Kimi K2.5 and Qwen 3 Coder&lt;/a&gt; representing the latest entries that provide developers with alternatives to closed API dependencies. These releases reflect an accelerating trend toward &lt;a href="https://whatllm.org/blog/llm-releases-march-2026" rel="noopener noreferrer"&gt;efficient mixture-of-experts architectures&lt;/a&gt; and edge-capable reasoning models that can run on more modest infrastructure.&lt;/p&gt;

&lt;p&gt;Kimi K2.5 offers a partially open-source approach with self-hosting capabilities, targeting developers who need model control without full weight access. Meanwhile, &lt;a href="https://explodingtopics.com/blog/list-of-llms" rel="noopener noreferrer"&gt;Qwen 3 Coder provides full Apache 2.0 licensing&lt;/a&gt;, enabling enterprise deployment without licensing friction—an increasingly important consideration as legal teams scrutinize AI model dependencies.&lt;/p&gt;

&lt;p&gt;The practical impact is meaningful: organizations can now &lt;a href="https://www.reddit.com/r/AISEOInsider/comments/1qg3m5k/the_2026_open_source_ai_stack_thats_beating_paid/" rel="noopener noreferrer"&gt;build AI stacks that rival or exceed paid tools&lt;/a&gt; in specific domains while maintaining full control over deployment, fine-tuning, and data handling. The &lt;a href="https://builder.aws.com/content/38sWXfm1ewXHg9pdCLmHo3XWIQX/top-5-open-source-ai-model-api-providers-in-2026" rel="noopener noreferrer"&gt;expansion of open-source AI model API providers&lt;/a&gt; means teams can leverage these models without managing inference infrastructure, getting the benefits of open weights with managed deployment convenience.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to Watch
&lt;/h2&gt;

&lt;p&gt;The OpenAI Sora shutdown and Anthropic's legal battle both point to an industry maturing past pure capability growth into harder questions of sustainability, economics, and governance. Watch for more companies to make similarly tough prioritization decisions as compute economics force focus. The agentic framework space is also approaching a consolidation phase—expect clearer winners and losers by midyear as production deployments reveal which architectures actually scale.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://coaio.com/news/2026/03/breaking-tech-news-on-march-28-2026-ai-revolution-hardware-upgrades-2koc/" rel="noopener noreferrer"&gt;Breaking Tech News on March 28, 2026: AI Revolution, Hardware ...&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.crescendo.ai/news/latest-ai-news-and-updates" rel="noopener noreferrer"&gt;Latest AI News and AI Breakthroughs that Matter Most: 2026 &amp;amp; 2025&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.marketingprofs.com/opinions/2026/54473/ai-update-march-27-2026-ai-news-and-views-from-the-past-week" rel="noopener noreferrer"&gt;AI Update, March 27, 2026: AI News and Views From the Past Week&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.ibm.com/think/news/ai-tech-trends-predictions-2026" rel="noopener noreferrer"&gt;The trends that will shape AI and tech in 2026 - IBM&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://news.microsoft.com/source/features/ai/whats-next-in-ai-7-trends-to-watch-in-2026/" rel="noopener noreferrer"&gt;What's next in AI: 7 trends to watch in 2026 - Microsoft Source&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.instaclustr.com/education/agentic-ai/agentic-ai-frameworks-top-10-options-in-2026/" rel="noopener noreferrer"&gt;Agentic AI Frameworks: Top 10 Options in 2026&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.exabeam.com/explainers/agentic-ai/agentic-ai-frameworks-key-components-top-8-options/" rel="noopener noreferrer"&gt;Agentic AI Frameworks: Key Components &amp;amp; Top 8 Options in 2026&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://pub.towardsai.net/the-4-best-open-source-multi-agent-ai-frameworks-2026-9da389f9407a" rel="noopener noreferrer"&gt;The 4 Best Open Source Multi-Agent AI Frameworks 2026&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://gurusup.com/blog/best-multi-agent-frameworks-2026" rel="noopener noreferrer"&gt;Best Multi-Agent Frameworks in 2026: LangGraph, CrewAI, OpenAI ...&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://vellum.ai/blog/top-ai-agent-frameworks-for-developers" rel="noopener noreferrer"&gt;The Top 11 AI Agent Frameworks For Developers In September 2026&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://llm-stats.com/ai-news" rel="noopener noreferrer"&gt;LLM News Today (March 2026) – AI Model Releases - LLM Stats&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://whatllm.org/blog/llm-releases-march-2026" rel="noopener noreferrer"&gt;New LLMs March 2026: GPT-5.4 Tied for #1. Nobody Talked About It.&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://explodingtopics.com/blog/list-of-llms" rel="noopener noreferrer"&gt;Top 50+ Large Language Models (LLMs) in 2026 - Exploding Topics&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://builder.aws.com/content/38sWXfm1ewXHg9pdCLmHo3XWIQX/top-5-open-source-ai-model-api-providers-in-2026" rel="noopener noreferrer"&gt;Top 5 Open-Source AI Model API Providers in 2026&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.reddit.com/r/AISEOInsider/comments/1qg3m5k/the_2026_open_source_ai_stack_thats_beating_paid/" rel="noopener noreferrer"&gt;The 2026 Open Source AI Stack That's Beating Paid Tools - Reddit&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://blog.logrocket.com/ai-dev-tool-power-rankings/" rel="noopener noreferrer"&gt;AI dev tool power rankings &amp;amp; comparison [March 2026]&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.builder.io/blog/best-ai-tools-2026" rel="noopener noreferrer"&gt;Best AI Coding Tools for Developers in 2026 - Builder.io&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Enjoyed this briefing? Follow this series for a fresh AI update every day, written for engineers who want to stay ahead.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Follow this publication on Dev.to to get notified of every new article.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Have a story tip or correction? Drop a comment below.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>programming</category>
      <category>technology</category>
    </item>
    <item>
      <title>AI Weekly: Gemini 3.1 Pro Leads a Week Where Open Source Closes In</title>
      <dc:creator>Richard Dillon</dc:creator>
      <pubDate>Mon, 30 Mar 2026 12:01:51 +0000</pubDate>
      <link>https://dev.to/richard_dillon_b9c238186e/ai-weekly-gemini-31-pro-leads-a-week-where-open-source-closes-in-2bj7</link>
      <guid>https://dev.to/richard_dillon_b9c238186e/ai-weekly-gemini-31-pro-leads-a-week-where-open-source-closes-in-2bj7</guid>
      <description>&lt;h1&gt;
  
  
  AI Weekly: Gemini 3.1 Pro Leads a Week Where Open Source Closes In
&lt;/h1&gt;

&lt;p&gt;The gap between frontier and open-source models is shrinking faster than anyone predicted. This week, Google DeepMind dropped Gemini 3.1 Pro with benchmark numbers that would have seemed impossible two years ago, but the real story is what's happening in the open-weights space—MiMo-V2-Flash and DeepSeek V3.2 are now within striking distance of proprietary systems at a fraction of the cost. Meanwhile, the infrastructure for agentic AI is maturing rapidly, robotics funding is surging, and Wikipedia is drawing a hard line against AI-generated content.&lt;/p&gt;

&lt;h2&gt;
  
  
  Gemini 3.1 Pro Arrives with 1M-Token Context and 77% ARC-AGI-2 Score
&lt;/h2&gt;

&lt;p&gt;Google DeepMind has released Gemini 3.1 Pro, positioning it as the most capable Pro-tier model in their lineup. The headline numbers are impressive: a 1 million token context window and a 77.1% score on the ARC-AGI-2 benchmark, which specifically tests abstract reasoning capabilities that have historically challenged language models.&lt;/p&gt;

&lt;p&gt;The multimodal reasoning extends across text, images, audio, video, and code in a unified architecture—no separate models stitched together. In practice, this means developers can pass in a two-hour video alongside a codebase and ask questions that require understanding both. Google claims latency improvements over 2.0 Pro despite the expanded capabilities, though independent benchmarks are still pending.&lt;/p&gt;

&lt;p&gt;Availability is broad: the model is accessible through the Gemini API, Vertex AI for enterprise deployments, and Google's Antigravity platform. Pricing sits at the expected Pro tier, making it competitive with Claude 4.1 Sonnet and GPT-5.1 for most production use cases.&lt;/p&gt;

&lt;p&gt;The ARC-AGI-2 score deserves attention. This benchmark specifically targets the kind of novel pattern recognition that pure next-token prediction struggles with—think IQ test problems rather than memorized facts. Breaking 77% suggests meaningful progress on generalization, though the gap to human performance (mid-80s for average adults) remains real.&lt;/p&gt;

&lt;h2&gt;
  
  
  Open-Source Models Close the Gap: MiMo-V2-Flash and DeepSeek V3.2 Challenge Frontier Systems
&lt;/h2&gt;

&lt;p&gt;The open-source revolution in LLMs just hit an inflection point. MiMo-V2-Flash from Xiaomi's AI lab achieved a 66 QI (Quality Index) score with a stunning 96% on AIME—that's the American Invitational Mathematics Examination, not some synthetic benchmark. This represents the best mathematical reasoning performance ever recorded in an open-weights model.&lt;/p&gt;

&lt;p&gt;DeepSeek V3.2, released the same week, matches that 66 QI while offering inference at $0.30 per million tokens through deepinfra. For comparison, GPT-5.1 runs $3.50/M tokens—more than 10x the cost for roughly equivalent capability on most tasks.&lt;/p&gt;

&lt;p&gt;A detailed Reddit analysis examining 94 LLM API endpoints found the proprietary advantage has compressed to approximately 4 QI points. Two years ago, that gap was 15-20 points. The practical implication: for most production workloads that don't require the absolute bleeding edge, open-source models now offer better cost-performance ratios.&lt;/p&gt;

&lt;p&gt;This isn't just academic. Companies running inference at scale are doing the math: a 10x cost reduction with a &amp;lt;5% capability hit changes build-versus-buy calculations dramatically. We're seeing migration patterns accelerate, particularly for classification, summarization, and code generation tasks where the gap is smallest. The remaining proprietary advantages cluster around complex multi-step reasoning and highly specialized domains—exactly the capabilities that justify premium pricing.&lt;/p&gt;
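&lt;p&gt;The build-versus-buy arithmetic is easy to run with the per-token prices quoted above. The 500M-token monthly volume below is an illustrative assumption, not a figure from the analysis:&lt;/p&gt;

```python
# Cost comparison using the article's quoted prices, in $ per 1M tokens.
open_price = 0.30    # DeepSeek V3.2 via deepinfra, per the article
closed_price = 3.50  # GPT-5.1, per the article

monthly_tokens = 500  # millions of tokens per month (illustrative volume)

open_cost = open_price * monthly_tokens
closed_cost = closed_price * monthly_tokens
print("open:   $%.2f/month" % open_cost)
print("closed: $%.2f/month" % closed_cost)
print("ratio:  %.1fx" % (closed_price / open_price))  # 11.7x
```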

&lt;h2&gt;
  
  
  Agentic Programming Updates
&lt;/h2&gt;

&lt;p&gt;Anthropic's 2026 Agentic Coding Trends Report, published last Tuesday, makes a bold prediction: multi-agent systems will largely replace single-agent workflows for complex coding tasks by year's end. The report cites internal data showing 3-4x throughput improvements when decomposing tasks across specialized agents versus monolithic prompting. This aligns with what we're seeing in production deployments—the single-agent pattern is hitting scaling walls.&lt;/p&gt;

&lt;p&gt;Microsoft's Agent Framework, now in public preview, takes an explicitly enterprise-first approach. The emphasis is on durable orchestration (agents that survive process restarts) and human-in-the-loop scenarios where approval gates are non-negotiable. This matters for regulated industries where "the AI just did it" isn't an acceptable answer to auditors.&lt;/p&gt;

&lt;p&gt;AutoGen v0.4+ represents a significant architectural pivot to event-driven execution with full async support. The previous version's synchronous patterns created bottlenecks in large-scale multi-agent coordination; the new architecture allows hundreds of agents to operate concurrently without blocking. Migration guides are available, but expect some friction—the programming model changed substantially.&lt;/p&gt;
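&lt;p&gt;The practical difference between the old synchronous pattern and the new event-driven one is plain concurrency: agents no longer wait on one another. The snippet below illustrates that shape with stock &lt;code&gt;asyncio&lt;/code&gt;; it is not AutoGen's API:&lt;/p&gt;

```python
import asyncio

# Illustrative only: concurrent agents via asyncio.gather, so total wall time
# tracks the slowest agent rather than the sum of all of them.

async def agent(name, delay):
    await asyncio.sleep(delay)
    return name + " done"

async def main():
    return await asyncio.gather(
        agent("planner", 0.01),
        agent("coder", 0.01),
        agent("reviewer", 0.01),
    )

results = asyncio.run(main())
print(results)  # ['planner done', 'coder done', 'reviewer done']
```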

&lt;p&gt;The dominant architectural pattern emerging across all frameworks is &lt;strong&gt;role archetypes&lt;/strong&gt;: Planner, Researcher, Coder, Reviewer. These constrained personas improve explainability (you can trace which "role" made which decision) and reduce the unbounded exploration that makes agents unreliable. Framework selection now increasingly hinges on state management philosophy—durable execution platforms like Temporal versus stateless serverless approaches determine whether your agents survive infrastructure failures.&lt;/p&gt;
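&lt;p&gt;Here is a minimal sketch of the role-archetype idea, with hypothetical roles and a trace that attributes each decision to the role that produced it (the explainability property mentioned above):&lt;/p&gt;

```python
# Each constrained role owns one responsibility; the trace records attribution.
ROLES = {
    "planner":  lambda task: "plan for: " + task,
    "coder":    lambda task: "patch for: " + task,
    "reviewer": lambda task: "review of: " + task,
}

def run_pipeline(task):
    trace = []
    for role in ("planner", "coder", "reviewer"):
        output = ROLES[role](task)
        trace.append((role, output))  # which "role" made which decision
    return trace

for role, output in run_pipeline("add retry logic"):
    print(role + ": " + output)
```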

&lt;h2&gt;
  
  
  Wikipedia Cracks Down on AI-Generated Articles Amid Misinformation Concerns
&lt;/h2&gt;

&lt;p&gt;Wikipedia editors have implemented significantly stricter detection and removal protocols for AI-generated content, marking an escalation in the platform's ongoing struggle with synthetic text. The trigger: a wave of articles containing hallucinated citations—papers that don't exist, quotes that were never said, statistics fabricated wholesale.&lt;/p&gt;

&lt;p&gt;The detection methods combine automated classifiers with human review, focusing on synthetic writing patterns (the telltale hedging phrases, the suspiciously comprehensive coverage, the lack of idiosyncratic human perspective) and citation verification. Editors report finding articles where every single source was either non-existent or misrepresented.&lt;/p&gt;

&lt;p&gt;This raises uncomfortable questions for the AI ecosystem. Wikipedia has been a cornerstone training data source for language models; if AI-generated content infiltrates Wikipedia at scale, we get a feedback loop where models train on their own hallucinations. The platform is essentially defending the integrity of one of the most important knowledge repositories on the internet.&lt;/p&gt;

&lt;p&gt;The response is part of a broader platform governance trend. Stack Overflow's AI content restrictions, academic publishers requiring AI disclosure, and social media platforms labeling synthetic content all reflect the same tension: AI-generated text is now good enough to pass casual inspection but not reliable enough to trust without verification. Wikipedia's hard line may influence how other knowledge platforms approach the problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  ByteDance Launches Dreamina Seedance 2.0 with Built-In Misuse Protections
&lt;/h2&gt;

&lt;p&gt;ByteDance entered the AI video generation race this week with Dreamina Seedance 2.0, notable less for its generation quality (competitive with Runway Gen-3 and Pika 2.0) than for its deployment model. The tool is integrated directly into CapCut, ByteDance's video editing platform with over 200 million monthly users.&lt;/p&gt;

&lt;p&gt;The built-in safeguards are the more interesting story. Seedance 2.0 includes detection systems that refuse to generate content matching known individuals without explicit consent mechanisms, won't produce photorealistic violence or explicit content, and watermarks all outputs with invisible fingerprints. These aren't post-hoc additions—they're architectural decisions baked into the model.&lt;/p&gt;

&lt;p&gt;This "responsible-by-design" approach contrasts with the "release-then-patch" pattern we've seen from other players. ByteDance clearly learned from the deepfake controversies that plagued earlier tools; TikTok's content moderation challenges presumably informed the decision to build guardrails before launch rather than after the damage is done.&lt;/p&gt;

&lt;p&gt;For developers, CapCut integration means API access is coming—ByteDance typically follows consumer launches with developer tools within 3-6 months. The misuse protections will likely carry over, meaning anyone building on this platform inherits the guardrails. Whether that's a feature or a limitation depends on your use case.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cohere Ships Open-Source Voice Model Supporting 14 Languages on Consumer GPUs
&lt;/h2&gt;

&lt;p&gt;Cohere released an open-source speech transcription model this week designed to run on consumer-grade hardware—think RTX 4070-class GPUs, not datacenter A100s. The model supports 14 languages out of the box: English, Spanish, Mandarin, Hindi, Arabic, Portuguese, French, German, Japanese, Korean, Russian, Italian, Dutch, and Turkish.&lt;/p&gt;

&lt;p&gt;The performance numbers are solid if not groundbreaking: word error rates within 10% of Whisper large-v3 while running 3x faster on equivalent hardware. The real value proposition is architectural—this is a single model handling all 14 languages, not 14 separate models, which simplifies deployment significantly.&lt;/p&gt;
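For readers unfamiliar with the metric behind that comparison: word error rate is the word-level edit distance between a reference transcript and a hypothesis, divided by the reference length. "Within 10%" here means relative WER (e.g. 0.055 versus 0.050). A minimal implementation:

```python
# Word error rate (WER): word-level Levenshtein distance between a
# reference transcript and a hypothesis, divided by the number of
# reference words. Counts substitutions, insertions, and deletions.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[-1][-1] / max(len(ref), 1)
```

Production evaluation pipelines typically add text normalization (casing, punctuation) before scoring, since raw WER penalizes formatting differences as errors.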

&lt;p&gt;For developers building voice-enabled applications, the implications are meaningful. No API dependencies means no per-minute costs, no network latency, and no privacy concerns about audio leaving local infrastructure. A small business building a voice assistant can now do so without ongoing API costs that scale with usage.&lt;/p&gt;

&lt;p&gt;This continues the edge deployment trend we've tracked throughout 2025-2026: capable models are migrating from cloud-only to local-first. Voice is particularly suited to this pattern because audio data is sensitive and latency-critical. Cohere's licensing (Apache 2.0) removes commercial use restrictions, making this viable for production deployments without negotiating enterprise agreements.&lt;/p&gt;

&lt;h2&gt;
  
  
  Physical Intelligence Eyes $1B Raise as Robotics AI Funding Accelerates
&lt;/h2&gt;

&lt;p&gt;Physical Intelligence, the robotics AI startup founded by ex-Google and Berkeley researchers, is reportedly in discussions for a $1 billion funding round that would approximately double its valuation to $4.5 billion. The company's π0 foundation model for robot control demonstrated cross-embodiment generalization last year—the same weights controlling arms, hands, and mobile bases.&lt;/p&gt;

&lt;p&gt;The timing aligns with a broader thesis gaining momentum: 2026 is the year physical AI and robotics become the new scaling frontier. IBM Research's January predictions explicitly called this out, arguing that pure LLM scaling is hitting diminishing returns and that embodied intelligence represents the next capability unlock.&lt;/p&gt;

&lt;p&gt;The capital deployment in this space is accelerating. SoftBank's $40 billion loan announced last week—primarily targeting AI infrastructure—signals that major investors see robotics and physical AI as requiring the same massive capital intensity that drove the LLM buildout. Figure, Sanctuary AI, and 1X are all reportedly raising substantial rounds.&lt;/p&gt;

&lt;p&gt;The bulls argue that language models have essentially solved perception and reasoning; applying those capabilities to physical tasks is the obvious next step. The bears counter that sim-to-real transfer, hardware reliability, and safety certification create multi-year deployment timelines that investor patience may not accommodate. Physical Intelligence's ability to close this round will signal which narrative the market believes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Code Platoon Integrates AI into Veteran Coding Bootcamp Curriculum
&lt;/h2&gt;

&lt;p&gt;Code Platoon, the nonprofit bootcamp focused on transitioning military veterans and their spouses into tech careers, unveiled a substantially modernized curriculum this week. The updated program combines full-stack engineering fundamentals with generative AI skills, reflecting what the organization calls "the new baseline for engineering competency."&lt;/p&gt;

&lt;p&gt;The AI integration goes beyond "how to use ChatGPT." Students learn ML fundamentals (enough to understand what's happening inside the models), prompt engineering for production systems, RAG architecture for building knowledge-augmented applications, and evaluation frameworks for AI-assisted code. The capstone projects require building applications with generative AI components.&lt;/p&gt;
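The RAG architecture mentioned above reduces to a simple loop: retrieve the documents most relevant to a query, then prepend them to the prompt so the model answers from provided context. A toy sketch, with bag-of-words cosine similarity standing in for learned embeddings and a stub where the LLM call would go:

```python
# Toy retrieval-augmented generation (RAG): score documents against the
# query, keep the top-k, and assemble them into a grounded prompt.
# Bag-of-words cosine similarity replaces learned embeddings for the
# sake of a self-contained example; generate() is a stub for the model.
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    q = Counter(query.lower().split())
    scored = sorted(docs,
                    key=lambda d: cosine(q, Counter(d.lower().split())),
                    reverse=True)
    return scored[:k]

def generate(prompt: str) -> str:
    return f"[LLM answer grounded in]: {prompt}"  # stub for the model call

def rag_answer(query: str, docs: list[str]) -> str:
    context = "\n".join(retrieve(query, docs))
    return generate(f"Context:\n{context}\n\nQuestion: {query}")
```

Swapping the retriever for a vector database and the stub for a real model call gives the production shape of the pattern; the evaluation frameworks in the curriculum then measure whether answers actually stay grounded in the retrieved context.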

&lt;p&gt;This matters because workforce training programs are leading indicators. When bootcamps—which optimize aggressively for job placement—embed AI throughout their curricula rather than treating it as an elective, it signals that employers now expect these skills by default. Code Platoon's placement partners reportedly requested the curriculum changes, wanting graduates who can build with AI tooling from day one.&lt;/p&gt;

&lt;p&gt;The veteran angle adds another dimension: this population often brings domain expertise (logistics, cybersecurity, operations) that translates well to building AI applications for those verticals. Combining that background with modern engineering skills creates a talent pipeline that enterprise employers are actively seeking.&lt;/p&gt;




&lt;h2&gt;
  
  
  What to Watch
&lt;/h2&gt;

&lt;p&gt;Next week brings the Anthropic developer conference, where Claude 4.2 is expected alongside expanded MCP tooling. The open-source momentum shows no signs of slowing; Llama 4 is rumored for Q2, which could compress the proprietary gap further. Most immediately, keep an eye on how quickly enterprise adoption shifts as the cost-capability curves cross; the migration patterns we're seeing in Q1 data suggest 2026 may be the year open-source becomes the default choice for most production LLM workloads.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Enjoyed this briefing? Follow this series for a fresh AI update every day, written for engineers who want to stay ahead.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Follow this publication on Dev.to to get notified of every new article.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Have a story tip or correction? Drop a comment below.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>programming</category>
      <category>technology</category>
    </item>
  </channel>
</rss>
