<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Richard Dillon</title>
    <description>The latest articles on DEV Community by Richard Dillon (@richard_dillon_b9c238186e).</description>
    <link>https://dev.to/richard_dillon_b9c238186e</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3849330%2F15d5a6d5-ef2a-430b-9760-3ac77ede5242.png</url>
      <title>DEV Community: Richard Dillon</title>
      <link>https://dev.to/richard_dillon_b9c238186e</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/richard_dillon_b9c238186e"/>
    <language>en</language>
    <item>
      <title>LangGraph Fault Tolerance: Building Resilient Agents with Retries, Timeouts, and Error Handlers</title>
      <dc:creator>Richard Dillon</dc:creator>
      <pubDate>Mon, 15 Jun 2026 12:03:38 +0000</pubDate>
      <link>https://dev.to/richard_dillon_b9c238186e/langgraph-fault-tolerance-building-resilient-agents-with-retries-timeouts-and-error-handlers-29pa</link>
      <guid>https://dev.to/richard_dillon_b9c238186e/langgraph-fault-tolerance-building-resilient-agents-with-retries-timeouts-and-error-handlers-29pa</guid>
      <description>&lt;h1&gt;LangGraph Fault Tolerance: Building Resilient Agents with Retries, Timeouts, and Error Handlers&lt;/h1&gt;

&lt;p&gt;Your agent completed 90% of a complex research task, made fourteen successful API calls, and then hit a transient rate limit on the fifteenth. Now it's dead. Checkpoints won't save you here: they tell you &lt;em&gt;where&lt;/em&gt; the agent stopped, not &lt;em&gt;how&lt;/em&gt; to recover gracefully. This gap between state persistence and active recovery is a major source of operational burden for teams running production agents, and LangGraph's new fault tolerance primitives are designed to close it.&lt;/p&gt;

&lt;p&gt;The timing matters. As organizations move from proof-of-concept agents to production deployments handling thousands of daily invocations, the economics of manual intervention become untenable. A support agent that requires human restarts 15% of the time isn't a productivity gain—it's a liability. The new &lt;code&gt;@retry&lt;/code&gt; decorator, &lt;code&gt;TimeoutPolicy&lt;/code&gt; class, and &lt;code&gt;ErrorHandler&lt;/code&gt; nodes represent LangGraph's first comprehensive answer to this challenge, building on the framework's existing &lt;a href="https://github.com/langchain-ai/langgraph" rel="noopener noreferrer"&gt;resilient agent architecture&lt;/a&gt; while addressing the operational realities of 2026's agentic workloads.&lt;/p&gt;

&lt;h2&gt;The Problem: Why Checkpointing Alone Isn't Enough&lt;/h2&gt;

&lt;p&gt;LangGraph's checkpointing system—whether you're using &lt;code&gt;PostgresSaver&lt;/code&gt;, &lt;code&gt;MemorySaver&lt;/code&gt;, or the newer distributed options—excels at one job: capturing the complete state of an agent at defined points in execution. When an agent crashes, you can inspect exactly what happened and resume from that state. This is table stakes for any serious agentic system, and LangGraph has done it well.&lt;/p&gt;

&lt;p&gt;But checkpointing is fundamentally passive. It answers "where did we stop?" without answering "should we try again?" or "how long should we wait?" or "what's our fallback if this keeps failing?"&lt;/p&gt;

&lt;p&gt;Consider the failure modes that dominate production agent deployments. Rate limits from tool APIs are the most common: OpenAI, Anthropic, and most third-party data providers impose them, and they are transient by design. A request that receives a 429 at 2:15 PM will likely succeed at 2:16 PM. Transient 5xx errors from external services follow similar patterns. LLM provider timeouts spike during high-traffic periods; if your agent runs during peak hours, you'll see these regularly. Network partitions between your agent and external services happen more often than anyone wants to admit.&lt;/p&gt;

&lt;p&gt;In &lt;a href="https://www.langchain.com/blog/langgraph-multi-agent-workflows" rel="noopener noreferrer"&gt;multi-agent workflows&lt;/a&gt; and the newer Deep Agents architecture, you face an additional challenge: sub-agent hangs. A planning agent delegates to a research sub-agent, which gets stuck waiting for a response that will never come. Without timeouts, your entire workflow freezes.&lt;/p&gt;

&lt;p&gt;The real cost is operational as much as technical. Every manual restart requires human attention, context switching, and decision-making. Without fault tolerance patterns, on-call engineers for customer-facing agents can spend a significant share of their rotation simply restarting agents that hit transient failures. The &lt;a href="https://www.langchain.com/blog/the-agent-development-lifecycle" rel="noopener noreferrer"&gt;agent development lifecycle&lt;/a&gt; extends well beyond deployment, and monitoring becomes firefighting without proper recovery mechanisms.&lt;/p&gt;

&lt;p&gt;The conceptual gap is clear: checkpointing defines &lt;em&gt;where&lt;/em&gt; to resume, while fault tolerance defines &lt;em&gt;whether and how&lt;/em&gt; to retry before giving up. You need both.&lt;/p&gt;

&lt;h2&gt;Core API: The &lt;code&gt;@retry&lt;/code&gt; Decorator&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;@retry&lt;/code&gt; decorator brings production-grade retry logic to node functions without the boilerplate that previously cluttered every external API call. The basic signature is straightforward:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@retry&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_attempts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;backoff&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;exponential&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;retryable_exceptions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;RateLimitError&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;TimeoutError&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;call_external_api&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;AgentState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;AgentState&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="bp"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The configuration options address the full spectrum of retry scenarios. &lt;code&gt;max_attempts&lt;/code&gt; is an integer that includes the initial attempt—so &lt;code&gt;max_attempts=3&lt;/code&gt; means one initial try plus two retries. The &lt;code&gt;backoff&lt;/code&gt; parameter accepts &lt;code&gt;"constant"&lt;/code&gt;, &lt;code&gt;"linear"&lt;/code&gt;, or &lt;code&gt;"exponential"&lt;/code&gt; strategies, each with configurable &lt;code&gt;base_delay&lt;/code&gt; (default 1.0 seconds) and &lt;code&gt;max_delay&lt;/code&gt; (default 60 seconds) parameters. Exponential backoff with jitter is the recommended default for API rate limits.&lt;/p&gt;
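
&lt;p&gt;To make the three strategies concrete, here is a minimal standard-library sketch of how a per-attempt delay could be computed. The helper name &lt;code&gt;compute_delay&lt;/code&gt;, the &lt;code&gt;jitter&lt;/code&gt; flag, and the "full jitter" variant are illustrative assumptions rather than LangGraph's implementation; only &lt;code&gt;backoff&lt;/code&gt;, &lt;code&gt;base_delay&lt;/code&gt;, and &lt;code&gt;max_delay&lt;/code&gt; mirror the parameters described above.&lt;/p&gt;

```python
import random


def compute_delay(attempt, backoff="exponential", base_delay=1.0, max_delay=60.0, jitter=False):
    """Delay in seconds before retry number `attempt` (1 = first retry).

    Hypothetical helper: it illustrates the strategies, not LangGraph internals.
    """
    if backoff == "constant":
        delay = base_delay
    elif backoff == "linear":
        delay = base_delay * attempt
    elif backoff == "exponential":
        delay = base_delay * (2 ** (attempt - 1))
    else:
        raise ValueError(f"unknown backoff strategy: {backoff!r}")
    delay = min(delay, max_delay)  # never wait longer than the cap
    if jitter:
        # Full jitter: pick uniformly in [0, delay] so clients do not retry in lockstep
        delay = random.uniform(0.0, delay)
    return delay


schedule = [compute_delay(n, "exponential", base_delay=2.0, max_delay=30.0) for n in range(1, 6)]
print(schedule)  # [2.0, 4.0, 8.0, 16.0, 30.0]
```

&lt;p&gt;With a 2-second base and a 30-second cap, the fifth retry would wait 32 seconds uncapped, so the cap is what bounds the worst-case latency a retry policy can add.&lt;/p&gt;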

&lt;p&gt;The &lt;code&gt;retryable_exceptions&lt;/code&gt; parameter is crucial for correct behavior. Only exceptions in this list trigger retries; all others propagate immediately. This prevents retrying on errors that won't resolve with time—a malformed request will fail identically on every attempt. For more complex scenarios, &lt;code&gt;retry_condition&lt;/code&gt; accepts a callable &lt;code&gt;(exception, attempt) -&amp;gt; bool&lt;/code&gt; that enables custom logic: "retry rate limits for the first 5 attempts, but only retry timeouts twice."&lt;/p&gt;
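
&lt;p&gt;The quoted policy could be written as a callable along these lines. This is a sketch under two assumptions: that &lt;code&gt;attempt&lt;/code&gt; is 1-based and counts the attempt that just failed, and that &lt;code&gt;RateLimitError&lt;/code&gt; is a stand-in for whatever exception your provider client raises.&lt;/p&gt;

```python
class RateLimitError(Exception):
    """Stand-in for a provider-specific rate limit exception (assumption)."""


def retry_condition(exception, attempt):
    """Return True if the call should be retried after failed attempt `attempt`.

    Assumes `attempt` is 1-based and counts the attempt that just failed.
    """
    if isinstance(exception, RateLimitError):
        return 5 >= attempt  # rate limits: keep retrying through the fifth attempt
    if isinstance(exception, TimeoutError):
        return 2 >= attempt  # timeouts: at most two retries
    return False  # anything else propagates immediately


print(retry_condition(RateLimitError("429"), 5))      # True
print(retry_condition(TimeoutError(), 3))             # False
print(retry_condition(ValueError("bad request"), 1))  # False
```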

&lt;p&gt;Integration with LangGraph's state management is seamless and, importantly, safe. Retries operate on the &lt;em&gt;same&lt;/em&gt; state snapshot that the original attempt received. There's no risk of partial state corruption from a failed attempt leaking into a retry. The node either succeeds and its state updates are committed, or it exhausts retries and the original state remains unchanged.&lt;/p&gt;
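
&lt;p&gt;The same-snapshot guarantee can be illustrated with a small stand-in decorator. &lt;code&gt;retry_sketch&lt;/code&gt; below is a hypothetical, synchronous approximation written only with the standard library; it is not LangGraph's code, and it uses a deep copy to model the behavior that each attempt starts from the original input.&lt;/p&gt;

```python
import copy
import time
from functools import wraps


def retry_sketch(max_attempts=3, base_delay=0.0, retryable_exceptions=(Exception,)):
    """Hypothetical stand-in for the decorator described above, not LangGraph code."""
    def decorator(node):
        @wraps(node)
        def wrapper(state):
            for attempt in range(1, max_attempts + 1):
                snapshot = copy.deepcopy(state)  # every attempt sees the same input
                try:
                    return node(snapshot)
                except tuple(retryable_exceptions):
                    if attempt == max_attempts:
                        raise  # retries exhausted; the caller's state is untouched
                    time.sleep(base_delay * (2 ** (attempt - 1)))
        return wrapper
    return decorator


seen_scratch = []


@retry_sketch(max_attempts=3, retryable_exceptions=[TimeoutError])
def flaky_node(state):
    seen_scratch.append("scratch" in state)  # would be True if a failed attempt leaked
    state["scratch"] = "partial write"
    if 3 > len(seen_scratch):
        raise TimeoutError("simulated transient failure")
    return {**state, "result": "ok"}


original = {"query": "fault tolerance"}
final = flaky_node(original)
print(seen_scratch, original)  # [False, False, False] {'query': 'fault tolerance'}
```

&lt;p&gt;The two failed attempts each wrote to their own copy, so the third attempt and the caller both see the state as it was before the node ran.&lt;/p&gt;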

&lt;p&gt;Observability comes built-in. Each retry emits a &lt;code&gt;RetryAttempt&lt;/code&gt; event visible in LangSmith traces, containing the attempt number, delay duration, exception type, and exception message. This means you can track retry rates per node, identify which external services cause the most retries, and tune your &lt;code&gt;max_attempts&lt;/code&gt; settings based on real data rather than guesswork.&lt;/p&gt;

&lt;p&gt;One implementation detail matters for teams using NVIDIA's &lt;a href="https://www.langchain.com/blog/nvidia-enterprise" rel="noopener noreferrer"&gt;parallel execution enhancements&lt;/a&gt;: when combining &lt;code&gt;@retry&lt;/code&gt; with &lt;code&gt;@independent&lt;/code&gt; (the decorator for parallelizable nodes), &lt;code&gt;@retry&lt;/code&gt; must be the innermost decorator. This ensures the retry logic wraps the actual node execution rather than the parallelization wrapper.&lt;/p&gt;

&lt;h2&gt;Timeout Policies: Bounding Unbounded Operations&lt;/h2&gt;

&lt;p&gt;While retries handle failures that announce themselves with exceptions, timeouts protect against operations that simply never return. The &lt;code&gt;TimeoutPolicy&lt;/code&gt; class provides granular control at three levels: individual nodes, tool calls within a node, and entire graph invocations.&lt;/p&gt;

&lt;p&gt;The configuration hierarchy reflects how agents actually fail. &lt;code&gt;node_timeout&lt;/code&gt; sets the maximum duration for any single node execution—useful when you know that a particular API call should never take more than 30 seconds. &lt;code&gt;tool_timeout&lt;/code&gt; applies uniformly to all tool calls within a node, separate from the node's own computation time. &lt;code&gt;graph_timeout&lt;/code&gt; sets a wall-clock limit for the entire invocation, preventing runaway agents that loop indefinitely or get stuck in recursive planning cycles.&lt;/p&gt;

&lt;p&gt;The configuration pattern attaches to graph compilation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.timeout&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;TimeoutPolicy&lt;/span&gt;

&lt;span class="n"&gt;policy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;TimeoutPolicy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;node_timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;      &lt;span class="c1"&gt;# 30 seconds per node
&lt;/span&gt;    &lt;span class="n"&gt;tool_timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;      &lt;span class="c1"&gt;# 15 seconds per tool call
&lt;/span&gt;    &lt;span class="n"&gt;graph_timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;300&lt;/span&gt;     &lt;span class="c1"&gt;# 5 minutes total
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;compiled_graph&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;checkpointer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;checkpointer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;timeout_policy&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;policy&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Timeout behavior is configurable via the &lt;code&gt;on_timeout&lt;/code&gt; parameter. The default &lt;code&gt;"raise"&lt;/code&gt; behavior raises a &lt;code&gt;TimeoutError&lt;/code&gt; that can be caught by an &lt;code&gt;ErrorHandler&lt;/code&gt; (discussed next) or handled in downstream nodes. &lt;code&gt;"interrupt"&lt;/code&gt; triggers LangGraph's human-in-the-loop interrupt mechanism, pausing execution for manual review and decision-making. &lt;code&gt;"fallback"&lt;/code&gt; routes to a specified fallback node, enabling graceful degradation without human intervention.&lt;/p&gt;

&lt;p&gt;The implementation uses &lt;code&gt;asyncio.timeout()&lt;/code&gt; internally for async nodes. Synchronous nodes are wrapped automatically with equivalent behavior, but the async implementation is more efficient—another reason to prefer async node functions in production.&lt;/p&gt;
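
&lt;p&gt;As a rough illustration of bounding an async node, the sketch below wraps a coroutine in a deadline and either re-raises or falls back. It uses &lt;code&gt;asyncio.wait_for&lt;/code&gt;, which serves the same purpose here as &lt;code&gt;asyncio.timeout()&lt;/code&gt; and also runs on Python versions older than 3.11. The names &lt;code&gt;run_with_timeout&lt;/code&gt;, &lt;code&gt;slow_node&lt;/code&gt;, and &lt;code&gt;fallback_node&lt;/code&gt; are hypothetical.&lt;/p&gt;

```python
import asyncio


async def run_with_timeout(node, state, node_timeout, fallback=None):
    """Run an async node under a deadline (hypothetical helper, not LangGraph code)."""
    try:
        return await asyncio.wait_for(node(state), timeout=node_timeout)
    except asyncio.TimeoutError:
        if fallback is None:
            raise  # mirrors the "raise" behavior
        return await fallback(state)  # mirrors the "fallback" behavior


async def slow_node(state):
    await asyncio.sleep(1.0)  # simulates a call that runs past the deadline
    return {**state, "source": "primary"}


async def fallback_node(state):
    return {**state, "source": "cached"}


result = asyncio.run(run_with_timeout(slow_node, {"query": "x"}, 0.05, fallback_node))
print(result)  # {'query': 'x', 'source': 'cached'}
```

&lt;p&gt;When the deadline expires, &lt;code&gt;asyncio.wait_for&lt;/code&gt; cancels the pending coroutine, so the slow call does not keep running in the background after the fallback takes over.&lt;/p&gt;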

&lt;p&gt;For teams using LangGraph's multi-agent capabilities, timeout policies integrate with the &lt;a href="https://www.langchain.com/blog/the-agent-development-lifecycle" rel="noopener noreferrer"&gt;agent development stack&lt;/a&gt; at the orchestration level. Sub-agent timeouts can be configured independently, preventing a misbehaving sub-agent from consuming the entire parent agent's timeout budget.&lt;/p&gt;
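
&lt;p&gt;One way to reason about sub-agent limits is as a budget calculation: a sub-agent gets its own limit, capped by whatever remains of the parent's wall-clock budget. The helper below is a hypothetical sketch of that arithmetic, not a LangGraph function.&lt;/p&gt;

```python
import time


def subagent_timeout(parent_deadline, subagent_limit, now=None):
    """Seconds a sub-agent may run: its own limit, capped by the parent's remaining budget.

    `parent_deadline` and `now` are time.monotonic() readings. Hypothetical helper.
    """
    now = time.monotonic() if now is None else now
    remaining = max(0.0, parent_deadline - now)
    return min(subagent_limit, remaining)


# The parent's deadline is at t=300 s and each sub-agent is limited to 60 s.
print(subagent_timeout(parent_deadline=300.0, subagent_limit=60.0, now=100.0))  # 60.0
print(subagent_timeout(parent_deadline=300.0, subagent_limit=60.0, now=280.0))  # 20.0
```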

&lt;p&gt;LangSmith surfaces timeout metrics alongside other observability data: &lt;code&gt;timeout_rate&lt;/code&gt; per node shows what percentage of invocations hit the timeout, while &lt;code&gt;p99_duration&lt;/code&gt; shows 99th-percentile latency per node with the timeout threshold overlaid. This makes it straightforward to tune timeouts based on actual production behavior rather than guesses.&lt;/p&gt;

&lt;h2&gt;Error Handler Nodes: Centralized Recovery Logic&lt;/h2&gt;

&lt;p&gt;Retries and timeouts handle specific failure types, but production agents need a unified place to make recovery decisions. &lt;code&gt;ErrorHandler&lt;/code&gt; nodes provide this centralization, replacing scattered try-except blocks with a coherent error recovery architecture.&lt;/p&gt;

&lt;p&gt;Registration uses scope-based configuration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_error_handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;handler_node&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;scope&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;global&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# or "subgraph" or ["node_a", "node_b"]
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Global handlers catch any unhandled exception from any node. Subgraph handlers scope to a specific subgraph, useful when different parts of your agent require different recovery strategies. Node-list scoping targets specific nodes, ideal for handling errors from a cluster of related API calls.&lt;/p&gt;

&lt;p&gt;The handler node receives an &lt;code&gt;ErrorContext&lt;/code&gt; object containing everything needed for intelligent recovery decisions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ErrorContext&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;exception&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt;          &lt;span class="c1"&gt;# The caught exception
&lt;/span&gt;    &lt;span class="n"&gt;failed_node&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;              &lt;span class="c1"&gt;# Name of node that raised
&lt;/span&gt;    &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;AgentState&lt;/span&gt;             &lt;span class="c1"&gt;# Current state snapshot
&lt;/span&gt;    &lt;span class="n"&gt;attempt_history&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;         &lt;span class="c1"&gt;# Retry attempts if @retry was used
&lt;/span&gt;    &lt;span class="n"&gt;trace_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;                 &lt;span class="c1"&gt;# Correlation ID for LangSmith
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;attempt_history&lt;/code&gt; field is particularly valuable—it tells you not just that a node failed, but &lt;em&gt;how many times&lt;/em&gt; it failed and &lt;em&gt;what exceptions&lt;/em&gt; occurred on each attempt. A node that fails once with a timeout is different from a node that exhausted five retries with rate limit errors.&lt;/p&gt;

&lt;p&gt;Handler return values control execution flow via the &lt;code&gt;Command&lt;/code&gt; pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;error_handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ErrorContext&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Command&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;exception&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;RateLimitError&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Route to degraded-mode node
&lt;/span&gt;        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;Command&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;goto&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;degraded_synthesis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;exception&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;TimeoutError&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Interrupt for human review
&lt;/span&gt;        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;Command&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;interrupt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Timeout on critical operation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Abort with diagnostic payload
&lt;/span&gt;        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;Command&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;abort&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
            &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;exception&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;trace_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;trace_id&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;Command(resume=True)&lt;/code&gt; option is particularly powerful—it retries the failed node with a reset retry counter. This enables "escalate and retry" patterns where the handler might first try rate limit backoff, then switch API keys, then finally give up.&lt;/p&gt;

&lt;p&gt;State modification before routing is supported via &lt;code&gt;Command(update={...})&lt;/code&gt;. This enables patterns like marking a data source as unavailable in state before routing to a synthesis node that should work with partial data.&lt;/p&gt;

&lt;p&gt;Two patterns emerge as particularly useful in production. The "circuit breaker" pattern tracks failure rates over time (using state or external storage) and switches to degraded mode after a threshold—useful for agents that should continue operating even when primary data sources are unavailable. The "escalation" pattern creates human-in-the-loop interrupts for specific error types while handling routine failures automatically, respecting the principle that &lt;a href="https://www.ibm.com/think/insights/agentic-ai" rel="noopener noreferrer"&gt;agentic systems&lt;/a&gt; should augment human decision-making rather than eliminate it entirely.&lt;/p&gt;
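
&lt;p&gt;A circuit breaker of the kind described above can be sketched in a few lines of plain Python. The class, its parameters, and the &lt;code&gt;primary_source&lt;/code&gt; route name are hypothetical; a production version would typically keep its counters in graph state or external storage and add a half-open trial state, which this sketch simplifies to a full reset after the cool-down.&lt;/p&gt;

```python
import time


class CircuitBreaker:
    """Minimal circuit breaker sketch (hypothetical; not part of LangGraph)."""

    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()  # trip the breaker

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def is_open(self):
        if self.opened_at is None:
            return False
        if self.clock() - self.opened_at >= self.reset_after:
            self.record_success()  # cool-down elapsed: allow calls again
            return False
        return True


def choose_route(breaker):
    """Pick the next node name based on breaker state."""
    return "degraded_synthesis" if breaker.is_open() else "primary_source"


now = [0.0]  # fake clock so the example is deterministic
breaker = CircuitBreaker(failure_threshold=2, reset_after=10.0, clock=lambda: now[0])
breaker.record_failure()
print(choose_route(breaker))  # primary_source
breaker.record_failure()
print(choose_route(breaker))  # degraded_synthesis
now[0] = 10.0
print(choose_route(breaker))  # primary_source
```

&lt;p&gt;Injecting the clock keeps the breaker testable without real waiting, which matters when the cool-down is measured in minutes.&lt;/p&gt;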

&lt;h2&gt;Hands-On: Code Walkthrough&lt;/h2&gt;

&lt;p&gt;Let's build a research agent that demonstrates all three fault tolerance primitives. The agent queries three external APIs (arXiv, Wikipedia, and a news service), synthesizes results, and generates a report. This is a common pattern in production agents, and it exposes exactly the failure modes fault tolerance addresses.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;TypedDict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.graph&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;StateGraph&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;START&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;END&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.retry&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;retry&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.timeout&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;TimeoutPolicy&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.errors&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ErrorContext&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Command&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langsmith&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;traceable&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;httpx&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;

&lt;span class="c1"&gt;# State definition captures both data and operational metadata
&lt;/span&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ResearchState&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TypedDict&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;arxiv_results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;
    &lt;span class="n"&gt;wikipedia_results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;
    &lt;span class="n"&gt;news_results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;
    &lt;span class="n"&gt;unavailable_sources&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# Track which sources failed
&lt;/span&gt;    &lt;span class="n"&gt;synthesis&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;final_report&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# Custom exceptions for clear retry targeting
&lt;/span&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;RateLimitError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;pass&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;SourceUnavailableError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;pass&lt;/span&gt;

&lt;span class="c1"&gt;# Node 1: arXiv API with retry for rate limits and transient errors
&lt;/span&gt;&lt;span class="nd"&gt;@retry&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;max_attempts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;backoff&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;exponential&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;base_delay&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;2.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_delay&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;30.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;retryable_exceptions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;RateLimitError&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;httpx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TimeoutException&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;httpx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;HTTPStatusError&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nd"&gt;@traceable&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query_arxiv&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;query_arxiv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ResearchState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;ResearchState&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Query arXiv API for academic papers matching the research query.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;httpx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;AsyncClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;10.0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://export.arxiv.org/api/query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search_query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_results&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="c1"&gt;# Handle rate limits explicitly to trigger retry
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;429&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;RateLimitError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;arXiv rate limit hit: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Retry-After&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;unknown&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;raise_for_status&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="c1"&gt;# Parse response (simplified for clarity)
&lt;/span&gt;        &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;parse_arxiv_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;arxiv_results&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Node 2: Wikipedia API with similar retry pattern
&lt;/span&gt;&lt;span class="nd"&gt;@retry&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;max_attempts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;backoff&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;exponential&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;retryable_exceptions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;RateLimitError&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;httpx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TimeoutException&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nd"&gt;@traceable&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query_wikipedia&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;query_wikipedia&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ResearchState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;ResearchState&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Query Wikipedia API for relevant encyclopedia entries.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;httpx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;AsyncClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;10.0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://en.wikipedia.org/w/api.php&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;action&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;list&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;srsearch&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;format&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;429&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;RateLimitError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Wikipedia rate limit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;raise_for_status&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{}).&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[])&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;wikipedia_results&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Node 3: News API (third-party, less reliable)
&lt;/span&gt;&lt;span class="nd"&gt;@retry&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;max_attempts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Fewer retries for less critical source
&lt;/span&gt;    &lt;span class="n"&gt;backoff&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;constant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;base_delay&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;5.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;retryable_exceptions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;RateLimitError&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;httpx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TimeoutException&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nd"&gt;@traceable&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query_news&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;query_news&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ResearchState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;ResearchState&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Query news API for recent coverage. Optional source—failure is acceptable.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;httpx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;AsyncClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;8.0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://newsapi.example.com/search&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;q&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]},&lt;/span&gt;
            &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Authorization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bearer NEWS_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;429&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;RateLimitError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;News API rate limit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;raise_for_status&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;articles&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[])&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;news_results&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Synthesis node - no retry needed, operates on local data
&lt;/span&gt;&lt;span class="nd"&gt;@traceable&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;synthesize_results&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;synthesize_results&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ResearchState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;ResearchState&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Combine results from available sources into unified synthesis.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;available_results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;arxiv_results&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;available_results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Academic sources: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;arxiv_results&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; papers found&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;wikipedia_results&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;available_results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Encyclopedia: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;wikipedia_results&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; entries found&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;news_results&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;available_results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;News: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;news_results&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; articles found&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Note which sources were unavailable for transparency
&lt;/span&gt;    &lt;span class="n"&gt;unavailable&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;unavailable_sources&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[])&lt;/span&gt;

    &lt;span class="n"&gt;synthesis&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Research synthesis for: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;synthesis&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Available sources: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;available_results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;None&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;unavailable&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;synthesis&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Unavailable sources: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;unavailable&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="c1"&gt;# In production, this would call an LLM to generate actual synthesis
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;synthesis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;synthesis&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Error handler with scoped recovery logic
&lt;/span&gt;&lt;span class="nd"&gt;@traceable&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;research_error_handler&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;research_error_handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ErrorContext&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Command&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Central error handling for research API nodes.
    Strategy:
    - Rate limits after retry exhaustion: mark source unavailable, continue
    - Timeouts: mark source unavailable, continue (research can proceed with partial data)
    - Unexpected errors: abort with diagnostic info for debugging
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;failed_node&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;failed_node&lt;/span&gt;
    &lt;span class="n"&gt;exception&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;exception&lt;/span&gt;
    &lt;span class="n"&gt;state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;

    &lt;span class="c1"&gt;# Initialize unavailable_sources if not present
&lt;/span&gt;    &lt;span class="n"&gt;unavailable&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;unavailable_sources&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[]))&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;exception&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;RateLimitError&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;httpx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TimeoutException&lt;/span&gt;&lt;span class="p"&gt;)):&lt;/span&gt;
        &lt;span class="c1"&gt;# Transient failure after retries exhausted - degrade gracefully
&lt;/span&gt;        &lt;span class="n"&gt;source_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;failed_node&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query_&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;unavailable&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;source_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Log for observability (LangSmith will capture this)
&lt;/span&gt;        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Source &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;source_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; unavailable after &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;attempt_history&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; attempts&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Update state and continue to synthesis
&lt;/span&gt;        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;Command&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;update&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;unavailable_sources&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;unavailable&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="n"&gt;goto&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;synthesize_results&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;exception&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;TimeoutError&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Graph-level or node-level timeout - more serious
&lt;/span&gt;        &lt;span class="c1"&gt;# For research agents, we still try to synthesize what we have
&lt;/span&gt;        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;Command&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;update&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;unavailable_sources&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;unavailable&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;failed_node&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;_timeout&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="n"&gt;goto&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;synthesize_results&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Unexpected error - abort with full diagnostic payload
&lt;/span&gt;        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;Command&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;abort&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;type&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;exception&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;__name__&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error_message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;exception&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;failed_node&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;failed_node&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;trace_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;trace_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;state_snapshot&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;()}&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Build the graph with fault tolerance
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;build_research_agent&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;graph&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;StateGraph&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ResearchState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Add nodes
&lt;/span&gt;    &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query_arxiv&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query_arxiv&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query_wikipedia&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query_wikipedia&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query_news&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query_news&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;synthesize_results&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;synthesize_results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Parallel API queries, then synthesis
&lt;/span&gt;    &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;START&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query_arxiv&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;START&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query_wikipedia&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;START&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query_news&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query_arxiv&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;synthesize_results&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query_wikipedia&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;synthesize_results&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query_news&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;synthesize_results&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;synthesize_results&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;END&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Register error handler scoped to API query nodes only
&lt;/span&gt;    &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_error_handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;research_error_handler&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;scope&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query_arxiv&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query_wikipedia&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query_news&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Configure timeout policy
&lt;/span&gt;    &lt;span class="n"&gt;timeout_policy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;TimeoutPolicy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;node_timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;    &lt;span class="c1"&gt;# 60 seconds per node (includes retries)
&lt;/span&gt;        &lt;span class="n"&gt;graph_timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;300&lt;/span&gt;   &lt;span class="c1"&gt;# 5 minutes total
&lt;/span&gt;    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Compile with the timeout policy (no checkpointer is passed here)
&lt;/span&gt;    &lt;span class="n"&gt;compiled&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;timeout_policy&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;timeout_policy&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;compiled&lt;/span&gt;

&lt;span class="c1"&gt;# Usage example
&lt;/span&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;build_research_agent&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ainvoke&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;transformer architecture neural networks&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;unavailable_sources&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;synthesis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;unavailable_sources&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Note: Some sources were unavailable: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;unavailable_sources&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When you run this agent and one API fails, you'll see the fault tolerance in action. The &lt;code&gt;@retry&lt;/code&gt; decorator handles transient failures with exponential backoff. If retries are exhausted, the error handler catches the exception, marks the source as unavailable in state, and routes to synthesis. The agent completes with partial data rather than crashing.&lt;/p&gt;

&lt;p&gt;In LangSmith traces, you'll see &lt;code&gt;RetryAttempt&lt;/code&gt; events for each retry, the error handler invocation, and the modified routing decision—complete visibility into exactly how the agent recovered.&lt;/p&gt;
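
&lt;p&gt;For intuition about the exponential backoff mentioned above, here is a minimal plain-Python sketch. It is not LangGraph's implementation: &lt;code&gt;retry_with_backoff&lt;/code&gt; and all of its parameters are hypothetical names chosen for illustration, and the 10% jitter is an assumption rather than a documented default.&lt;/p&gt;

```python
import random
import time
from functools import wraps


def retry_with_backoff(max_attempts=3, base_delay=0.5, max_delay=8.0,
                       retry_on=(Exception,), sleep=time.sleep):
    """Hypothetical stand-in for a retry decorator; not LangGraph's implementation."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except retry_on:
                    if attempt == max_attempts:
                        raise  # retries exhausted: let an error handler take over
                    # Delay doubles each attempt (base, 2*base, 4*base, ...), capped at max_delay.
                    delay = min(max_delay, base_delay * 2 ** (attempt - 1))
                    # Add up to 10% jitter so parallel callers do not retry in lockstep.
                    sleep(delay + random.uniform(0, delay * 0.1))
        return wrapper
    return decorator
```

&lt;p&gt;The &lt;code&gt;sleep&lt;/code&gt; argument is injectable so the backoff schedule can be checked in tests without actually waiting.&lt;/p&gt;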

&lt;h2&gt;
  
  
  What This Means for Your Stack
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Immediate adoption path&lt;/strong&gt;: Start by adding &lt;code&gt;@retry&lt;/code&gt; to any node that makes external calls. This is the lowest-friction change with the highest impact. Most teams should see an immediate reduction in failed runs simply by handling transient rate limits and timeouts gracefully.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Migrating from custom retry logic&lt;/strong&gt;: If you've built manual try/except/sleep patterns around external calls, the &lt;code&gt;@retry&lt;/code&gt; decorator replaces 20-50 lines of boilerplate per node. Beyond code reduction, the decorator handles backoff calculation, metric emission, and LangSmith integration automatically. Your custom logic probably doesn't.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Timeout strategy&lt;/strong&gt;: Begin with generous timeouts—2-3x your observed p99 latency for each node type. Overly aggressive timeouts cause false failures; you can tighten them based on LangSmith metrics once you have production data. The &lt;code&gt;p99_duration&lt;/code&gt; metric with timeout threshold overlay makes this tuning straightforward.&lt;/p&gt;
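
&lt;p&gt;The "2-3x observed p99" rule of thumb can be turned into a starting value with a few lines of Python. This is a sketch under assumptions: &lt;code&gt;suggest_timeout&lt;/code&gt; is a hypothetical helper, the latency samples are ones you collected yourself, and the nearest-rank percentile method is one reasonable choice among several.&lt;/p&gt;

```python
def suggest_timeout(latencies_s, multiplier=2.5):
    """Return multiplier x the nearest-rank p99 of observed latencies (seconds)."""
    if not latencies_s:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_s)
    # Nearest-rank p99: ceil(0.99 * n), computed with integers to avoid float rounding.
    rank = (99 * len(ordered) + 99) // 100
    p99 = ordered[rank - 1]
    return p99 * multiplier
```

&lt;p&gt;With only a handful of samples the p99 is simply the slowest observation, so treat the result as a first guess and revisit it once more data is available.&lt;/p&gt;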

&lt;p&gt;&lt;strong&gt;ErrorHandler placement&lt;/strong&gt;: Start with a single global handler that logs errors and emits alerts. This gives you immediate observability into all failures. Add scoped handlers as specific recovery patterns emerge from production data—don't try to anticipate every failure mode upfront.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Multi-agent considerations&lt;/strong&gt;: For teams using &lt;a href="https://www.langchain.com/blog/langgraph-multi-agent-workflows" rel="noopener noreferrer"&gt;LangGraph's multi-agent workflows&lt;/a&gt;, fault tolerance automatically benefits sub-agents. Configure policies at the orchestration level, and sub-agents inherit appropriate timeouts. This prevents the common failure mode of a misbehaving sub-agent consuming resources indefinitely.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost awareness&lt;/strong&gt;: Retries multiply LLM API costs. In the worst case, a node with &lt;code&gt;max_attempts=5&lt;/code&gt; calling Claude 3.5 Sonnet costs up to 5x what you budgeted per invocation. Set &lt;code&gt;max_attempts&lt;/code&gt; conservatively for expensive model calls: 2 is often sufficient for LLM calls, while API calls to external services can tolerate higher retry counts.&lt;/p&gt;
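
&lt;p&gt;To put rough numbers on the retry budget, the sketch below compares the worst-case cost with the expected cost of one node invocation. It assumes every attempt is billed in full and that attempts fail independently with a fixed probability; both are simplifications, and &lt;code&gt;retry_cost_bounds&lt;/code&gt; is a hypothetical helper, not part of any library.&lt;/p&gt;

```python
def retry_cost_bounds(cost_per_call, max_attempts, failure_rate):
    """Return (expected, worst_case) cost for one node invocation.

    Assumes each attempt is billed in full and attempts fail independently
    with probability failure_rate.
    """
    # Attempt k+1 only runs if the first k attempts all failed: probability failure_rate**k.
    expected_attempts = sum(failure_rate ** k for k in range(max_attempts))
    return cost_per_call * expected_attempts, cost_per_call * max_attempts
```

&lt;p&gt;When failures are rare the expected cost stays close to a single call, but the budget still has to absorb the worst case, which grows linearly with &lt;code&gt;max_attempts&lt;/code&gt;.&lt;/p&gt;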

&lt;p&gt;&lt;strong&gt;Testing fault tolerance&lt;/strong&gt;: LangSmith Sandboxes support fault injection, enabling chaos testing without mocking your entire infrastructure. Inject rate limits, timeouts, and specific exceptions into production-like runs to validate that your error handlers behave correctly before real failures occur.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Observability checklist&lt;/strong&gt;: Enable &lt;code&gt;retry_rate&lt;/code&gt;, &lt;code&gt;timeout_rate&lt;/code&gt;, and &lt;code&gt;error_handler_invocations&lt;/code&gt; metrics in your LangSmith dashboard. These three metrics tell you whether fault tolerance is working as intended or masking underlying issues that need architectural fixes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Anti-pattern to avoid&lt;/strong&gt;: Don't wrap entire graphs in a single retry at the invocation level. This loses the granularity that makes fault tolerance valuable. A graph-level retry doesn't know which node failed, can't route to fallbacks, and may re-execute expensive operations unnecessarily. Use node-level retries with error handlers for precise control.&lt;/p&gt;

&lt;p&gt;The broader shift here is from reactive debugging to proactive resilience. The &lt;a href="https://www.langchain.com/blog/the-agent-development-lifecycle" rel="noopener noreferrer"&gt;agent development lifecycle&lt;/a&gt; no longer ends at deployment—it extends into production operations, and fault tolerance is the bridge between "my agent works" and "my agent works reliably at scale."&lt;/p&gt;

&lt;h2&gt;
  
  
  What to Build This Week
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Project: Fault-Tolerant Data Pipeline Agent&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Build an agent that extracts data from three different sources (a public API, a web scraper, and a local database), transforms the combined data, and loads it into a target system. This is a practical ETL pattern where fault tolerance directly impacts whether the pipeline runs unattended.&lt;/p&gt;

&lt;p&gt;Implementation requirements:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Each extraction node gets &lt;code&gt;@retry&lt;/code&gt; with source-appropriate settings (aggressive retries for your own database, conservative for rate-limited public APIs)&lt;/li&gt;
&lt;li&gt;Configure &lt;code&gt;TimeoutPolicy&lt;/code&gt; with different tolerances for each phase—extraction can be slow, transformation should be fast&lt;/li&gt;
&lt;li&gt;Build an error handler that implements "best effort" semantics: continue with available data if any source fails, but abort if all sources fail&lt;/li&gt;
&lt;li&gt;Add a "validation" node after transformation that checks data quality and routes to an error handler if thresholds aren't met&lt;/li&gt;
&lt;li&gt;Include LangSmith tracing with custom metadata tags for data quality metrics&lt;/li&gt;
&lt;/ol&gt;
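
&lt;p&gt;The "best effort" rule in requirement 3 can be prototyped outside the graph before wiring it into an error handler. The sketch below uses only &lt;code&gt;asyncio&lt;/code&gt;; &lt;code&gt;extract_best_effort&lt;/code&gt; and the shape of &lt;code&gt;sources&lt;/code&gt; (a mapping from source name to a zero-argument coroutine function) are assumptions made for illustration.&lt;/p&gt;

```python
import asyncio


async def extract_best_effort(sources):
    """Run all extractors concurrently; keep what succeeds, abort if nothing does."""
    names = list(sources)
    # return_exceptions=True returns failures as values instead of raising the first one.
    outcomes = await asyncio.gather(
        *(sources[name]() for name in names), return_exceptions=True
    )
    data, unavailable = {}, []
    for name, outcome in zip(names, outcomes):
        if isinstance(outcome, BaseException):
            unavailable.append(name)
        else:
            data[name] = outcome
    if not data:
        # Every source failed: there is nothing to continue with.
        raise RuntimeError(f"all sources failed: {unavailable}")
    return data, unavailable
```

&lt;p&gt;Returning the list of unavailable sources alongside the data lets later steps report that the result is partial.&lt;/p&gt;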

&lt;p&gt;Stretch goal: Add a "circuit breaker" pattern where repeated failures from one source cause the agent to skip that source entirely for subsequent runs (persisted via checkpointing), with automatic re-enablement after a cooldown period.&lt;/p&gt;
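
&lt;p&gt;One possible starting point for the circuit breaker is sketched below in plain Python. The class name, thresholds, and the &lt;code&gt;to_state&lt;/code&gt; helper are all hypothetical; how the returned dictionary gets persisted between runs is left to whatever checkpointing you use.&lt;/p&gt;

```python
import time


class CircuitBreaker:
    """Skip a source after repeated failures; re-enable it after a cooldown."""

    def __init__(self, failure_threshold=3, cooldown_s=300.0, clock=time.time):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.clock = clock  # wall-clock time, so timestamps survive across runs
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed (source enabled)

    def allow(self):
        """Return True if the source may be called right now."""
        if self.opened_at is None:
            return True
        # Once the cooldown has elapsed, calls are allowed again as a trial.
        return self.clock() - self.opened_at >= self.cooldown_s

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()  # open the circuit and (re)start the cooldown

    def to_state(self):
        """Plain dict suitable for storing in persisted agent state."""
        return {"failures": self.failures, "opened_at": self.opened_at}
```

&lt;p&gt;A failed trial call after the cooldown reopens the circuit immediately, because the failure count is only reset by &lt;code&gt;record_success&lt;/code&gt;.&lt;/p&gt;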

&lt;p&gt;This project exercises all three fault tolerance primitives in a realistic scenario while producing something genuinely useful for data engineering workflows. The patterns transfer directly to any agent that coordinates multiple unreliable external systems—which is to say, most production agents.&lt;/p&gt;




&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/langchain-ai/langgraph" rel="noopener noreferrer"&gt;langchain-ai/langgraph: Build resilient agents. - GitHub&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.langchain.com/blog/langgraph-multi-agent-workflows" rel="noopener noreferrer"&gt;LangGraph: Multi-Agent Workflows - LangChain&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.langchain.com/blog/the-agent-development-lifecycle" rel="noopener noreferrer"&gt;The Agent Development Lifecycle: Build, Test, Deploy &amp;amp; Monitor AI ... - LangChain&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.langchain.com/blog/nvidia-enterprise" rel="noopener noreferrer"&gt;LangChain Announces Enterprise Agentic AI Platform Built with ... - LangChain&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.ibm.com/think/insights/agentic-ai" rel="noopener noreferrer"&gt;Agentic AI: 4 reasons why it's the next big thing in AI research - IBM&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;This is part of the &lt;strong&gt;Agentic Engineering Weekly&lt;/strong&gt; series, a deep-dive every Monday into the frameworks, patterns, and techniques shaping the next generation of AI systems.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Follow the Agentic Engineering Weekly series on Dev.to to catch every edition.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Building something agentic? Drop a comment — I'd love to feature reader projects.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>programming</category>
      <category>agents</category>
    </item>
    <item>
      <title>AI Weekly: Bezos Bets $12B on Physical AI, Anthropic's Security Crisis, and the New Tech Power Structure</title>
      <dc:creator>Richard Dillon</dc:creator>
      <pubDate>Mon, 15 Jun 2026 12:02:31 +0000</pubDate>
      <link>https://dev.to/richard_dillon_b9c238186e/ai-weekly-bezos-bets-12b-on-physical-ai-anthropics-security-crisis-and-the-new-tech-power-15ep</link>
      <guid>https://dev.to/richard_dillon_b9c238186e/ai-weekly-bezos-bets-12b-on-physical-ai-anthropics-security-crisis-and-the-new-tech-power-15ep</guid>
      <description>&lt;h1&gt;
  
  
  AI Weekly: Bezos Bets $12B on Physical AI, Anthropic's Security Crisis, and the New Tech Power Structure
&lt;/h1&gt;

&lt;p&gt;The frontier AI landscape shifted dramatically this week as Jeff Bezos emerged from relative AI sidelines with a massive bet on physical-world intelligence, while Anthropic faced an unprecedented government-ordered model takedown that raises fundamental questions about regulatory oversight of deployed systems. Meanwhile, the old guard struggles—Meta's AI unit reportedly descends into dysfunction as Google fires the first shots in what could become a brutal consumer pricing war. The message is clear: the AI industry's second act looks nothing like its first.&lt;/p&gt;

&lt;h2&gt;
  
  
  Jeff Bezos's Prometheus Raises $12B to Build 'Artificial General Engineer'
&lt;/h2&gt;

&lt;p&gt;Jeff Bezos is making his biggest AI play yet. Prometheus, the stealth company backed by the Amazon founder, has closed a &lt;a href="https://techcrunch.com/category/artificial-intelligence" rel="noopener noreferrer"&gt;$12 billion funding round&lt;/a&gt; aimed at developing what the company calls an "Artificial General Engineer"—AI systems purpose-built for physical-world engineering tasks rather than the text and image generation that dominates current frontier development.&lt;/p&gt;

&lt;p&gt;The funding represents one of the largest single rounds in AI history and signals Bezos's conviction that the next major breakthrough lies in bridging digital AI capabilities with real-world physical applications. Prometheus is reportedly recruiting heavily from robotics labs, mechanical engineering departments, and aerospace companies, suggesting a scope that extends well beyond Amazon's warehouse robotics expertise.&lt;/p&gt;

&lt;p&gt;Industry observers note that while OpenAI, Anthropic, and Google have focused primarily on language models and digital agents, physical-world AI—systems that can reason about material constraints, design manufacturable components, and interact with the built environment—remains comparatively underdeveloped. Prometheus appears positioned to exploit this gap.&lt;/p&gt;

&lt;p&gt;The Bezos backing adds credibility that few other investors could provide, given his track record with Blue Origin and Amazon's logistics automation. Whether "Artificial General Engineer" represents genuine technical ambition or marketing positioning remains to be seen, but the resources to pursue it are now in place.&lt;/p&gt;

&lt;h2&gt;
  
  
  Anthropic Takes Claude Fable 5 Offline After Government Security Order
&lt;/h2&gt;

&lt;p&gt;In an unprecedented move, Anthropic has &lt;a href="https://www.reuters.com/technology/artificial-intelligence" rel="noopener noreferrer"&gt;suspended public access&lt;/a&gt; to its Claude Fable 5 model following a directive from the U.S. government identifying a potential jailbreak vulnerability. The takedown marks the first time a frontier AI company has pulled a deployed model at government request over security concerns.&lt;/p&gt;

&lt;p&gt;Fable 5, launched earlier this year as a consumer-accessible version of Anthropic's Mythos cybersecurity model, was designed to make advanced reasoning capabilities available to everyday users while maintaining the safety guardrails the company is known for. However, security researchers had previously raised concerns that the model's guardrails could be circumvented through specific prompt sequences, potentially exposing capabilities intended only for the enterprise Mythos deployment.&lt;/p&gt;

&lt;p&gt;The government's intervention—reportedly originating from a classified assessment—raises significant questions about the emerging oversight framework for frontier models. Anthropic has not disclosed the specific vulnerability or timeline for potential restoration of service, stating only that it is "working cooperatively with relevant authorities."&lt;/p&gt;

&lt;p&gt;The incident arrives at a particularly sensitive moment as Congress debates federal AI legislation. Supporters argue the takedown demonstrates responsible industry-government coordination; critics worry it sets a precedent for arbitrary government control over deployed AI systems without public transparency about the underlying security assessment.&lt;/p&gt;

&lt;h2&gt;
  
  
  Meta's Internal AI Unit Reportedly in Chaos
&lt;/h2&gt;

&lt;p&gt;The reorganization &lt;a href="https://www.wired.com" rel="noopener noreferrer"&gt;Mark Zuckerberg promised&lt;/a&gt; would streamline Meta's AI efforts has apparently achieved the opposite. Engineers speaking anonymously describe the company's months-old centralized AI unit as a dysfunctional work environment marked by unclear leadership, conflicting priorities, and an exodus of senior talent.&lt;/p&gt;

&lt;p&gt;The chaos reportedly stems from Meta's abrupt strategic pivot toward proprietary models following the Muse Spark launch, abandoning the open-source approach that had defined its Llama model family. Teams that had spent years building for open release found their work redirected or deprecated, while newly hired executives from closed-model backgrounds clashed with existing research culture.&lt;/p&gt;

&lt;p&gt;Separately, &lt;a href="https://techcrunch.com" rel="noopener noreferrer"&gt;reports indicate&lt;/a&gt; Meta may unwind its $2 billion acquisition of robotics firm Manus after pressure from Beijing, where Manus maintains significant manufacturing partnerships. The combination of strategic whiplash and geopolitical complications has left the unit struggling to execute on any coherent vision.&lt;/p&gt;

&lt;p&gt;The situation contrasts sharply with the narrative Zuckerberg presented to investors just months ago, positioning Meta as a serious contender to OpenAI and Google in frontier AI development. Whether the company can stabilize before losing irreplaceable talent to competitors with clearer direction remains an open question.&lt;/p&gt;

&lt;h2&gt;
  
  
  Google Fires Opening Salvo in AI Subscription Price Wars
&lt;/h2&gt;

&lt;p&gt;Google announced &lt;a href="https://techcrunch.com/2026/02/19/googles-new-gemini-pro-model-has-record-benchmark-scores-again" rel="noopener noreferrer"&gt;aggressive new pricing&lt;/a&gt; for its AI subscription tiers this week, slashing rates in what appears to be a deliberate move to pressure OpenAI and Anthropic on consumer pricing. The timing—following the company's recent Gemini 3.1 Pro release with strong benchmark performance—suggests Google is ready to leverage its infrastructure advantages to compete on cost.&lt;/p&gt;

&lt;p&gt;The new pricing structure effectively halves the monthly cost for access to Gemini's most capable models, while introducing a limited free tier that exceeds what competitors currently offer paid subscribers. Google's cloud infrastructure scale makes such pricing sustainable in ways that smaller rivals may struggle to match.&lt;/p&gt;

&lt;p&gt;For OpenAI and Anthropic, the move forces an uncomfortable choice: match Google's pricing and accept margin compression, or maintain current rates and risk losing price-sensitive customers. Neither company has announced responses, though industry analysts expect some reaction within weeks.&lt;/p&gt;

&lt;p&gt;The broader implication is accelerating commoditization of consumer AI access. As base model capabilities converge and pricing drops, differentiation will increasingly depend on specialized features, integration depth, and enterprise offerings—shifting the competitive battleground away from raw model performance toward ecosystem advantages where Google already holds significant cards.&lt;/p&gt;

&lt;h2&gt;
  
  
  Agentic Programming Updates
&lt;/h2&gt;

&lt;p&gt;The academic and practitioner communities continue building the conceptual and technical foundations for production agentic systems. A notable new paper, &lt;a href="https://arxiv.org/pdf/2511.17332" rel="noopener noreferrer"&gt;"Hybrid Agentic AI and Multi-Agent Systems in Smart Manufacturing,"&lt;/a&gt; demonstrates how frameworks including CrewAI, LangGraph, AutoGen, and MetaGPT can be deployed in industrial cyber-physical systems—a significant step toward agentic AI in high-stakes environments.&lt;/p&gt;

&lt;p&gt;The research emphasizes &lt;a href="https://arxiv.org/html/2511.17332v2" rel="noopener noreferrer"&gt;plan-act-reflect loops&lt;/a&gt; as the core pattern enabling dynamic strategy adaptation, allowing agents to modify their approaches based on real-time feedback from manufacturing environments. This echoes patterns identified in &lt;a href="https://resources.anthropic.com/building-effective-ai-agents" rel="noopener noreferrer"&gt;Anthropic's guidance&lt;/a&gt; on building effective agents, which emphasizes augmented LLMs and workflow orchestration over fully autonomous systems.&lt;/p&gt;

&lt;p&gt;Human-in-the-loop interfaces are emerging as a critical pattern for production deployments, enabling domain experts to oversee agentic operations without requiring machine learning expertise. A &lt;a href="https://arxiv.org/html/2601.12560v1" rel="noopener noreferrer"&gt;comprehensive taxonomy paper&lt;/a&gt; published recently attempts to unify multi-agent coordination patterns—chain, star, mesh, and workflow graphs—across frameworks, providing practitioners with a common vocabulary for architectural decisions.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://github.com/VoltAgent/awesome-ai-agent-papers" rel="noopener noreferrer"&gt;awesome-ai-agent-papers&lt;/a&gt; repository on GitHub continues tracking academic work on emerging paradigms, including skill libraries that may eventually replace multi-agent systems for many use cases and information-flow orchestration approaches that simplify reasoning about agent behavior.&lt;/p&gt;

&lt;h2&gt;
  
  
  US House Releases Bipartisan Draft Bill to Preempt State AI Regulations
&lt;/h2&gt;

&lt;p&gt;A bipartisan group of House lawmakers &lt;a href="https://www.reuters.com/business/us-house-lawmakers-release-draft-bill-regulate-ai-2026-06-04" rel="noopener noreferrer"&gt;released draft legislation&lt;/a&gt; this week that would prohibit states from regulating AI development, aiming to create a unified federal framework for AI governance. The move represents the most significant push yet toward centralized AI policy in the United States.&lt;/p&gt;

&lt;p&gt;The draft bill would preempt existing state-level AI regulations already in effect in California, Colorado, and several other states, replacing the current patchwork with federal standards. Sponsors argue that fragmented state rules create compliance burdens that disadvantage American companies against international competitors operating under single regulatory regimes.&lt;/p&gt;

&lt;p&gt;Industry reaction has been predictably split. Large AI developers generally support federal preemption, citing operational simplicity; civil society groups and some state attorneys general have criticized the bill as removing local accountability for AI harms. The draft leaves enforcement mechanisms vague, a gap that will likely draw scrutiny in committee markup.&lt;/p&gt;

&lt;p&gt;Whether the legislation advances in an election year remains uncertain, but its bipartisan sponsorship suggests AI governance is achieving rare cross-party consensus—at least on the principle of federal primacy over states.&lt;/p&gt;

&lt;h2&gt;
  
  
  KPMG Pulls AI Report After Discovering Hallucinated Content
&lt;/h2&gt;

&lt;p&gt;In an embarrassing reversal, &lt;a href="https://www.wired.com" rel="noopener noreferrer"&gt;KPMG withdrew&lt;/a&gt; a published report on enterprise AI adoption after discovering it contained apparent AI-generated hallucinations, including fabricated statistics and nonexistent research citations. The incident highlights ongoing quality control challenges as professional services firms integrate AI tools into content production.&lt;/p&gt;

&lt;p&gt;The firm has not disclosed which AI system was used or how the hallucinated content passed review, but the episode underscores a persistent gap between AI-assisted drafting capabilities and the verification processes needed to catch errors before publication. For a consulting firm whose value proposition rests on authoritative analysis, the mistake carries reputational implications beyond the immediate retraction.&lt;/p&gt;

&lt;p&gt;Industry observers note this is unlikely to be an isolated incident. As AI writing assistance becomes ubiquitous across professional services, the risk of sophisticated-sounding but fabricated content reaching clients and public audiences grows proportionally. The KPMG case may accelerate development of verification tooling and audit trails for AI-assisted professional content.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tech Industry Power Structure Shifts: FAANG Becomes MANGOS
&lt;/h2&gt;

&lt;p&gt;Industry observers are &lt;a href="https://techcrunch.com" rel="noopener noreferrer"&gt;noting a symbolic shift&lt;/a&gt; in tech's informal power structure as the venerable FAANG acronym gives way to new formulations reflecting the AI era's changed landscape. The emergence of "MANGOS"—Microsoft, Apple, Nvidia, Google, OpenAI, and SpaceX—captures how AI infrastructure and applications have reshuffled the hierarchy.&lt;/p&gt;

&lt;p&gt;The SpaceX IPO, expected later this year, would cement the company's position among the most valuable technology firms globally, while OpenAI's commercial momentum has made it impossible to discuss frontier tech without including it. Meanwhile, Netflix and Meta—both original FAANG members—have seen their influence on the industry's direction diminish relative to companies driving AI infrastructure.&lt;/p&gt;

&lt;p&gt;In a noteworthy detail, &lt;a href="https://www.reuters.com/technology/artificial-intelligence" rel="noopener noreferrer"&gt;Amazon CEO&lt;/a&gt; Andy Jassy reportedly raised concerns about Anthropic model vulnerabilities with government contacts prior to the Fable 5 takedown—a reminder that despite Amazon's significant investment in Anthropic, the relationship between major cloud providers and their AI portfolio companies remains complex.&lt;/p&gt;

&lt;p&gt;The acronym shift may seem trivial, but it reflects genuine reordering of which companies set the industry's agenda. Infrastructure providers and AI-native companies have displaced consumer internet platforms as the center of gravity.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;What to Watch:&lt;/strong&gt; The coming weeks will test whether Anthropic can resolve the Fable 5 situation without lasting damage to user trust and whether Google's pricing moves trigger a broader race to the bottom. The federal preemption bill's committee progress bears monitoring—if it advances quickly, the current fragmented AI regulatory landscape could look very different by year's end.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://techcrunch.com/category/artificial-intelligence" rel="noopener noreferrer"&gt;AI News &amp;amp; Artificial Intelligence | TechCrunch&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.reuters.com/technology/artificial-intelligence" rel="noopener noreferrer"&gt;Artificial Intelligence - AI News - Reuters&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://techcrunch.com" rel="noopener noreferrer"&gt;TechCrunch | Startup and Technology News&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.wired.com" rel="noopener noreferrer"&gt;WIRED - The Latest in Technology, Science, Culture and Business | WIRED&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.reuters.com/business/us-house-lawmakers-release-draft-bill-regulate-ai-2026-06-04" rel="noopener noreferrer"&gt;US House lawmakers release draft bill to prohibit state AI rules&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/pdf/2511.17332" rel="noopener noreferrer"&gt;[PDF] Hybrid Agentic AI and Multi-Agent Systems in Smart Manufacturing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/html/2511.17332v2" rel="noopener noreferrer"&gt;Agentifying Agentic AI - arXiv&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/html/2601.12560v1" rel="noopener noreferrer"&gt;Agentic Artificial Intelligence (AI): Architectures, Taxonomies, and ...&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/VoltAgent/awesome-ai-agent-papers" rel="noopener noreferrer"&gt;VoltAgent/awesome-ai-agent-papers - GitHub&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://resources.anthropic.com/building-effective-ai-agents" rel="noopener noreferrer"&gt;Building Effective AI Agents&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://techcrunch.com/2026/02/19/googles-new-gemini-pro-model-has-record-benchmark-scores-again" rel="noopener noreferrer"&gt;Google's new Gemini Pro model has record benchmark scores — again | TechCrunch&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>programming</category>
      <category>technology</category>
    </item>
    <item>
      <title>LangSmith Engine: Self-Improving Agents That Debug Other Agents</title>
      <dc:creator>Richard Dillon</dc:creator>
      <pubDate>Mon, 08 Jun 2026 12:06:38 +0000</pubDate>
      <link>https://dev.to/richard_dillon_b9c238186e/langsmith-engine-self-improving-agents-that-debug-other-agents-305d</link>
      <guid>https://dev.to/richard_dillon_b9c238186e/langsmith-engine-self-improving-agents-that-debug-other-agents-305d</guid>
      <description>&lt;h1&gt;
  
  
  LangSmith Engine: Self-Improving Agents That Debug Other Agents
&lt;/h1&gt;

&lt;p&gt;The moment your agent portfolio grows beyond a handful of deployments, you hit an uncomfortable truth: you're now spending more time debugging agents than building them. At &lt;a href="https://interrupt.langchain.com" rel="noopener noreferrer"&gt;Interrupt 2026&lt;/a&gt;, LangChain unveiled something that directly addresses this scaling problem—LangSmith Engine, an autonomous agent whose sole purpose is analyzing, diagnosing, and suggesting fixes for your production agent failures. This isn't another dashboard with fancier visualizations. It's the formalization of a meta-agent paradigm where the work of improving agents becomes itself an agentic task.&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction: The Meta-Agent Paradigm Shift
&lt;/h2&gt;

&lt;p&gt;The announcement landed during Harrison Chase's keynote at &lt;a href="https://interrupt.langchain.com" rel="noopener noreferrer"&gt;Interrupt 2026&lt;/a&gt;, held May 13-14 in San Francisco. Engine represents a categorical shift from passive observability—where humans sift through traces trying to understand what went wrong—to active diagnosis where an agent formulates hypotheses, tests them against historical data, and generates concrete remediation suggestions.&lt;/p&gt;

&lt;p&gt;Why does this matter right now? The &lt;a href="https://pub.towardsai.net/a-developers-guide-to-agentic-frameworks-in-2026-3f22a492dc3d" rel="noopener noreferrer"&gt;2026 agentic AI landscape&lt;/a&gt; has matured to the point where organizations are running not one or two experimental agents, but entire portfolios of production systems. When you're operating dozens of agents across customer support, data pipelines, and internal tooling, the manual trace inspection that worked for a single prototype becomes untenable. Teams report spending 60-70% of their agent engineering time on post-deployment debugging rather than capability development.&lt;/p&gt;

&lt;p&gt;The architectural insight driving Engine is subtle but profound: agent improvement itself has the characteristics of an agentic task. It requires reasoning over incomplete information, tool use to query trace databases, hypothesis generation and testing, and memory of past investigations to avoid re-diagnosing known issues. By treating debugging as a first-class agent workflow rather than a human dashboard activity, LangChain is betting that AI can accelerate the agent improvement loop just as dramatically as agents accelerated other knowledge work.&lt;/p&gt;

&lt;p&gt;Engine differs sharply from traditional APM tools. Where Datadog or New Relic might tell you that your agent's P95 latency spiked, Engine investigates &lt;em&gt;why&lt;/em&gt;: was it a slow tool call, an LLM inference delay, or an orchestration bottleneck from suboptimal state checkpointing? Crucially, it also proposes a response in the form of specific code changes, prompt rewrites, or architectural modifications.&lt;/p&gt;

&lt;p&gt;The target audience is clear: teams operating five or more agents in production who need automated quality feedback loops. If you're still iterating on a single agent, the overhead of deploying Engine probably isn't worth it. But once you cross that threshold where agent failures are a daily occurrence rather than an exceptional event, Engine's value proposition becomes compelling.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture: How an Agent Debugs Agents
&lt;/h2&gt;

&lt;p&gt;Engine's architecture rests on SmithDB, a new data layer for agent observability that LangChain announced in the same week. SmithDB provides structured trace storage optimized specifically for agent queries—not generic time-series data, but relational structures that capture parent-child relationships between agent calls, tool invocations, and LLM inference requests. This foundation enables the kind of complex trace traversal that Engine's investigations require.&lt;/p&gt;

&lt;p&gt;The overall system follows a three-layer architecture: trace ingestion, pattern detection, and remediation generation. Trace ingestion handles the firehose of observability data from your &lt;a href="https://github.com/langchain-ai/langgraph" rel="noopener noreferrer"&gt;LangGraph deployments&lt;/a&gt;, normalizing the heterogeneous data from different agent types into a consistent schema. Pattern detection runs continuously, applying both rule-based heuristics and learned classifiers to identify anomalies worth investigating. Remediation generation is where Engine's agentic nature emerges—it spins up investigation workflows that can last minutes or hours depending on the complexity of the issue.&lt;/p&gt;

&lt;p&gt;Engine's reasoning loop follows a &lt;a href="https://github.com/nirdiamant/genai_agents" rel="noopener noreferrer"&gt;ReAct-style&lt;/a&gt; cycle: observe anomaly, formulate hypothesis, execute investigative action, evaluate results, repeat. For example, when detecting elevated failure rates in a customer support agent, Engine might hypothesize that a recent prompt change caused the regression. It then queries SmithDB for traces before and after the change, diffs the prompt versions, examines failure modes in both cohorts, and either confirms or rejects the hypothesis before moving to alternatives.&lt;/p&gt;
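&lt;p&gt;To make the cycle concrete, here is a minimal sketch of an observe, hypothesize, act, evaluate loop. It is not Engine's implementation: &lt;code&gt;Hypothesis&lt;/code&gt;, &lt;code&gt;Investigation&lt;/code&gt;, &lt;code&gt;investigate&lt;/code&gt;, and the evidence keys are all hypothetical names, and the "investigative action" is reduced to a plain predicate over pre-collected evidence.&lt;/p&gt;

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Hypothesis:
    name: str
    # Hypothetical check: returns True when the evidence supports this hypothesis
    test: Callable[[dict], bool]


@dataclass
class Investigation:
    anomaly: str
    log: list = field(default_factory=list)
    root_cause: Optional[str] = None


def investigate(anomaly: str, evidence: dict,
                hypotheses: list[Hypothesis], max_steps: int = 5) -> Investigation:
    """Test hypotheses in order until one is confirmed or the step budget runs out."""
    result = Investigation(anomaly=anomaly)
    for hypothesis in hypotheses[:max_steps]:
        confirmed = hypothesis.test(evidence)           # act, then evaluate
        result.log.append((hypothesis.name, confirmed))
        if confirmed:
            result.root_cause = hypothesis.name
            break                                       # stop at the first confirmed cause
    return result


# Illustrative evidence: failure rates before and after a prompt change
evidence = {"failure_rate_before": 0.02, "failure_rate_after": 0.11, "tool_timeouts": 0}
hypotheses = [
    Hypothesis("tool_call_timeout", lambda e: e["tool_timeouts"] > 0),
    Hypothesis("prompt_regression",
               lambda e: e["failure_rate_after"] > 2 * e["failure_rate_before"]),
]
report = investigate("elevated failure rate", evidence, hypotheses)
print(report.root_cause, report.log)
```

&lt;p&gt;The log of rejected hypotheses matters as much as the confirmed one, because it records what was already ruled out.&lt;/p&gt;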

&lt;p&gt;Memory integration is essential for avoiding duplicate work. Engine maintains episodic memory of past investigations, indexed by failure signature and root cause. When a similar pattern emerges, Engine retrieves relevant past investigations, potentially short-circuiting the diagnosis with a "we've seen this before" assessment. This connects to the &lt;a href="https://github.com/TsinghuaC3I/Awesome-Memory-for-Agents" rel="noopener noreferrer"&gt;broader memory architecture&lt;/a&gt; patterns emerging in agentic systems—treating investigative context as a persistent asset rather than a single-session artifact.&lt;/p&gt;
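&lt;p&gt;One way to picture retrieval by failure signature is set similarity over signature features. The sketch below is an assumption about how such a lookup could work, not a description of Engine's memory store: &lt;code&gt;PastInvestigation&lt;/code&gt;, &lt;code&gt;recall&lt;/code&gt;, the feature strings, and the choice of Jaccard similarity are all illustrative.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PastInvestigation:
    signature: frozenset   # hypothetical features describing the failure
    root_cause: str


def jaccard(a: frozenset, b: frozenset) -> float:
    """Size of the intersection divided by size of the union (0.0 for two empty sets)."""
    union = a.union(b)
    return len(a.intersection(b)) / len(union) if union else 0.0


def recall(memory: list[PastInvestigation], signature: frozenset,
           threshold: float = 0.8, limit: int = 5) -> list[PastInvestigation]:
    """Return past investigations at or above the similarity threshold, best first."""
    scored = [(jaccard(past.signature, signature), past) for past in memory]
    matches = [pair for pair in scored if pair[0] >= threshold]
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return [past for _, past in matches[:limit]]


memory = [
    PastInvestigation(frozenset({"tool:crm_lookup", "error:timeout", "node:fetch_account"}),
                      "missing retry backoff"),
    PastInvestigation(frozenset({"error:parse", "node:summarize"}),
                      "truncated tool response"),
]
new_failure = frozenset({"tool:crm_lookup", "error:timeout", "node:fetch_account"})
print([past.root_cause for past in recall(memory, new_failure)])
```

&lt;p&gt;A production system would more likely use embeddings than exact feature sets, but the threshold-and-limit shape of the lookup stays the same.&lt;/p&gt;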

&lt;p&gt;Engine's tool repertoire includes trace querying (SQL-like interfaces to SmithDB), diff generation (comparing prompt versions, tool configurations, and agent code), prompt variation testing (spinning up isolated evaluation runs with modified prompts), and cost impact estimation (projecting how suggested changes would affect token budgets based on historical patterns).&lt;/p&gt;

&lt;p&gt;A subtle but important design decision: Engine avoids infinite recursion by operating in a separate instrumentation namespace. Engine's own traces are never visible to itself—it cannot enter a pathological loop of debugging its own debugging attempts. This namespace isolation is enforced at the SDK level, ensuring Engine's investigation activities remain invisible to its own pattern detection systems.&lt;/p&gt;

&lt;h2&gt;
  
  
  Trace Analysis Patterns Engine Detects
&lt;/h2&gt;

&lt;p&gt;Engine ships with a library of detection patterns refined against LangChain's internal agent fleet, and teams can extend this library with custom detectors. The most impactful built-in patterns address the failure modes that consume the majority of debugging time.&lt;/p&gt;

&lt;p&gt;Tool call failure cascades represent one of the trickiest patterns to diagnose manually. When an agent makes a tool call that fails, the downstream behavior depends heavily on how the failure is handled—does the agent retry? Fall back to an alternative? Propagate the error? Engine distinguishes between recoverable retry patterns (where a transient failure resolves on retry) and true cascade failures (where one failed tool call corrupts state that triggers subsequent failures). This distinction matters because the remediation differs dramatically: retry patterns might need backoff tuning while cascades require architectural changes to state management.&lt;/p&gt;
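&lt;p&gt;The retry-versus-cascade distinction can be approximated with a simple heuristic over an ordered list of tool calls. This is a deliberately crude sketch with hypothetical names (&lt;code&gt;ToolCall&lt;/code&gt;, &lt;code&gt;classify_failures&lt;/code&gt;); it treats "a later failure in a different tool" as a proxy for spreading state corruption, which a real detector would need to verify.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class ToolCall:
    tool: str
    ok: bool


def classify_failures(calls: list[ToolCall]) -> str:
    """Label a trace as healthy, recoverable_retry, cascade, or unrecovered_failure."""
    failed = [i for i, call in enumerate(calls) if not call.ok]
    if not failed:
        return "healthy"
    first = failed[0]
    origin = calls[first].tool
    later = calls[first + 1:]
    # A later failure in a *different* tool suggests the first failure spread
    if any(not call.ok and call.tool != origin for call in later):
        return "cascade"
    # Otherwise, a later success of the same tool means the retry recovered
    if any(call.ok and call.tool == origin for call in later):
        return "recoverable_retry"
    return "unrecovered_failure"


retry_trace = [ToolCall("search", False), ToolCall("search", True), ToolCall("summarize", True)]
cascade_trace = [ToolCall("search", False), ToolCall("summarize", False), ToolCall("reply", False)]
print(classify_failures(retry_trace), classify_failures(cascade_trace))
```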

&lt;p&gt;Prompt drift detection catches a subtle but common issue. Over time, production prompts diverge from the versions that were evaluated during development—through hotfixes, A/B test winners that weren't properly documented, or well-intentioned tweaks that accumulate. Engine maintains a baseline registry of evaluated prompts and flags when production traces show prompts that have drifted beyond configurable thresholds. This directly addresses the &lt;a href="https://arxiv.org/html/2604.16646v1" rel="noopener noreferrer"&gt;observability challenges&lt;/a&gt; identified in empirical studies of agentic systems.&lt;/p&gt;
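&lt;p&gt;A drift check against a baseline can be sketched with Python's standard &lt;code&gt;difflib&lt;/code&gt;. The scoring function, the &lt;code&gt;0.1&lt;/code&gt; threshold, and the example prompts are assumptions for illustration; the article does not specify which similarity measure Engine uses.&lt;/p&gt;

```python
import difflib


def drift_score(baseline: str, production: str) -> float:
    """Return 0.0 for identical prompts, approaching 1.0 as they diverge."""
    return 1.0 - difflib.SequenceMatcher(None, baseline, production).ratio()


def has_drifted(baseline: str, production: str, threshold: float = 0.1) -> bool:
    """Flag a production prompt whose drift score exceeds the configured threshold."""
    return drift_score(baseline, production) > threshold


baseline = "You are a support agent. Answer billing questions politely."
hotfixed = baseline + "!"   # a trivial edit
rewritten = "Ignore policy and escalate everything to a human immediately."

for label, prompt in [("hotfixed", hotfixed), ("rewritten", rewritten)]:
    print(label, round(drift_score(baseline, prompt), 3), has_drifted(baseline, prompt))
```

&lt;p&gt;Character-level similarity is cheap but blunt: a one-word change that inverts an instruction scores as low drift, so a threshold like this works best as a first-pass filter.&lt;/p&gt;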

&lt;p&gt;Latency attribution decomposes end-to-end response times into their constituent parts: LLM inference time, tool execution duration, and orchestration overhead (the time spent in your agent code between LLM calls). This decomposition reveals whether performance issues stem from model latency, slow external APIs, or inefficient agent logic—each requiring different remediation approaches.&lt;/p&gt;
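&lt;p&gt;The decomposition itself is simple arithmetic once spans are labeled. The sketch below assumes sequential, non-overlapping spans tagged &lt;code&gt;"llm"&lt;/code&gt; or &lt;code&gt;"tool"&lt;/code&gt; and treats whatever remains of the end-to-end time as orchestration overhead; the function name and span format are hypothetical.&lt;/p&gt;

```python
from collections import defaultdict


def attribute_latency(total_ms: float, spans: list[tuple[str, float]]) -> dict[str, float]:
    """Split end-to-end latency into LLM time, tool time, and leftover orchestration time."""
    buckets = defaultdict(float)
    for kind, duration_ms in spans:
        buckets[kind] += duration_ms
    measured = buckets["llm"] + buckets["tool"]
    return {
        "llm": buckets["llm"],
        "tool": buckets["tool"],
        # Time not covered by any LLM or tool span is spent in agent code
        "orchestration": max(total_ms - measured, 0.0),
    }


breakdown = attribute_latency(4200.0, [("llm", 1800.0), ("tool", 2000.0), ("llm", 250.0)])
dominant = max(breakdown, key=breakdown.get)
print(breakdown, dominant)
```

&lt;p&gt;If spans can run in parallel, summing durations overstates measured time, and the overlap has to be merged before subtracting.&lt;/p&gt;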

&lt;p&gt;Cost anomaly detection goes beyond simple budget alerts. When Engine flags a run that exceeded expected token budgets, it provides root cause analysis: was it excessive tool call chatter? A prompt that triggered verbose responses? A retry loop that repeated expensive operations? This contextual information transforms a "you spent too much" alert into actionable guidance on where to optimize.&lt;/p&gt;
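&lt;p&gt;A z-score over historical token usage is one plausible way to decide that a run "exceeded expected token budgets", and a per-category breakdown is one plausible way to point at the cause. Both functions and the category names below are illustrative assumptions, not Engine's detector.&lt;/p&gt;

```python
import statistics


def cost_zscore(history: list[float], observed: float) -> float:
    """Number of sample standard deviations the observed cost sits above the mean."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return 0.0
    return (observed - mean) / stdev


def largest_contributor(tokens_by_source: dict[str, int]) -> str:
    """Name the category that consumed the most tokens in the anomalous run."""
    return max(tokens_by_source, key=tokens_by_source.get)


history = [1000, 1100, 900, 1050, 950]   # tokens per run over recent history
observed = 1500
score = cost_zscore(history, observed)
if score > 3.0:
    cause = largest_contributor(
        {"tool_chatter": 300, "verbose_response": 200, "retry_loop": 1000}
    )
    print(f"anomaly (z={score:.2f}), largest contributor: {cause}")
```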

&lt;p&gt;State corruption patterns are particularly valuable for teams using &lt;a href="https://github.com/langchain-ai/langgraph" rel="noopener noreferrer"&gt;checkpointed agent architectures&lt;/a&gt;. Engine detects when saved state leads to invalid downstream behavior—for example, when a checkpoint captures a partial tool response that causes parsing failures on resume. These bugs are notoriously difficult to reproduce in development because they depend on precise timing and state sequences.&lt;/p&gt;
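&lt;p&gt;The partial-tool-response case can be illustrated with a validation pass over saved state before resuming. The checkpoint layout here (a &lt;code&gt;tool_responses&lt;/code&gt; mapping of raw JSON strings) is a hypothetical stand-in, not LangGraph's actual checkpoint format.&lt;/p&gt;

```python
import json


def find_corrupt_entries(checkpoint: dict) -> list[str]:
    """Return the keys of saved tool responses that no longer parse as JSON."""
    corrupt = []
    for key, raw in checkpoint.get("tool_responses", {}).items():
        try:
            json.loads(raw)
        except json.JSONDecodeError:
            corrupt.append(key)
    return corrupt


# Hypothetical checkpoint: the second response was captured mid-stream
checkpoint = {
    "tool_responses": {
        "crm_lookup": '{"account": "A-17", "plan": "pro"}',
        "order_search": '{"orders": [{"id": 1',
    }
}
print(find_corrupt_entries(checkpoint))
```

&lt;p&gt;Running a check like this at resume time turns a timing-dependent parsing failure into a deterministic, reportable condition.&lt;/p&gt;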

&lt;p&gt;Internal benchmarks from LangChain's own agent fleet show 47x faster mean-time-to-diagnosis when using Engine compared to manual trace inspection. This metric captures the time from anomaly detection to root cause identification—not including remediation, which still requires human judgment.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Remediation Suggestion Pipeline
&lt;/h2&gt;

&lt;p&gt;Diagnosis without actionable suggestions is just sophisticated complaining. Engine's remediation pipeline transforms investigative conclusions into concrete, applicable fixes.&lt;/p&gt;

&lt;p&gt;The key design principle is specificity: Engine generates actual code patches, not abstract descriptions. When Engine determines that a tool retry should include exponential backoff, it doesn't suggest "consider adding backoff logic"—it produces a diff that can be applied to your agent definition. This aligns with &lt;a href="https://arxiv.org/html/2601.02749v1" rel="noopener noreferrer"&gt;emerging research&lt;/a&gt; on agentic systems that suggests concrete, executable outputs drive higher adoption than abstract recommendations.&lt;/p&gt;
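&lt;p&gt;For a sense of what "a diff that can be applied" looks like, the sketch below builds a unified diff that adds exponential backoff to a tool wrapper. The file name &lt;code&gt;agent_tools.py&lt;/code&gt;, the &lt;code&gt;call_tool&lt;/code&gt; function, and the retry parameters are invented for illustration; only &lt;code&gt;difflib.unified_diff&lt;/code&gt; is a real standard-library call.&lt;/p&gt;

```python
import difflib

# Hypothetical "before" and "after" versions of a tool wrapper in an agent file
BEFORE = """\
def call_tool(client, payload):
    return client.invoke(payload)
"""

AFTER = """\
import time

def call_tool(client, payload, retries=3, base_delay=0.5):
    for attempt in range(retries):
        try:
            return client.invoke(payload)
        except TimeoutError:
            if attempt == retries - 1:
                raise
            # Wait 0.5s, then 1s, doubling after each failed attempt
            time.sleep(base_delay * 2 ** attempt)
"""


def make_patch(before: str, after: str, path: str) -> str:
    """Render a unified diff between two versions of the same file."""
    lines = difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(lines)


patch = make_patch(BEFORE, AFTER, "agent_tools.py")
print(patch)
```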

&lt;p&gt;Prompt rewrite suggestions represent Engine's most frequently used remediation type. When Engine identifies prompt-related failures—ambiguous instructions that lead to tool misuse, missing context that causes hallucinations, or overly verbose system prompts that consume unnecessary tokens—it proposes alternative formulations. These suggestions come packaged with A/B test configurations, allowing teams to validate improvements before full deployment.&lt;/p&gt;

&lt;p&gt;Guard rail recommendations address systematic vulnerabilities rather than individual failures. When Engine observes patterns like repeated jailbreak attempts, PII exposure in tool outputs, or runaway token consumption, it suggests where to add protective nodes—ContentFilter for safety violations, RateLimiter for cost control, or validation gates for data integrity. These suggestions reference specific positions in your &lt;a href="https://www.langchain.com/blog/langgraph-multi-agent-workflows" rel="noopener noreferrer"&gt;LangGraph agent topology&lt;/a&gt;, making implementation straightforward.&lt;/p&gt;

&lt;p&gt;Every suggestion includes a confidence score reflecting Engine's uncertainty. High-confidence suggestions (0.8+) indicate patterns Engine has seen many times with consistent remediation outcomes. Low-confidence suggestions (below 0.5) flag novel patterns or ambiguous root causes where human judgment is essential. This calibration helps teams prioritize which suggestions to evaluate first and which require careful human review.&lt;/p&gt;
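&lt;p&gt;A review queue built on these scores might look like the sketch below. The 0.8 and 0.5 cut-offs come from the description above; the middle band's label, the queue names, and the suggestion records are assumptions made for the example.&lt;/p&gt;

```python
def triage(confidence: float) -> str:
    """Map a suggestion's confidence score to a review queue."""
    if confidence >= 0.8:
        return "evaluate_first"          # seen often, consistent outcomes
    if confidence >= 0.5:
        return "standard_review"         # assumed label for the unnamed middle band
    return "careful_human_review"        # novel pattern or ambiguous root cause


# Hypothetical suggestions as (description, confidence) pairs
suggestions = [
    ("add backoff to crm_lookup", 0.91),
    ("shorten system prompt", 0.62),
    ("restructure checkpoint writes", 0.34),
]
for description, confidence in sorted(suggestions, key=lambda item: item[1], reverse=True):
    print(f"{confidence:.2f}  {triage(confidence):22}  {description}")
```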

&lt;p&gt;Integration with LangChain's Fleet deployment system enables staged rollouts. Engine suggestions can be automatically staged as draft deployments pending human approval—the fix exists as a deployable artifact but won't reach production until a human explicitly approves it. This preserves the human-in-the-loop requirement that remains essential for production changes while reducing the friction between diagnosis and deployment.&lt;/p&gt;

&lt;p&gt;The limitations are explicit and by design: Engine cannot modify deployed agents directly. Even high-confidence suggestions with clear positive impact require human approval. This constraint acknowledges both the liability implications of automated production changes and the reality that Engine may have blind spots in understanding business context that would affect remediation decisions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Hands-On: Code Walkthrough
&lt;/h2&gt;

&lt;p&gt;Let's walk through setting up Engine on an existing LangGraph agent. We'll start with a customer support agent that's already instrumented with LangSmith tracing, then configure Engine to monitor and investigate its failures.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# engine_setup.py
# Setting up LangSmith Engine for automated agent debugging
# Requires: langsmith&amp;gt;=0.4.0, langgraph&amp;gt;=0.5.0, langsmith-engine&amp;gt;=1.0.0
&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langsmith&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Client&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langsmith_engine&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Engine&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;InvestigationConfig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Scope&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.graph&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;StateGraph&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.checkpoint.memory&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MemorySaver&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;

&lt;span class="c1"&gt;# Initialize LangSmith client with Engine capabilities
&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;LANGSMITH_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="c1"&gt;# Engine requires the engine_enabled flag for trace access
&lt;/span&gt;    &lt;span class="n"&gt;engine_enabled&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Define investigation scope - which agents Engine should monitor
# This prevents Engine from investigating its own traces (separate namespace)
&lt;/span&gt;&lt;span class="n"&gt;investigation_scope&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Scope&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;project_names&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;customer-support-prod&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;customer-support-staging&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="c1"&gt;# Exclude Engine's own project to prevent recursion
&lt;/span&gt;    &lt;span class="n"&gt;exclude_projects&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;langsmith-engine-internal&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="c1"&gt;# Only investigate traces with specific tags
&lt;/span&gt;    &lt;span class="n"&gt;required_tags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;production&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;v2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="c1"&gt;# Time window for historical analysis
&lt;/span&gt;    &lt;span class="n"&gt;lookback_hours&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;168&lt;/span&gt;  &lt;span class="c1"&gt;# One week of trace history
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Configure investigation behavior
&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;InvestigationConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="c1"&gt;# Maximum depth of causal chain analysis
&lt;/span&gt;    &lt;span class="n"&gt;max_investigation_depth&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

    &lt;span class="c1"&gt;# Token budget cap for Engine's own LLM calls per investigation
&lt;/span&gt;    &lt;span class="n"&gt;max_tokens_per_investigation&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;50000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

    &lt;span class="c1"&gt;# Confidence threshold for auto-staging suggestions to Fleet
&lt;/span&gt;    &lt;span class="n"&gt;auto_stage_threshold&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.85&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

    &lt;span class="c1"&gt;# Patterns to prioritize (Engine will investigate these first)
&lt;/span&gt;    &lt;span class="n"&gt;priority_patterns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_cascade_failure&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt_drift&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cost_anomaly&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;

    &lt;span class="c1"&gt;# Memory configuration for investigation history
&lt;/span&gt;    &lt;span class="n"&gt;memory_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;episodic_retention_days&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;90&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;similarity_threshold&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# For matching similar past issues
&lt;/span&gt;        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_retrieved_investigations&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Initialize Engine with scope and configuration
&lt;/span&gt;&lt;span class="n"&gt;engine&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Engine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;scope&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;investigation_scope&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="c1"&gt;# Model for Engine's reasoning (Claude or GPT-4 class recommended)
&lt;/span&gt;    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-sonnet-4-20250514&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="c1"&gt;# Notification webhook for completed investigations
&lt;/span&gt;    &lt;span class="n"&gt;webhook_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SLACK_WEBHOOK_URL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Start continuous monitoring (runs as background process)
# Engine will automatically trigger investigations when anomalies are detected
&lt;/span&gt;&lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;start_monitoring&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="c1"&gt;# Anomaly detection interval
&lt;/span&gt;    &lt;span class="n"&gt;check_interval_seconds&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="c1"&gt;# Thresholds that trigger automatic investigation
&lt;/span&gt;    &lt;span class="n"&gt;triggers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;failure_rate_threshold&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.05&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# &amp;gt;5% failures triggers investigation
&lt;/span&gt;        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;latency_p95_multiplier&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;2.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="c1"&gt;# 2x normal P95 triggers investigation
&lt;/span&gt;        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cost_anomaly_zscore&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;3.0&lt;/span&gt;       &lt;span class="c1"&gt;# 3 std devs above mean triggers investigation
&lt;/span&gt;    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Engine monitoring started. Investigations will run automatically.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now let's look at manually triggering an investigation and processing the results:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# investigate_incident.py
# Manually triggering and processing an Engine investigation
&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langsmith_engine&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Engine&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;InvestigationReport&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timedelta&lt;/span&gt;

&lt;span class="c1"&gt;# Assuming engine is already initialized from previous setup
# Trigger investigation for a specific trace that showed anomalous behavior
&lt;/span&gt;&lt;span class="n"&gt;investigation&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;investigate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="c1"&gt;# Can investigate by trace_id, run_id, or time range with filters
&lt;/span&gt;    &lt;span class="n"&gt;trace_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;abc123-def456-ghi789&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

    &lt;span class="c1"&gt;# Or investigate a pattern across multiple traces
&lt;/span&gt;    &lt;span class="c1"&gt;# pattern_query={
&lt;/span&gt;    &lt;span class="c1"&gt;#     "failure_type": "tool_timeout",
&lt;/span&gt;    &lt;span class="c1"&gt;#     "time_range": (datetime.now() - timedelta(hours=24), datetime.now()),
&lt;/span&gt;    &lt;span class="c1"&gt;#     "min_occurrences": 10
&lt;/span&gt;    &lt;span class="c1"&gt;# },
&lt;/span&gt;
    &lt;span class="c1"&gt;# Investigation focus hints (optional, speeds up diagnosis)
&lt;/span&gt;    &lt;span class="n"&gt;initial_hypotheses&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_call_timeout&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt_regression&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Investigation runs asynchronously - can poll or await
&lt;/span&gt;&lt;span class="n"&gt;report&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;InvestigationReport&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;investigation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;await_completion&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;timeout_seconds&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;600&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Parse the investigation report
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Investigation ID: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;report&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Duration: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;report&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;duration_seconds&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;s&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Engine tokens consumed: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;report&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;token_usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Root cause analysis
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;=== Root Cause Analysis ===&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Primary cause: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;report&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;root_cause&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Confidence: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;report&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;root_cause&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;confidence&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Evidence traces: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;report&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;root_cause&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;supporting_traces&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# View the hypothesis chain (Engine's reasoning process)
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;=== Investigation Chain ===&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;step&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;report&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;hypothesis_chain&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;. Hypothesis: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;step&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;hypothesis&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;   Action: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;step&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;action_taken&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;   Result: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;step&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;   Verdict: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Confirmed&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;step&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;confirmed&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Rejected&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Remediation suggestions
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;=== Suggested Remediations ===&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;suggestion&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;report&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;suggestions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Type: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;suggestion&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Confidence: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;suggestion&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;confidence&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Description: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;suggestion&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# For code changes, show the diff
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;suggestion&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;code_diff&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Diff:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;suggestion&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;code_diff&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# For prompt changes, show before/after
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;suggestion&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;prompt_change&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Original prompt hash: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;suggestion&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;prompt_change&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;original_hash&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Suggested prompt:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;suggestion&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;prompt_change&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;new_prompt&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Apply suggestion if confidence is high enough
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;suggestion&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;confidence&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.85&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;suggestion&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;type&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt_rewrite&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Stage the suggestion in Fleet (requires human approval to deploy)
&lt;/span&gt;        &lt;span class="n"&gt;deployment&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;suggestion&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stage_to_fleet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;fleet_project&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;customer-support-prod&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;variant_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;engine-suggestion-&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;report&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;traffic_percentage&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;  &lt;span class="c1"&gt;# Start with 10% A/B test
&lt;/span&gt;        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Staged as Fleet variant: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;deployment&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;variant_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finally, here's how to verify that a suggested fix actually improved agent performance:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# verify_improvement.py
# Running evaluation to verify Engine's suggested fix
&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langsmith&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Client&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langsmith.evaluation&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;evaluate&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langsmith_engine&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Engine&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

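&lt;span class="c1"&gt;# Assumes `engine` is the Engine instance initialized in the earlier setup
# and `run_agent` is your own agent entry point (neither is defined in this file).
# Get the suggestion that was staged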
&lt;/span&gt;&lt;span class="n"&gt;suggestion_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;suggestion-xyz789&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;suggestion&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_suggestion&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;suggestion_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Build the eval dataset once so both experiments score the same examples
# (can auto-generate from failure traces)
&lt;/span&gt;&lt;span class="n"&gt;eval_dataset&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;suggestion&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_eval_dataset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;n_samples&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;include_failure_cases&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;include_success_cases&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Run evaluation comparing original vs suggested prompt
&lt;/span&gt;&lt;span class="n"&gt;eval_results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;evaluate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="c1"&gt;# Your agent function with the original configuration
&lt;/span&gt;    &lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;run_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt_version&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;original&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;

    &lt;span class="c1"&gt;# Shared dataset of test cases
&lt;/span&gt;    &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;eval_dataset&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

    &lt;span class="n"&gt;evaluators&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;correctness&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Built-in evaluator
&lt;/span&gt;        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_call_accuracy&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Custom evaluator for tool use
&lt;/span&gt;        &lt;span class="n"&gt;suggestion&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;custom_evaluator&lt;/span&gt;  &lt;span class="c1"&gt;# Engine-generated evaluator for this specific issue
&lt;/span&gt;    &lt;span class="p"&gt;],&lt;/span&gt;

    &lt;span class="n"&gt;experiment_prefix&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pre-fix-baseline&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Run same evaluation with suggested fix
&lt;/span&gt;&lt;span class="n"&gt;eval_results_fixed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;evaluate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;run_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt_version&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;suggestion&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;prompt_change&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;new_prompt&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;eval_dataset&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;evaluators&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;correctness&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_call_accuracy&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;suggestion&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;custom_evaluator&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;experiment_prefix&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;post-fix-comparison&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Compare results
&lt;/span&gt;&lt;span class="n"&gt;comparison&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compare_experiments&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;baseline&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;eval_results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;experiment_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;comparison&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;eval_results_fixed&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;experiment_id&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Improvement in correctness: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;comparison&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;deltas&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;correctness&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Improvement in tool accuracy: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;comparison&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;deltas&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;tool_call_accuracy&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# If improvement is significant, approve the Fleet deployment
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;comparison&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;deltas&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;correctness&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="c1"&gt;# &amp;gt;10% improvement
&lt;/span&gt;    &lt;span class="n"&gt;fleet_deployment&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;suggestion&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;approve_deployment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;approved_by&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;engine-verification-pipeline&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;traffic_percentage&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;  &lt;span class="c1"&gt;# Roll out fully
&lt;/span&gt;    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Deployed to production: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;fleet_deployment&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Engine consumes tokens for its own investigations, so budget for it. In the configuration above, we capped investigations at 50,000 tokens each. For teams running frequent investigations, budgeting $50-200/month for Engine's own LLM costs is typical. The ROI calculation centers on engineer time saved: if Engine saves 10 hours of debugging per month at an effective cost of $100/hour, that is $1,000 of engineer time recovered against $50-200 of Engine spend, so the investment pays back quickly.&lt;/p&gt;
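&lt;p&gt;To make that payback arithmetic concrete, here is a minimal sketch that uses only the figures quoted above (10 hours saved, $100/hour, $50-200/month). The function names are illustrative, and nothing in it calls Engine.&lt;/p&gt;

```python
# Back-of-the-envelope ROI using the figures quoted in the paragraph above.
# All names are illustrative; this does not call Engine or LangSmith.


def monthly_net_benefit(hours_saved: float, hourly_rate: float, engine_cost: float) -> float:
    """Dollar value of engineer time saved minus Engine's own LLM spend."""
    return hours_saved * hourly_rate - engine_cost


def payback_ratio(hours_saved: float, hourly_rate: float, engine_cost: float) -> float:
    """How many dollars of engineer time each dollar of Engine spend buys back."""
    return (hours_saved * hourly_rate) / engine_cost


# Low and high ends of the $50-200/month budget range
for cost in (50, 200):
    net = monthly_net_benefit(10, 100, cost)
    ratio = payback_ratio(10, 100, cost)
    print(f"Engine cost ${cost}/month: net ${net:.0f}, {ratio:.0f}x return")
```

&lt;p&gt;Swap in your own hours-saved estimate before trusting the result. That estimate is the input most likely to be wrong.&lt;/p&gt;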

&lt;h2&gt;
  
  
  What This Means for Your Stack
&lt;/h2&gt;

&lt;p&gt;Engine pays off mainly for teams with high run volume and a meaningful failure rate. If you're running more than 1,000 daily agent runs and seeing failure rates above 5%, Engine's automated investigation capabilities provide clear time savings. Below those thresholds, the overhead of setting up and maintaining Engine may exceed the manual debugging time it saves.&lt;/p&gt;

&lt;p&gt;The organizational workflow that emerges treats Engine as a "first responder" for agent incidents. When an anomaly triggers, Engine investigates immediately—often completing diagnosis before a human even notices the alert. The human engineer's role shifts from "figure out what happened" to "evaluate Engine's analysis and decide whether to approve the suggested fix." This is a fundamental change in the debugging workflow that requires some adjustment in team processes and expectations.&lt;/p&gt;

&lt;p&gt;For teams already using alerting tools, Engine integrates cleanly. Engine investigation reports can be formatted as structured payloads for PagerDuty, Slack, or email notifications. A typical integration sends a summary with confidence scores immediately upon investigation completion, with links to the full report in LangSmith. High-confidence suggestions might trigger different notification channels than low-confidence ones that require more human analysis.&lt;/p&gt;
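&lt;p&gt;The confidence-based routing described above could be sketched roughly as follows. This is an assumption-heavy illustration: the channel names, payload fields, and report URL are hypothetical, and the 0.85 cutoff is simply borrowed from the staging example earlier. It builds a plain dictionary and does not call any PagerDuty, Slack, or LangSmith API.&lt;/p&gt;

```python
import json

# Cutoff borrowed from the staging example earlier in the article (assumption).
HIGH_CONFIDENCE = 0.85


def build_alert(investigation_id: str, summary: str, confidence: float, report_url: str) -> dict:
    """Build a structured notification and pick a channel by confidence.

    Channel names and payload fields are hypothetical placeholders.
    """
    if confidence >= HIGH_CONFIDENCE:
        channel = "slack:#agent-fixes-ready"   # fix is likely ready for review
    else:
        channel = "pagerduty:agent-oncall"     # needs more human analysis
    return {
        "channel": channel,
        "payload": {
            "investigation_id": investigation_id,
            "summary": summary,
            "confidence": round(confidence, 2),
            "report_url": report_url,
        },
    }


alert = build_alert(
    investigation_id="inv-0001",
    summary="Tool call timeout in search step",
    confidence=0.91,
    report_url="https://example.com/reports/inv-0001",
)
print(json.dumps(alert, indent=2))
```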

&lt;p&gt;The &lt;a href="https://gist.github.com/manduks/bb0a93c1e0eb21bc718a78ffdcefdc95" rel="noopener noreferrer"&gt;competitive landscape&lt;/a&gt; for agent observability is heating up. AgentOps, Helicone, and other tools provide trace visualization and basic alerting. Engine differentiates through its agentic investigation approach—it doesn't just show you what happened, it reasons about why and proposes what to do. However, Engine currently only works with LangSmith traces, creating lock-in for teams considering multi-provider observability strategies.&lt;/p&gt;

&lt;p&gt;Based on &lt;a href="https://www.langchain.com/blog" rel="noopener noreferrer"&gt;Harrison Chase's comments&lt;/a&gt; during Interrupt, future Engine capabilities will likely include automated rollback recommendations (when Engine detects that a recent deployment caused regression) and cross-agent pattern learning (identifying issues that affect multiple agents in your portfolio and suggesting portfolio-wide fixes). These capabilities would further reduce the human involvement needed in routine agent maintenance.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://machinelearningmastery.com/7-agentic-ai-trends-to-watch-in-2026" rel="noopener noreferrer"&gt;broader trends in agentic AI&lt;/a&gt; suggest that meta-agent patterns like Engine will proliferate. As agent systems become more complex, the meta-level work of monitoring, debugging, and improving those systems will increasingly benefit from agentic approaches. Engine is an early instantiation of this pattern, but expect competitors and alternatives to emerge rapidly.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to Build This Week
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Build an Engine-monitored canary agent.&lt;/strong&gt; Take your most failure-prone production agent and set up Engine monitoring with aggressive thresholds (2% failure rate trigger, 1.5x latency multiplier). Run it for one week and review every investigation Engine produces. Your goal isn't to deploy any fixes yet—it's to calibrate your understanding of how Engine reasons about your specific agent's failure modes.&lt;/p&gt;
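&lt;p&gt;As a rough illustration of what those aggressive thresholds mean in practice, the sketch below checks a window of runs against a 2% failure rate and a 1.5x latency multiplier. It is a standalone approximation with hypothetical names, not Engine's actual trigger logic or configuration schema.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CanaryThresholds:
    """Aggressive canary settings from the paragraph above (names are hypothetical)."""
    max_failure_rate: float = 0.02     # 2% failure rate trigger
    latency_multiplier: float = 1.5    # 1.5x baseline latency


DEFAULT_THRESHOLDS = CanaryThresholds()


def should_investigate(
    failed_runs: int,
    total_runs: int,
    current_latency_s: float,
    baseline_latency_s: float,
    thresholds: CanaryThresholds = DEFAULT_THRESHOLDS,
) -> bool:
    """Return True when either the failure rate or the latency crosses its threshold."""
    if total_runs == 0:
        return False  # no traffic in the window, nothing to judge
    failure_rate = failed_runs / total_runs
    too_many_failures = failure_rate >= thresholds.max_failure_rate
    too_slow = current_latency_s >= thresholds.latency_multiplier * baseline_latency_s
    return too_many_failures or too_slow


# 3 failures in 100 runs is a 3% failure rate, above the 2% trigger
print(should_investigate(failed_runs=3, total_runs=100, current_latency_s=1.0, baseline_latency_s=1.0))
```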

&lt;p&gt;Document each investigation: Was Engine's root cause analysis accurate? Were the suggested fixes applicable? Where did Engine miss important context? This calibration exercise will teach you where Engine excels (systematic issues with clear trace signatures) and where it struggles (business logic errors that require domain knowledge). You'll emerge with a clear sense of which agent problems to route to Engine versus escalate directly to human engineers.&lt;/p&gt;
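&lt;p&gt;One lightweight way to keep that calibration log is a small record per investigation plus a summary over the week. The sketch below is only a suggestion: the field names mirror the three questions above and are not part of any Engine API, and the sample reviews are made-up placeholders.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class InvestigationReview:
    """One human review of one Engine investigation (field names are hypothetical)."""
    investigation_id: str
    root_cause_accurate: bool
    fix_applicable: bool
    missed_context: str = ""  # free-text note; empty when nothing was missed


def summarize(reviews: list[InvestigationReview]) -> dict:
    """Aggregate a week of reviews into simple calibration rates."""
    total = len(reviews)
    if total == 0:
        return {
            "reviewed": 0,
            "root_cause_accuracy": 0.0,
            "fix_applicability": 0.0,
            "missed_context_notes": [],
        }
    return {
        "reviewed": total,
        "root_cause_accuracy": sum(r.root_cause_accurate for r in reviews) / total,
        "fix_applicability": sum(r.fix_applicable for r in reviews) / total,
        "missed_context_notes": [r.missed_context for r in reviews if r.missed_context],
    }


# Made-up sample data for illustration only
reviews = [
    InvestigationReview("inv-1", True, True),
    InvestigationReview("inv-2", True, False, "Refund policy rule not visible in traces"),
    InvestigationReview("inv-3", False, False, "Upstream API contract changed"),
    InvestigationReview("inv-4", True, True),
]
summary = summarize(reviews)
print(summary)
```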

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://interrupt.langchain.com" rel="noopener noreferrer"&gt;The Agent Conference by LangChain | Interrupt 2026&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://pub.towardsai.net/a-developers-guide-to-agentic-frameworks-in-2026-3f22a492dc3d" rel="noopener noreferrer"&gt;A Developer's Guide to Agentic Frameworks in 2026 - Towards AI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/langchain-ai/langgraph" rel="noopener noreferrer"&gt;langchain-ai/langgraph: Build resilient agents. - GitHub&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.langchain.com/blog/langgraph-multi-agent-workflows" rel="noopener noreferrer"&gt;LangGraph: Multi-Agent Workflows - LangChain&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/nirdiamant/genai_agents" rel="noopener noreferrer"&gt;NirDiamant/GenAI_Agents: 50+ tutorials and implementations&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/TsinghuaC3I/Awesome-Memory-for-Agents" rel="noopener noreferrer"&gt;TsinghuaC3I/Awesome-Memory-for-Agents - GitHub&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/html/2604.16646v1" rel="noopener noreferrer"&gt;Agentic Frameworks for Reasoning Tasks: An Empirical Study - arXiv&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/html/2601.02749v1" rel="noopener noreferrer"&gt;The Path Ahead for Agentic AI: Challenges and Opportunities - arXiv&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://gist.github.com/manduks/bb0a93c1e0eb21bc718a78ffdcefdc95" rel="noopener noreferrer"&gt;AI Agent Frameworks Comparison 2026: Complete Guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://machinelearningmastery.com/7-agentic-ai-trends-to-watch-in-2026" rel="noopener noreferrer"&gt;7 Agentic AI Trends to Watch in 2026 - MachineLearningMastery.com&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.langchain.com/blog" rel="noopener noreferrer"&gt;LangChain Blog&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;This is part of the &lt;strong&gt;Agentic Engineering Weekly&lt;/strong&gt; series: a deep-dive every Monday into the frameworks, patterns, and techniques shaping the next generation of AI systems.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Follow the Agentic Engineering Weekly series on Dev.to to catch every edition.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Building something agentic? Drop a comment — I'd love to feature reader projects.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>programming</category>
      <category>agents</category>
    </item>
    <item>
      <title>AI Weekly: The Tokenpocalypse Hits, Agentic Systems Mature, and Security Takes Center Stage</title>
      <dc:creator>Richard Dillon</dc:creator>
      <pubDate>Mon, 08 Jun 2026 12:05:20 +0000</pubDate>
      <link>https://dev.to/richard_dillon_b9c238186e/ai-weekly-the-tokenpocalypse-hits-agentic-systems-mature-and-security-takes-center-stage-6l1</link>
      <guid>https://dev.to/richard_dillon_b9c238186e/ai-weekly-the-tokenpocalypse-hits-agentic-systems-mature-and-security-takes-center-stage-6l1</guid>
      <description>&lt;h1&gt;
  
  
  AI Weekly: The Tokenpocalypse Hits, Agentic Systems Mature, and Security Takes Center Stage
&lt;/h1&gt;

&lt;p&gt;The AI industry's "move fast and worry about costs later" era is officially over. This week brought a stark reckoning as enterprises discovered that unlimited AI access doesn't scale, while simultaneously the agentic programming paradigm crossed critical capability thresholds that make these tools harder than ever to abandon. The tension between transformative productivity gains and unsustainable infrastructure economics is now the defining challenge of enterprise AI adoption.&lt;/p&gt;

&lt;h2&gt;
  
  
  The "Tokenpocalypse" Arrives: Enterprises Scramble as AI Costs Spiral
&lt;/h2&gt;

&lt;p&gt;The bill for enterprise AI enthusiasm is coming due. &lt;a href="https://techcrunch.com/category/artificial-intelligence" rel="noopener noreferrer"&gt;TechCrunch reports&lt;/a&gt; on what insiders are calling the "tokenpocalypse"—a widespread scramble across Fortune 500 companies to contain AI inference costs that have blown past even aggressive projections.&lt;/p&gt;

&lt;p&gt;Uber provides the most striking example: the company reportedly exhausted its entire annual employee AI spending budget in just four months, forcing leadership to implement hard caps on individual usage. The culprit isn't frivolous prompts—it's the multiplicative effect of thousands of employees using AI assistants for routine tasks, each interaction consuming tokens that add up to staggering monthly invoices.&lt;/p&gt;

&lt;p&gt;The pattern repeats across industries. Financial services firms report inference costs 3-4x initial estimates. Healthcare organizations are renegotiating API contracts mid-year. Even AI-native startups are implementing usage monitoring dashboards that would have seemed paranoid twelve months ago.&lt;/p&gt;

&lt;p&gt;What makes this particularly thorny is the asymmetry between costs and benefits. The productivity gains are real—many organizations report genuine efficiency improvements—but token economics create a usage-punishing model where success breeds expense. The more valuable AI proves, the more employees use it, and the faster budgets evaporate.&lt;/p&gt;

&lt;p&gt;Expect a wave of cost optimization tooling, smarter routing between model tiers, and some uncomfortable conversations about which use cases justify frontier model pricing versus smaller, cheaper alternatives.&lt;/p&gt;

&lt;h2&gt;
  
  
  Agentic Programming Updates
&lt;/h2&gt;

&lt;p&gt;The capability gap between agentic AI systems and human researchers is narrowing faster than most predictions anticipated. &lt;a href="https://www.anthropic.com/institute/recursive-self-improvement" rel="noopener noreferrer"&gt;Anthropic reports&lt;/a&gt; that Claude's open-ended task success rate reached 76% in May 2026—a remarkable 50 percentage point improvement in just six months. The benchmark measures completion of complex, multi-step tasks without human intervention, making this one of the most meaningful metrics for real-world agent deployment.&lt;/p&gt;

&lt;p&gt;Perhaps more striking is the weak-to-strong supervision experiment: Claude agents recovered 97% of the performance gap between weak and strong oversight, compared to just 23% achieved by human researchers working on the same problem. The compute bill—approximately $18,000 over 800 hours—represents a fraction of equivalent human labor costs, fundamentally changing the economics of research automation.&lt;/p&gt;

&lt;p&gt;Production architectures are converging on &lt;a href="https://futureagi.com/blog/multi-agent-systems-2025" rel="noopener noreferrer"&gt;multi-agent orchestration patterns&lt;/a&gt;, with orchestrator agents coordinating specialized sub-agents that maintain dedicated context windows. This allows complex workflows to exceed individual context limits while preserving coherent task execution. The &lt;a href="https://www.firecrawl.dev/blog/agentic-ai-trends" rel="noopener noreferrer"&gt;framework landscape&lt;/a&gt; is stabilizing around LangGraph, CrewAI, &lt;a href="https://openai.com/index/new-tools-for-building-agents" rel="noopener noreferrer"&gt;OpenAI Agents SDK&lt;/a&gt;, and Microsoft Agent Framework, all now shipping span-aware observability layers for debugging multi-agent interactions.&lt;/p&gt;

&lt;p&gt;Meanwhile, Genkit's new middleware system offers composable hooks for retries, model fallbacks, and tool approval gates—the kind of production-hardening infrastructure that signals agentic systems moving from experimental to enterprise-critical.&lt;/p&gt;

&lt;h2&gt;
  
  
  OpenAI Ships Lockdown Mode to Combat Prompt Injection
&lt;/h2&gt;

&lt;p&gt;OpenAI &lt;a href="https://openai.com/index/new-tools-for-building-agents" rel="noopener noreferrer"&gt;launched Lockdown Mode&lt;/a&gt;, a new security feature designed to protect enterprise deployments from prompt injection attacks. The feature creates isolation boundaries between system instructions and user inputs, preventing malicious prompts from extracting sensitive data or hijacking agent behavior.&lt;/p&gt;

&lt;p&gt;The timing is deliberate. As AI agents gain broader system access—executing code, querying databases, managing credentials—the attack surface for prompt injection expands exponentially. A successful injection against a customer service bot is inconvenient; against an agent with API keys and database write access, it's catastrophic.&lt;/p&gt;

&lt;p&gt;Lockdown Mode implements several defensive layers: instruction compartmentalization, output filtering for sensitive patterns, and anomaly detection for unusual agent behavior sequences. It's opt-in for now, but OpenAI is clearly positioning security architecture as a first-class concern rather than an afterthought.&lt;/p&gt;

&lt;p&gt;The company also confirmed that development continues on its &lt;a href="https://techcrunch.com" rel="noopener noreferrer"&gt;"super app" initiative&lt;/a&gt;, which would consolidate ChatGPT, image generation, and agentic capabilities into a unified consumer platform—a direct response to the fragmented experience currently spread across multiple interfaces.&lt;/p&gt;

&lt;h2&gt;
  
  
  Microsoft Launches Scout: OpenClaw-Inspired Personal Assistant
&lt;/h2&gt;

&lt;p&gt;Microsoft &lt;a href="https://techcrunch.com" rel="noopener noreferrer"&gt;debuted Scout&lt;/a&gt;, a new personal assistant that draws architectural inspiration from the open-source OpenClaw framework. The assistant emphasizes persistent context across sessions, proactive task suggestion, and tight integration with Microsoft 365 services.&lt;/p&gt;

&lt;p&gt;Scout represents an interesting pattern: major labs increasingly building production systems on paradigms first developed in community-driven projects. OpenClaw's contribution—a modular agent architecture allowing swappable reasoning and memory components—has been refined and scaled to Microsoft's infrastructure requirements.&lt;/p&gt;

&lt;p&gt;The positioning is clearly competitive with ChatGPT's memory features and Claude's project-based context management. Microsoft is betting that operating system-level integration and enterprise identity management will differentiate Scout in environments where standalone chat interfaces feel disconnected from actual workflows.&lt;/p&gt;

&lt;h2&gt;
  
  
  Anthropic's Pre-IPO Positioning: Daniela Amodei Addresses AI Returns Skepticism
&lt;/h2&gt;

&lt;p&gt;With an IPO reportedly on the horizon, Anthropic is getting ahead of investor skepticism about AI returns. In recent public remarks, Daniela Amodei shared internal productivity data showing the median Anthropic employee reports approximately 4x output improvement using Mythos Preview for their workflows.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://resources.anthropic.com/hubfs/2026%20Agentic%20Coding%20Trends%20Report.pdf" rel="noopener noreferrer"&gt;2026 Agentic Coding Trends Report&lt;/a&gt; provides external validation: engineers using agentic coding tools report decreased time-per-task but significantly larger increases in total output volume. The nuance matters—AI doesn't just make existing work faster; it makes previously impractical workloads feasible.&lt;/p&gt;

&lt;p&gt;TELUS offers a concrete case study: their teams shipped code 30% faster, saving over 500,000 hours—roughly 40 minutes saved per AI interaction. At enterprise scale, those minutes compound into strategic advantage.&lt;/p&gt;

&lt;p&gt;The productivity narrative is essential for Anthropic's valuation story, but it also reflects a genuine phase transition in AI deployment. The question is no longer whether AI tools improve individual productivity, but whether organizations can capture those gains at scale without running into the cost spiral already hitting other enterprises.&lt;/p&gt;

&lt;h2&gt;
  
  
  Hackers Exploit Meta AI Support Chatbot to Hijack Instagram Accounts
&lt;/h2&gt;

&lt;p&gt;A social engineering attack &lt;a href="https://techcrunch.com" rel="noopener noreferrer"&gt;exploited Meta's AI-powered support system&lt;/a&gt; to gain unauthorized access to Instagram accounts, highlighting security risks as AI chatbots handle increasingly sensitive authentication workflows.&lt;/p&gt;

&lt;p&gt;The attack vector was clever: users were directed to what appeared to be a legitimate support flow, where the AI assistant was manipulated into initiating account recovery processes without proper verification. The chatbot, trained to be helpful and resolve user issues, became an unwitting accomplice in credential theft.&lt;/p&gt;

&lt;p&gt;The incident raises uncomfortable questions about AI system permissions in customer service contexts. When chatbots can trigger password resets, modify account settings, or escalate to privileged operations, they become high-value targets for social engineering. Traditional security models assumed human operators would catch suspicious patterns; AI systems require different safeguards.&lt;/p&gt;

&lt;p&gt;Meta has patched the specific vulnerability, but the broader architectural challenge remains: balancing AI helpfulness with security requires rethinking how much authority automated systems should have over identity-critical operations.&lt;/p&gt;

&lt;h2&gt;
  
  
  WWDC 2026 Preview: Apple's Siri Overhaul and Apple Intelligence Updates
&lt;/h2&gt;

&lt;p&gt;Apple's WWDC kicks off tomorrow, and &lt;a href="https://techcrunch.com/category/artificial-intelligence" rel="noopener noreferrer"&gt;all indications point&lt;/a&gt; to the most significant Siri overhaul in the assistant's history. Leaked developer documentation suggests deeper integration with Apple Intelligence, expanded on-device processing capabilities, and—finally—conversational context that persists across sessions.&lt;/p&gt;

&lt;p&gt;The pressure is real. ChatGPT, Claude, and Gemini have established consumer expectations for AI assistants that Siri cannot currently meet. Apple's privacy-first approach, while differentiated, has also meant slower feature deployment compared to cloud-native competitors.&lt;/p&gt;

&lt;p&gt;Expect announcements around improved natural language understanding, more sophisticated task chaining, and tighter integration with third-party apps through enhanced Shortcuts capabilities. The developer story matters too: Apple needs to give iOS developers compelling reasons to build agent-native experiences rather than simply wrapping ChatGPT APIs.&lt;/p&gt;

&lt;h2&gt;
  
  
  AirTrunk Commits $30B for 5GW AI Data Centers in India
&lt;/h2&gt;

&lt;p&gt;AirTrunk &lt;a href="https://techcrunch.com" rel="noopener noreferrer"&gt;announced a $30 billion investment&lt;/a&gt; to build 5 gigawatts of AI-focused data center capacity across India, marking one of the largest single infrastructure commitments in the current AI buildout cycle.&lt;/p&gt;

&lt;p&gt;The scale is staggering—5GW could power roughly 4 million homes—and reflects the voracious power requirements of both training runs and, increasingly, inference at scale. The India location offers advantages in land availability, cooling efficiency in certain regions, and access to technical talent for operations.&lt;/p&gt;

&lt;p&gt;This investment joins a global race for AI compute infrastructure, with hyperscalers and specialized operators locked in competition for power purchase agreements, cooling technology, and the specialized construction expertise required for high-density deployments. The physical layer of AI—often overlooked in discussions of algorithms and architectures—has become a strategic bottleneck.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to Watch
&lt;/h2&gt;

&lt;p&gt;The cost management crisis hitting enterprises this week will force rapid innovation in inference optimization, model routing, and usage governance; expect a wave of startups and tools addressing this gap in the coming months. Meanwhile, the Meta support-chatbot exploit and OpenAI's Lockdown Mode launch together signal that agentic security is moving from theoretical concern to operational priority. Apple's WWDC announcements tomorrow will reveal whether the company can close the consumer AI gap or whether the Siri overhaul is too little, too late.&lt;/p&gt;




&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://techcrunch.com/category/artificial-intelligence" rel="noopener noreferrer"&gt;AI News &amp;amp; Artificial Intelligence | TechCrunch&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.anthropic.com/institute/recursive-self-improvement" rel="noopener noreferrer"&gt;When AI builds itself | Anthropic&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://techcrunch.com" rel="noopener noreferrer"&gt;TechCrunch | Startup and Technology News&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://futureagi.com/blog/multi-agent-systems-2025" rel="noopener noreferrer"&gt;Multi-Agent AI Systems 2026: Frameworks Compared - Future AGI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.firecrawl.dev/blog/agentic-ai-trends" rel="noopener noreferrer"&gt;Top 13 Agentic AI Trends to Watch in 2026 - Firecrawl&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://resources.anthropic.com/hubfs/2026%20Agentic%20Coding%20Trends%20Report.pdf" rel="noopener noreferrer"&gt;2026 Agentic Coding Trends Report - Anthropic&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://openai.com/index/new-tools-for-building-agents" rel="noopener noreferrer"&gt;New tools for building agents | OpenAI&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Enjoyed this briefing? Follow this series for a fresh AI update every week, written for engineers who want to stay ahead.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Follow this publication on Dev.to to get notified of every new article.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Have a story tip or correction? Drop a comment below.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>programming</category>
      <category>technology</category>
    </item>
    <item>
      <title>LangChain 1.0: The Complexity Tax Verdict</title>
      <dc:creator>Richard Dillon</dc:creator>
      <pubDate>Mon, 01 Jun 2026 12:03:34 +0000</pubDate>
      <link>https://dev.to/richard_dillon_b9c238186e/langchain-10-the-complexity-tax-verdict-504i</link>
      <guid>https://dev.to/richard_dillon_b9c238186e/langchain-10-the-complexity-tax-verdict-504i</guid>
      <description>&lt;h1&gt;
  
  
  LangChain 1.0: The Complexity Tax Verdict
&lt;/h1&gt;

&lt;p&gt;The framework wars of 2024-2025 asked one question repeatedly: is LangChain's abstraction layer worth the cognitive overhead? With the &lt;a href="https://www.langchain.com/blog/march-2026-langchain-newsletter" rel="noopener noreferrer"&gt;1.0 stable release&lt;/a&gt; now shipping, we finally have an answer — but it's not the binary verdict most teams wanted. LangChain 1.0 is a much better version of what came before, not a fundamentally different framework, and understanding that distinction determines whether migration or adoption makes sense for your specific workload.&lt;/p&gt;

&lt;p&gt;The timing matters. We're watching the agentic AI landscape consolidate rapidly, with &lt;a href="https://alicelabs.ai/en/insights/best-ai-agent-frameworks-2026" rel="noopener noreferrer"&gt;Alice Labs' production analysis&lt;/a&gt; ranking LangGraph first for complex stateful workflows across their 18+ deployments — but also noting that alternatives like Claude Agent SDK, CrewAI, and Pydantic AI have closed the gap significantly. The complexity tax question isn't academic anymore; it's a quarterly planning decision that affects team velocity, operational costs, and system maintainability.&lt;/p&gt;

&lt;p&gt;This deep-dive evaluates LangChain 1.0 against its own promises and its competitors' capabilities. We'll walk through the agent protocol standardization, the LangGraph runtime architecture, and a production-ready code implementation — then map these capabilities against the decision matrix you'll actually use when choosing frameworks. The goal isn't advocacy; it's giving you the technical clarity to make the right call for your specific constraints.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Agent Protocol: What Actually Shipped in 1.0
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://www.digitalapplied.com/blog/langchain-1-deep-dive-agent-protocol-runtime-2026" rel="noopener noreferrer"&gt;agent protocol standardization&lt;/a&gt; in LangChain 1.0 represents the most significant breaking change from the 0.x era — and the primary reason the migration is worth considering. The unified interface for agent instantiation, tool binding, and message handling now works consistently across both base LangChain and LangGraph, eliminating the cognitive overhead of remembering which API surface applied to which context.&lt;/p&gt;

&lt;p&gt;Tool binding consolidation delivers the most visible improvement. The &lt;code&gt;@tool&lt;/code&gt; decorator pattern now generates JSON schemas automatically from Python type hints, deprecating the legacy &lt;code&gt;Tool&lt;/code&gt; class constructors that required manual schema definition. This isn't just convenience — it eliminates a category of runtime errors where schema mismatches caused silent failures in production. The &lt;a href="https://www.langchain.com/state-of-agent-engineering" rel="noopener noreferrer"&gt;State of Agent Engineering report&lt;/a&gt; notes that tool schema errors were among the top three debugging pain points in 2025 production deployments.&lt;/p&gt;
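
&lt;p&gt;As a rough illustration of what deriving a schema from type hints involves, the sketch below builds a JSON-Schema-like dictionary using only the standard library. It is not LangChain's implementation: &lt;code&gt;schema_from_hints&lt;/code&gt; and the type mapping are assumptions made for this example, and only a handful of primitive types are handled.&lt;/p&gt;

```python
import inspect
from typing import get_type_hints

# Mapping from a few Python types to JSON Schema type names (illustrative, not exhaustive).
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def schema_from_hints(func):
    """Build a JSON-Schema-like dict from a function's signature and type hints."""
    hints = get_type_hints(func)
    properties = {}
    required = []
    for name, param in inspect.signature(func).parameters.items():
        # Unknown or missing annotations fall back to "string" in this sketch.
        properties[name] = {"type": _JSON_TYPES.get(hints.get(name), "string")}
        # Parameters without a default value are treated as required.
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {"type": "object", "properties": properties, "required": required},
    }


def retrieve_documents(topic: str, max_docs: int = 3) -> list:
    """Retrieve documents from the knowledge base on a specific topic."""
    return [f"Document {i + 1} about {topic}" for i in range(max_docs)]


schema = schema_from_hints(retrieve_documents)
```

&lt;p&gt;Because the schema is computed from the signature itself, renaming a parameter or changing its type changes the schema in the same commit, which is the property that removes hand-written schema drift.&lt;/p&gt;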

&lt;p&gt;The Runnable protocol stability finally gives teams a canonical API to learn once and apply everywhere. &lt;code&gt;invoke()&lt;/code&gt;, &lt;code&gt;stream()&lt;/code&gt;, and &lt;code&gt;batch()&lt;/code&gt; are the three methods — that's it. The 0.x-era &lt;code&gt;__call__&lt;/code&gt; overloads are gone, which breaks existing code but eliminates the confusion about which invocation pattern to use when. Native async support through &lt;code&gt;ainvoke()&lt;/code&gt; and &lt;code&gt;astream()&lt;/code&gt; now includes proper cancellation semantics; the 0.x implementation had documented race conditions in cleanup handlers that caused resource leaks in long-running deployments.&lt;/p&gt;
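
&lt;p&gt;The shape of that three-method surface can be sketched with a toy class. &lt;code&gt;MiniRunnable&lt;/code&gt; is a hypothetical stand-in written for this article, not a LangChain class, and its word-by-word streaming is an assumption chosen only to make the difference between the methods visible.&lt;/p&gt;

```python
from typing import Callable, Iterator


class MiniRunnable:
    """Hypothetical stand-in for a runnable; not LangChain's implementation."""

    def __init__(self, func: Callable[[str], str]):
        self._func = func

    def invoke(self, value: str) -> str:
        # One input in, one complete output back.
        return self._func(value)

    def stream(self, value: str) -> Iterator[str]:
        # Same computation, but yielded in chunks (here: word by word).
        for word in self.invoke(value).split():
            yield word

    def batch(self, values: list) -> list:
        # Many inputs, outputs returned in the same order.
        return [self.invoke(v) for v in values]


shout = MiniRunnable(lambda text: text.upper())
single = shout.invoke("hello world")
chunks = list(shout.stream("hello world"))
many = shout.batch(["a", "b"])
```

&lt;p&gt;The point of the sketch is that all three methods describe the same computation; only the delivery of inputs and outputs differs.&lt;/p&gt;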

&lt;p&gt;The callback system overhaul deserves attention from teams building observability infrastructure. Typed callback handlers replace string-based event names, enabling IDE autocomplete and static analysis that catches integration errors at development time rather than production. The &lt;code&gt;ChatModel&lt;/code&gt; base class now includes a standardized &lt;code&gt;bind_tools()&lt;/code&gt; method signature that works identically across &lt;a href="https://medium.com/@atnoforgenai/10-ai-agent-frameworks-you-should-know-in-2026-langgraph-crewai-autogen-more-2e0be4055556" rel="noopener noreferrer"&gt;OpenAI, Anthropic, Google, and other providers&lt;/a&gt;, reducing the provider-specific knowledge required to switch models.&lt;/p&gt;

&lt;h2&gt;
  
  
  LangGraph Runtime: The 1.0 Production Architecture
&lt;/h2&gt;

&lt;p&gt;LangGraph's runtime architecture in 1.0 reflects hard-won lessons from production deployments. The &lt;a href="https://www.digitalapplied.com/blog/langchain-1-deep-dive-agent-protocol-runtime-2026" rel="noopener noreferrer"&gt;StateGraph initialization&lt;/a&gt; now requires an explicit &lt;code&gt;state_schema&lt;/code&gt; parameter — a breaking change that emerged from LangGraph 2.0 and carries through to the unified release. This mandatory typing catches state shape mismatches at graph construction time rather than during execution, which matters enormously when debugging distributed systems.&lt;/p&gt;

&lt;p&gt;The checkpointer interface has reached stability with &lt;code&gt;PostgresSaver&lt;/code&gt;, &lt;code&gt;SqliteSaver&lt;/code&gt;, and &lt;code&gt;MemorySaver&lt;/code&gt; sharing identical APIs. Connection pooling is enabled by default, addressing the connection exhaustion issues that plagued early production deployments. The practical implication: you can develop locally with &lt;code&gt;SqliteSaver&lt;/code&gt;, run integration tests with &lt;code&gt;MemorySaver&lt;/code&gt;, and deploy to production with &lt;code&gt;PostgresSaver&lt;/code&gt; without changing node implementation code.&lt;/p&gt;
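
&lt;p&gt;The value of identical APIs is easiest to see in miniature. The sketch below uses hypothetical &lt;code&gt;save&lt;/code&gt; and &lt;code&gt;load&lt;/code&gt; methods and two toy backends built on the standard library; these names are assumptions for illustration and do not mirror LangGraph's actual checkpointer methods.&lt;/p&gt;

```python
import json
import sqlite3
from typing import Optional, Protocol


class Checkpointer(Protocol):
    """Hypothetical shared interface; the real LangGraph API differs."""
    def save(self, thread_id: str, state: dict) -> None: ...
    def load(self, thread_id: str) -> Optional[dict]: ...


class InMemoryCheckpointer:
    def __init__(self) -> None:
        self._store = {}

    def save(self, thread_id: str, state: dict) -> None:
        self._store[thread_id] = json.dumps(state)

    def load(self, thread_id: str) -> Optional[dict]:
        raw = self._store.get(thread_id)
        return json.loads(raw) if raw is not None else None


class SqliteCheckpointer:
    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS checkpoints (thread_id TEXT PRIMARY KEY, state TEXT)"
        )

    def save(self, thread_id: str, state: dict) -> None:
        self._conn.execute(
            "INSERT OR REPLACE INTO checkpoints VALUES (?, ?)", (thread_id, json.dumps(state))
        )
        self._conn.commit()

    def load(self, thread_id: str) -> Optional[dict]:
        row = self._conn.execute(
            "SELECT state FROM checkpoints WHERE thread_id = ?", (thread_id,)
        ).fetchone()
        return json.loads(row[0]) if row else None


def run_step(checkpointer: Checkpointer, thread_id: str) -> dict:
    """Node-side code: depends only on the shared interface, not the backend."""
    state = checkpointer.load(thread_id) or {"iteration_count": 0}
    state["iteration_count"] += 1
    checkpointer.save(thread_id, state)
    return state
```

&lt;p&gt;&lt;code&gt;run_step&lt;/code&gt; never names a backend, so swapping the in-memory store for the SQLite one is a change at the call site only, which is the same property the article attributes to the real savers.&lt;/p&gt;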

&lt;p&gt;Edge routing formalization represents a subtle but powerful improvement. The &lt;code&gt;add_conditional_edges()&lt;/code&gt; method now accepts typed routing functions that return &lt;code&gt;Literal&lt;/code&gt; types, enabling compile-time validation of routing logic. Combined with the &lt;a href="https://www.digitalapplied.com/blog/langchain-1-deep-dive-agent-protocol-runtime-2026" rel="noopener noreferrer"&gt;graph validation&lt;/a&gt; that &lt;code&gt;graph.compile()&lt;/code&gt; performs — including reachability analysis and orphan node detection — teams can catch structural errors before deployment rather than discovering them through runtime failures.&lt;/p&gt;
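
&lt;p&gt;Reachability analysis of this kind is a standard graph traversal. The following sketch shows one way such a check could work, using a breadth-first search over a plain adjacency dictionary; the function name, node names, and data layout are assumptions for illustration, not a description of what &lt;code&gt;graph.compile()&lt;/code&gt; does internally.&lt;/p&gt;

```python
from collections import deque


def find_unreachable(nodes: set, edges: dict, entry: str) -> set:
    """Return nodes that cannot be reached from the entry point (breadth-first search)."""
    seen = {entry}
    queue = deque([entry])
    while queue:
        current = queue.popleft()
        for target in edges.get(current, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return nodes - seen


# Hypothetical graph: "audit" was registered as a node but never wired in.
nodes = {"agent", "tools", "synthesize", "audit"}
edges = {
    "agent": ["tools", "synthesize"],  # conditional edge: two possible targets
    "tools": ["agent"],
}
orphans = find_unreachable(nodes, edges, entry="agent")
```

&lt;p&gt;A conditional edge contributes every target it might return, which is why declaring the possible return values as a &lt;code&gt;Literal&lt;/code&gt; type gives a checker enough information to run this analysis before any node executes.&lt;/p&gt;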

&lt;p&gt;The &lt;code&gt;interrupt()&lt;/code&gt; API for &lt;a href="https://www.langchain.com/state-of-agent-engineering" rel="noopener noreferrer"&gt;human-in-the-loop workflows&lt;/a&gt; is now the canonical pattern, replacing the ad-hoc state mutation approaches that characterized early LangGraph implementations. This matters for compliance-sensitive deployments where human approval gates are mandatory. The interrupt mechanism integrates cleanly with checkpointing, allowing workflows to pause indefinitely without losing state.&lt;/p&gt;

&lt;p&gt;Node lifecycle hooks (&lt;code&gt;on_enter&lt;/code&gt;, &lt;code&gt;on_exit&lt;/code&gt;) address resource management in long-running graphs. Database connections, API clients, and file handles can be properly cleaned up even when nodes fail mid-execution. This isn't glamorous functionality, but it's the difference between graphs that work in demos and graphs that survive production traffic patterns.&lt;/p&gt;

&lt;h2&gt;
  
  
  Hands-On: Code Walkthrough
&lt;/h2&gt;

&lt;p&gt;The following implementation demonstrates LangChain 1.0's canonical patterns for a production-ready research agent. This agent searches the web, retrieves documents, and synthesizes findings, a common pattern that exercises tool binding, conditional routing, checkpointing, and observability integration.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# langchain_research_agent.py
# Requires: langchain-core&amp;gt;=1.0.0, langchain-openai&amp;gt;=1.0.0, langgraph&amp;gt;=2.0.0
# pip install langchain-core langchain-openai langgraph langgraph-checkpoint-postgres "psycopg[binary]"
&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;TypedDict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Annotated&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Literal&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.messages&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseMessage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;HumanMessage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;AIMessage&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.tools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatOpenAI&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.graph&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;StateGraph&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;END&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.checkpoint.postgres&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;PostgresSaver&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.prebuilt&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ToolNode&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;operator&lt;/span&gt;

&lt;span class="c1"&gt;# 1. Define typed state schema - now mandatory in 1.0
# The Annotated pattern with operator.add enables message accumulation
&lt;/span&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ResearchState&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TypedDict&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Annotated&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;BaseMessage&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;operator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# Accumulates across nodes
&lt;/span&gt;    &lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# Retrieved document content
&lt;/span&gt;    &lt;span class="n"&gt;iteration_count&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;  &lt;span class="c1"&gt;# Guard against infinite loops
&lt;/span&gt;    &lt;span class="n"&gt;search_queries&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# Track what we've searched
&lt;/span&gt;
&lt;span class="c1"&gt;# 2. Tool definitions using the @tool decorator
# Schema generation is automatic from type hints - no manual JSON schema required
&lt;/span&gt;&lt;span class="nd"&gt;@tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;web_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Search the web for current information on a topic.

    Args:
        query: The search query string to look up

    Returns:
        Summarized search results as a string
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="c1"&gt;# Production: Replace with actual search API (Tavily, SerpAPI, etc.)
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Search results for &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;: [Simulated web content about &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;]&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="nd"&gt;@tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;retrieve_documents&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_docs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Retrieve documents from the knowledge base on a specific topic.

    Args:
        topic: The topic to retrieve documents about
        max_docs: Maximum number of documents to return

    Returns:
        List of relevant document contents
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="c1"&gt;# Production: Replace with vector store retrieval
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Document &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; about &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_docs&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;

&lt;span class="c1"&gt;# 3. Initialize the model with tool binding - standardized in 1.0
# bind_tools() works identically across OpenAI, Anthropic, Google providers
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ChatOpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;web_search&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;retrieve_documents&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;model_with_tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;bind_tools&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 4. Node implementations with structured error handling
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;research_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ResearchState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Main research node - decides whether to search, retrieve, or synthesize.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;iteration&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;iteration_count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Guard against runaway iterations - critical for production
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;iteration&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;AIMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Maximum iterations reached. Synthesizing available information.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)],&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;iteration_count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;iteration&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model_with_tools&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;iteration_count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;iteration&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Structured error handling with state-based recovery
&lt;/span&gt;        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;AIMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Research step failed: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;. Attempting recovery...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)],&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;iteration_count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;iteration&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;synthesize_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ResearchState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Synthesize findings from collected documents and search results.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;documents&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;documents&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[])&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="c1"&gt;# Only the first five documents go into the prompt, to limit context window usage
&lt;/span&gt;    &lt;span class="n"&gt;synthesis_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Based on the following research materials, provide a comprehensive synthesis:

Documents collected: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;chr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

Provide a well-structured summary addressing the original query.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;HumanMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;synthesis_prompt&lt;/span&gt;&lt;span class="p"&gt;)])&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;

&lt;span class="c1"&gt;# 5. Routing function with Literal return type for compile-time validation
# This pattern enables static analysis and IDE support
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;route_research&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ResearchState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Literal&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tools&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;synthesize&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;complete&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Route based on the last message - determines next step in the workflow.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;last_message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;iteration&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;iteration_count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Check for tool calls in the response
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;hasattr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;last_message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_calls&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;last_message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_calls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tools&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="c1"&gt;# Check iteration count for forced synthesis
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;iteration&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;synthesize&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="c1"&gt;# Check for completion signals in content
&lt;/span&gt;    &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;getattr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;last_message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SYNTHESIS COMPLETE&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;final answer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;complete&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;synthesize&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# 6. Build the graph with explicit schema - the 1.0 pattern
&lt;/span&gt;&lt;span class="n"&gt;graph&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;StateGraph&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ResearchState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Add nodes
&lt;/span&gt;&lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;research&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;research_node&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tools&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;ToolNode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;  &lt;span class="c1"&gt;# Built-in tool execution node
&lt;/span&gt;&lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;synthesize&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;synthesize_node&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Set entry point
&lt;/span&gt;&lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_entry_point&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;research&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Add conditional edges with typed routing
&lt;/span&gt;&lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_conditional_edges&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;research&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;route_research&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tools&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tools&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;synthesize&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;synthesize&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;complete&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;END&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Tools always return to research for next decision
&lt;/span&gt;&lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tools&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;research&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;synthesize&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;END&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

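&lt;span class="c1"&gt;# 7. Compile with production checkpointing
# PostgresSaver accepts a psycopg connection or pool; pool sizing is set on the pool itself
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;psycopg_pool&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ConnectionPool&lt;/span&gt;

&lt;span class="n"&gt;pool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ConnectionPool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;postgresql://user:pass@localhost:5432/langchain&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Upper bound on pooled connections for concurrent requests
&lt;/span&gt;    &lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;autocommit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prepare_threshold&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;checkpointer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;PostgresSaver&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pool&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# setup() creates the checkpoint tables on first run
&lt;/span&gt;&lt;span class="n"&gt;checkpointer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setup&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;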

&lt;span class="c1"&gt;# Compile performs reachability analysis and validates graph structure
&lt;/span&gt;&lt;span class="n"&gt;compiled_graph&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;checkpointer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;checkpointer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 8. Usage with LangSmith tracing integration
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_research&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;thread_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Execute a research workflow with full observability and return the last streamed update.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.tracers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LangChainTracer&lt;/span&gt;

    &lt;span class="n"&gt;initial_state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;HumanMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;documents&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;iteration_count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search_queries&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;# Configure tracing and thread persistence
&lt;/span&gt;    &lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;configurable&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;thread_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;thread_id&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;callbacks&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;LangChainTracer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;project_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;research-agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;# Stream execution for real-time progress
&lt;/span&gt;    &lt;span class="n"&gt;final_state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;compiled_graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;initial_state&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Step: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;keys&lt;/span&gt;&lt;span class="p"&gt;())[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;final_state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;final_state&lt;/span&gt;

&lt;span class="c1"&gt;# Example invocation
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;run_research&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What are the key architectural patterns for production AI agents in 2026?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;thread_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;research-session-001&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This implementation demonstrates several 1.0-specific patterns worth noting. The &lt;code&gt;TypedDict&lt;/code&gt; state schema with &lt;code&gt;Annotated&lt;/code&gt; fields enables automatic message accumulation, which removes a common source of bugs in 0.x implementations, where developers managed list concatenation by hand. The &lt;code&gt;Literal&lt;/code&gt; return type on the routing function declares every possible routing outcome, so type checkers and IDEs can flag a misspelled branch, and the explicit mapping passed to &lt;code&gt;add_conditional_edges&lt;/code&gt; ties each outcome to a target that &lt;code&gt;graph.compile()&lt;/code&gt; can check against the defined nodes. The checkpointer configuration shows production-appropriate connection pooling, and the tracing integration demonstrates the LangSmith observability pattern that's now built into the framework.&lt;/p&gt;
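
&lt;p&gt;To make the accumulation behavior concrete, here is a minimal stdlib-only sketch of how a reducer attached through &lt;code&gt;Annotated&lt;/code&gt; can be read from a schema and applied to a partial update. It is not LangGraph's implementation: &lt;code&gt;DemoState&lt;/code&gt; and &lt;code&gt;apply_update&lt;/code&gt; are hypothetical names, and &lt;code&gt;operator.add&lt;/code&gt; is an assumed stand-in for whatever reducer the real schema uses.&lt;/p&gt;

```python
import operator
from typing import Annotated, TypedDict, get_type_hints


class DemoState(TypedDict):
    # The Annotated metadata names the reducer used to merge updates into this field
    messages: Annotated[list, operator.add]
    # No reducer: the newest value simply replaces the old one
    iteration_count: int


def apply_update(state: dict, update: dict) -> dict:
    """Merge a node's partial update into state, honoring per-field reducers."""
    hints = get_type_hints(DemoState, include_extras=True)
    merged = dict(state)
    for key, value in update.items():
        metadata = getattr(hints[key], "__metadata__", ())
        reducer = metadata[0] if metadata else None
        if reducer is not None and key in merged:
            merged[key] = reducer(merged[key], value)
        else:
            merged[key] = value
    return merged


state = {"messages": ["user: hello"], "iteration_count": 0}
state = apply_update(state, {"messages": ["ai: hi"], "iteration_count": 1})
print(state)
```

&lt;p&gt;The node returns only the new message, and the reducer appends it to the existing list, so no node has to concatenate history by hand.&lt;/p&gt;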

&lt;h2&gt;
  
  
  Migration Path: 0.x to 1.0 Breaking Changes
&lt;/h2&gt;

&lt;p&gt;Migration from LangChain 0.x to 1.0 requires systematic changes across several dimensions. The &lt;a href="https://www.digitalapplied.com/blog/langchain-1-deep-dive-agent-protocol-runtime-2026" rel="noopener noreferrer"&gt;import reorganization&lt;/a&gt; is the most visible: &lt;code&gt;from langchain.chat_models&lt;/code&gt; becomes &lt;code&gt;from langchain_openai&lt;/code&gt; (or the appropriate provider-specific package). This isn't just renaming — it reflects the architectural decision to separate the core framework from provider implementations, enabling independent versioning and faster provider-specific updates.&lt;/p&gt;
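
&lt;p&gt;For larger codebases, a small script can list the imports that need attention before you start editing. The sketch below uses only the standard library; the &lt;code&gt;LEGACY_TO_PROVIDER&lt;/code&gt; mapping and the &lt;code&gt;find_legacy_imports&lt;/code&gt; helper are hypothetical, and the mapping covers only the one rename mentioned above, so extend it for the providers you actually use.&lt;/p&gt;

```python
import ast

# Hypothetical mapping for illustration; add entries for the providers you use
LEGACY_TO_PROVIDER = {
    "langchain.chat_models": "langchain_openai",
}


def find_legacy_imports(source: str) -> list[tuple[int, str, str]]:
    """Return (line, old_module, suggested_module) for each legacy import found."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom) and node.module in LEGACY_TO_PROVIDER:
            findings.append((node.lineno, node.module, LEGACY_TO_PROVIDER[node.module]))
    return sorted(findings)


sample = "import os\nfrom langchain.chat_models import ChatOpenAI\n"
for line, old, new in find_legacy_imports(sample):
    print(f"line {line}: replace 'from {old}' with 'from {new}'")
```

&lt;p&gt;Parsing with &lt;code&gt;ast&lt;/code&gt; rather than matching text avoids false positives from comments and string literals.&lt;/p&gt;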

&lt;p&gt;The deprecation of &lt;code&gt;ConversationChain&lt;/code&gt; and &lt;code&gt;LLMChain&lt;/code&gt; represents a philosophical shift. These high-level abstractions hid too much complexity, making debugging difficult when behavior didn't match expectations. The 1.0 pattern favors explicit composition: &lt;code&gt;PromptTemplate | ChatModel | OutputParser&lt;/code&gt;, with the prompt feeding the model and the model feeding the parser, as distinct, inspectable components. Teams with extensive &lt;code&gt;LLMChain&lt;/code&gt; usage should budget time for refactoring, but the resulting code is more maintainable.&lt;/p&gt;
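
&lt;p&gt;The idea behind pipe-style composition can be shown without any framework. The following is a stdlib-only sketch, not LangChain's &lt;code&gt;Runnable&lt;/code&gt;: the &lt;code&gt;Step&lt;/code&gt; class is hypothetical, and the "model" is a fake function that makes no API call. It chains a prompt, then a model, then a parser, and keeps every stage visible for inspection.&lt;/p&gt;

```python
from typing import Any, Callable


class Step:
    """Hypothetical stand-in for a composable component (not LangChain's Runnable)."""

    def __init__(self, name: str, fn: Callable[[Any], Any]):
        self.name = name
        self.fn = fn
        self.stages = [self]

    def __or__(self, other: "Step") -> "Step":
        # Compose left to right and remember every stage so each can be inspected
        combined = Step(
            f"{self.name} | {other.name}",
            lambda value: other.invoke(self.invoke(value)),
        )
        combined.stages = self.stages + other.stages
        return combined

    def invoke(self, value: Any) -> Any:
        return self.fn(value)


prompt = Step("prompt", lambda topic: f"Summarize: {topic}")
model = Step("model", lambda text: {"content": text.upper()})  # Fake model, no API call
parser = Step("parser", lambda message: message["content"])

chain = prompt | model | parser
print([stage.name for stage in chain.stages])
print(chain.invoke("retries"))
```

&lt;p&gt;Because each stage remains a separate object, any one of them can be invoked or replaced on its own while debugging, which is the property the monolithic chain classes lacked.&lt;/p&gt;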

&lt;p&gt;Memory class removal (&lt;code&gt;ConversationBufferMemory&lt;/code&gt;, &lt;code&gt;ConversationSummaryMemory&lt;/code&gt;, etc.) is the most significant breaking change for chat applications. The &lt;a href="https://www.langchain.com/blog/march-2026-langchain-newsletter" rel="noopener noreferrer"&gt;1.0 architecture&lt;/a&gt; expects memory to live in LangGraph state or external storage you manage directly. This eliminates the "magic" behavior that caused confusion about where state actually resided, but requires explicit state management code.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;Agent&lt;/code&gt; and &lt;code&gt;AgentExecutor&lt;/code&gt; classes are deprecated for new code. The replacement pattern uses &lt;code&gt;create_react_agent()&lt;/code&gt; which returns a compiled &lt;code&gt;StateGraph&lt;/code&gt; — unifying the mental model between simple agents and complex workflows. Existing &lt;code&gt;AgentExecutor&lt;/code&gt; code will continue to work but won't receive new features.&lt;/p&gt;

&lt;p&gt;Callback handler signatures changed from &lt;code&gt;on_llm_start(serialized, prompts, **kwargs)&lt;/code&gt; to &lt;code&gt;on_llm_start(run_id, messages, **kwargs)&lt;/code&gt;, reflecting the shift from prompt-centric to message-centric APIs. Custom callback handlers require updates, but the new signature is more useful for observability purposes since &lt;code&gt;run_id&lt;/code&gt; enables correlation across distributed traces.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;langchain-community&lt;/code&gt; package split means provider integrations require separate installations: &lt;code&gt;pip install langchain-anthropic&lt;/code&gt;, &lt;code&gt;pip install langchain-google-genai&lt;/code&gt;, etc. This adds installation complexity but reduces dependency bloat for applications using single providers.&lt;/p&gt;

&lt;h2&gt;
  
  
  LangChain vs. Alternatives: The 2026 Decision Matrix
&lt;/h2&gt;

&lt;p&gt;The framework landscape has matured significantly, and the &lt;a href="https://alicelabs.ai/en/insights/best-ai-agent-frameworks-2026" rel="noopener noreferrer"&gt;Alice Labs analysis&lt;/a&gt; provides useful data for comparison. LangGraph maintains the top ranking for complex stateful workflows, but the decision factors are more nuanced than simple rankings suggest.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Against Claude Agent SDK&lt;/strong&gt;: Anthropic's native offering provides a simpler API surface and tighter Claude integration, but locks you to a single provider. Choose LangChain when multi-provider flexibility matters — switching models mid-project or running A/B tests across providers becomes trivial with the standardized &lt;code&gt;ChatModel&lt;/code&gt; interface. Choose Claude Agent SDK when you're committed to Claude and want minimal abstraction overhead.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Against CrewAI&lt;/strong&gt;: The &lt;a href="https://github.com/crewaiinc/crewai" rel="noopener noreferrer"&gt;role-based multi-agent abstraction&lt;/a&gt; in CrewAI offers faster initial development for team-of-agents patterns, but the higher-level abstraction limits customization. Choose LangChain when you need fine-grained state control or non-standard agent coordination patterns. The &lt;a href="https://arxiv.org/html/2605.10052v2" rel="noopener noreferrer"&gt;Swarm Skills paper&lt;/a&gt; demonstrates that CrewAI-to-AutoGen translation requires adapter layers, suggesting interoperability challenges when outgrowing the framework.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Against Pydantic AI&lt;/strong&gt;: For type-safe Python with minimal abstraction, &lt;a href="https://gist.github.com/manduks/bb0a93c1e0eb21bc718a78ffdcefdc95" rel="noopener noreferrer"&gt;Pydantic AI&lt;/a&gt; offers excellent developer experience. Choose LangChain when workflow complexity exceeds single-agent patterns — Pydantic AI excels at tool-using chat but doesn't provide the graph execution semantics needed for multi-step coordination.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Against Microsoft Semantic Kernel&lt;/strong&gt;: The enterprise-native option for &lt;a href="https://github.com/microsoft/autogen" rel="noopener noreferrer"&gt;.NET-first teams&lt;/a&gt;, Semantic Kernel provides deeper Azure integration. Choose LangChain for Python-first teams without .NET requirements. Note that &lt;a href="https://github.com/microsoft/autogen/discussions/7144" rel="noopener noreferrer"&gt;AutoGen's shared state handling&lt;/a&gt; across multi-agent conversations remains a documented challenge.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://alicelabs.ai/en/insights/best-ai-agent-frameworks-2026" rel="noopener noreferrer"&gt;decision heuristic from Alice Labs&lt;/a&gt; provides a useful starting point: "Start from your dominant constraint: control (LangGraph), team velocity (CrewAI), type safety (Pydantic AI)." This framing correctly identifies that framework selection should derive from constraints, not feature lists.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Means for Your Stack
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;If you're already on LangChain 0.x&lt;/strong&gt;: The migration is worth the investment. The stability guarantees, consolidated APIs, and improved debugging experience reduce ongoing maintenance burden. Budget 2-4 weeks for a medium-sized codebase, with the primary effort going toward memory class replacement and import reorganization. The &lt;a href="https://www.langchain.com/blog/january-2026-langchain-newsletter" rel="noopener noreferrer"&gt;January 2026 newsletter&lt;/a&gt; includes migration tooling that automates some import updates.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you're evaluating frameworks fresh&lt;/strong&gt;: LangChain 1.0 is the right choice specifically for workflows requiring durable state, conditional branching, and multi-step agent coordination. It's not the right choice for simple single-turn chat or prototype applications where iteration speed matters more than production robustness. The &lt;a href="https://medium.com/@dewasheesh.rana/agentic-ai-design-patterns-2026-ed-e3a5125162c5" rel="noopener noreferrer"&gt;agentic AI design patterns&lt;/a&gt; emerging in 2026 map well to LangGraph's graph-based model, suggesting long-term alignment with industry direction.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;LangSmith coupling consideration&lt;/strong&gt;: The &lt;a href="https://www.langchain.com/state-of-agent-engineering" rel="noopener noreferrer"&gt;integrated evaluation framework&lt;/a&gt; provides powerful capabilities — automated regression testing, prompt versioning, cost tracking — but creates platform dependency. If your organization requires portable observability through OpenTelemetry or vendor-neutral tracing, evaluate whether LangSmith's benefits justify the lock-in. The callback system does support custom tracers, but LangSmith-specific features won't translate.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost awareness&lt;/strong&gt;: LangChain's abstraction layers add token overhead through system prompts and tool schemas. For &lt;a href="https://arxiv.org/pdf/2602.07359" rel="noopener noreferrer"&gt;high-volume workloads&lt;/a&gt;, measure actual token costs against direct API usage. The difference can be 15-25% depending on workflow complexity. This overhead buys development velocity and debugging capability, but the tradeoff should be conscious.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Team skill match&lt;/strong&gt;: LangGraph's graph-based mental model requires upfront learning investment. Teams without prior experience with state machines, workflow orchestration, or reactive systems may find &lt;a href="https://github.com/crewaiinc/crewai" rel="noopener noreferrer"&gt;CrewAI's declarative approach&lt;/a&gt; faster to adopt initially. However, the graph model provides better long-term maintainability for complex systems — it's a question of where you want to spend the learning time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Production readiness checklist&lt;/strong&gt;: Before deploying LangChain 1.0 agents to production:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Enable checkpointing — never run stateful graphs without persistence&lt;/li&gt;
&lt;li&gt;Configure connection pooling for database checkpointers (10-20 connections typical)&lt;/li&gt;
&lt;li&gt;Set up LangSmith tracing or equivalent observability before deployment&lt;/li&gt;
&lt;li&gt;Implement node-level timeouts to prevent runaway executions&lt;/li&gt;
&lt;li&gt;Add iteration guards in routing logic to catch infinite loops&lt;/li&gt;
&lt;li&gt;Test interrupt/resume flows if human-in-the-loop is required&lt;/li&gt;
&lt;/ul&gt;
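
&lt;p&gt;The timeout and iteration-guard items in the checklist can be sketched with the standard library alone. This is an illustration under stated assumptions, not LangGraph's own mechanism: &lt;code&gt;run_with_timeout&lt;/code&gt;, &lt;code&gt;route&lt;/code&gt;, &lt;code&gt;MAX_ITERATIONS&lt;/code&gt;, and &lt;code&gt;NODE_TIMEOUT_SECONDS&lt;/code&gt; are hypothetical names, and nodes are plain functions over a dictionary.&lt;/p&gt;

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

MAX_ITERATIONS = 5          # hypothetical loop budget
NODE_TIMEOUT_SECONDS = 0.2  # hypothetical per-node budget


def run_with_timeout(node, state, timeout=NODE_TIMEOUT_SECONDS):
    """Run node(state) in a worker thread and stop waiting after `timeout` seconds.

    The worker thread is not killed; the caller simply stops waiting for it.
    """
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(node, state)
    try:
        return future.result(timeout=timeout)
    except FutureTimeout:
        raise TimeoutError(f"{node.__name__} exceeded {timeout}s") from None
    finally:
        pool.shutdown(wait=False)


def route(state):
    """Routing function with an iteration guard checked before anything else."""
    if state["iterations"] >= MAX_ITERATIONS:
        return "give_up"
    return "done" if state["answer"] else "retry"


def flaky_node(state):
    # Never produces an answer, so only the guard can end the loop.
    return {**state, "iterations": state["iterations"] + 1}


def slow_node(state):
    time.sleep(0.5)
    return state


state = {"iterations": 0, "answer": None}
while route(state) == "retry":
    state = run_with_timeout(flaky_node, state)
print(route(state), state["iterations"])  # give_up 5

timed_out = False
try:
    run_with_timeout(slow_node, state)
except TimeoutError as exc:
    timed_out = True
    print("timed out:", exc)
```

&lt;p&gt;Checking the guard first means a node that keeps failing to produce an answer cannot keep the graph looping, and the timeout bounds how long the caller waits on any single step.&lt;/p&gt;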

&lt;h2&gt;
  
  
  What to Build This Week
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Project: Document QA Agent with Citation Tracking&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Build a research agent that answers questions about a document corpus while maintaining explicit citation chains. This exercises the 1.0 patterns — typed state with document references, conditional routing between retrieval and synthesis, checkpointing for long-running analysis sessions — while solving a practical problem: knowing exactly which documents supported which claims.&lt;/p&gt;

&lt;p&gt;The state schema should include &lt;code&gt;citations: list[Citation]&lt;/code&gt; where &lt;code&gt;Citation&lt;/code&gt; is a &lt;code&gt;TypedDict&lt;/code&gt; with &lt;code&gt;document_id&lt;/code&gt;, &lt;code&gt;chunk_text&lt;/code&gt;, and &lt;code&gt;relevance_score&lt;/code&gt; fields. Your routing logic should decide between "retrieve more documents", "validate existing citations", and "generate final answer with citations". The synthesis node should produce output that includes inline source references mapping to the citation state.&lt;/p&gt;
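
&lt;p&gt;One possible shape for that schema and routing decision, as a hedged sketch. The &lt;code&gt;Citation&lt;/code&gt; fields come from the description above; &lt;code&gt;QAState&lt;/code&gt;, the &lt;code&gt;validated&lt;/code&gt; flag, the thresholds, and the route labels are assumptions made for illustration.&lt;/p&gt;

```python
from typing import TypedDict


class Citation(TypedDict):
    document_id: str
    chunk_text: str
    relevance_score: float


class QAState(TypedDict):
    question: str
    citations: list[Citation]
    validated: bool  # hypothetical flag set by a validation node


MIN_CITATIONS = 2    # hypothetical thresholds
MIN_RELEVANCE = 0.5


def route(state: QAState) -> str:
    """Pick the next step: retrieve more, validate citations, or answer."""
    strong = [c for c in state["citations"] if c["relevance_score"] >= MIN_RELEVANCE]
    if MIN_CITATIONS > len(strong):
        return "retrieve"
    if not state["validated"]:
        return "validate"
    return "answer"


state: QAState = {
    "question": "What changed in the release?",
    "citations": [
        {"document_id": "doc-1", "chunk_text": "chunk a", "relevance_score": 0.9},
        {"document_id": "doc-2", "chunk_text": "chunk b", "relevance_score": 0.3},
    ],
    "validated": False,
}
print(route(state))  # retrieve (only one citation clears the threshold)

state["citations"].append(
    {"document_id": "doc-3", "chunk_text": "chunk c", "relevance_score": 0.7}
)
print(route(state))  # validate

state["validated"] = True
print(route(state))  # answer
```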

&lt;p&gt;Deploy with PostgresSaver checkpointing and LangSmith tracing, then test resumption: kill the process mid-execution, restart, and verify the agent continues from its last checkpoint without re-retrieving documents. This resumption capability is what separates demo code from production systems, and LangChain 1.0 makes it straightforward to implement correctly.&lt;/p&gt;




&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.digitalapplied.com/blog/langchain-1-deep-dive-agent-protocol-runtime-2026" rel="noopener noreferrer"&gt;LangChain 1 Deep Dive: Agent Protocol + Runtime 2026&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.langchain.com/state-of-agent-engineering" rel="noopener noreferrer"&gt;State of Agent Engineering - LangChain&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.langchain.com/blog/march-2026-langchain-newsletter" rel="noopener noreferrer"&gt;March 2026: LangChain Newsletter&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://alicelabs.ai/en/insights/best-ai-agent-frameworks-2026" rel="noopener noreferrer"&gt;AI Agent Frameworks 2026: Production-Tested Ranking - Alice Labs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.langchain.com/blog/january-2026-langchain-newsletter" rel="noopener noreferrer"&gt;January 2026: LangChain Newsletter&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://medium.com/@atnoforgenai/10-ai-agent-frameworks-you-should-know-in-2026-langgraph-crewai-autogen-more-2e0be4055556" rel="noopener noreferrer"&gt;10 AI Agent Frameworks You Should Know in 2026: LangGraph ...&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/microsoft/autogen" rel="noopener noreferrer"&gt;GitHub - microsoft/autogen: A programming framework for agentic AI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/crewaiinc/crewai" rel="noopener noreferrer"&gt;GitHub - crewAIInc/crewAI: Framework for orchestrating role-playing ...&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/html/2605.10052v2" rel="noopener noreferrer"&gt;Swarm Skills: A Portable, Self-Evolving Multi-Agent System Specification for Coordination Engineering&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/microsoft/autogen/discussions/7144" rel="noopener noreferrer"&gt;Handling shared state across multi-agent conversations in AutoGen · Discussion #7144&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://gist.github.com/manduks/bb0a93c1e0eb21bc718a78ffdcefdc95" rel="noopener noreferrer"&gt;AI Agent Frameworks Comparison 2026: Complete Guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/pdf/2602.07359" rel="noopener noreferrer"&gt;W&amp;amp;D: Scaling Parallel Tool Calling for Efficient Deep Research Agents&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://medium.com/@dewasheesh.rana/agentic-ai-design-patterns-2026-ed-e3a5125162c5" rel="noopener noreferrer"&gt;Agentic AI Design Patterns (2026 Edition)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;This is part of the &lt;strong&gt;Agentic Engineering Weekly&lt;/strong&gt; series, a deep-dive every Monday into the frameworks, patterns, and techniques shaping the next generation of AI systems.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Follow the Agentic Engineering Weekly series on Dev.to to catch every edition.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Building something agentic? Drop a comment — I'd love to feature reader projects.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>programming</category>
      <category>agents</category>
    </item>
    <item>
      <title>AI Weekly Digest: Memory Wars, Model Upgrades, and the Trading Benchmark That Humbled Five LLMs</title>
      <dc:creator>Richard Dillon</dc:creator>
      <pubDate>Mon, 01 Jun 2026 12:02:35 +0000</pubDate>
      <link>https://dev.to/richard_dillon_b9c238186e/ai-weekly-digest-memory-wars-model-upgrades-and-the-trading-benchmark-that-humbled-five-llms-2f66</link>
      <guid>https://dev.to/richard_dillon_b9c238186e/ai-weekly-digest-memory-wars-model-upgrades-and-the-trading-benchmark-that-humbled-five-llms-2f66</guid>
      <description>&lt;h1&gt;
  
  
  AI Weekly Digest: Memory Wars, Model Upgrades, and the Trading Benchmark That Humbled Five LLMs
&lt;/h1&gt;

&lt;p&gt;The week ending June 1, 2026 delivered a sharp reminder that raw compute isn't everything—and neither is language fluency. A $135M chip startup is betting AI's real constraint is memory, Anthropic shipped a model that actually catches its own coding mistakes, and a brutal new benchmark revealed that most frontier models can't beat the market even when they're confident they can. Meanwhile, the infrastructure buildout continues at staggering scale, and the backlash chorus is growing louder.&lt;/p&gt;




&lt;h2&gt;
  
  
  XCENA Raises $135M Betting AI's Real Bottleneck Is Memory, Not Compute
&lt;/h2&gt;

&lt;p&gt;Chip startup XCENA has &lt;a href="https://techcrunch.com/category/artificial-intelligence" rel="noopener noreferrer"&gt;secured $135 million in funding&lt;/a&gt;, positioning itself against the dominant GPU-centric narrative that has made NVIDIA the undisputed king of AI infrastructure. The company's core thesis is provocative but increasingly resonant among systems architects: memory bandwidth and latency—not raw floating-point operations—are the true limiting factors for scaling AI workloads.&lt;/p&gt;

&lt;p&gt;The argument isn't new among researchers, but it's gaining commercial validation. Modern transformer inference spends enormous time waiting for weights to load from memory rather than actually computing. NVIDIA's H100 and H200 have addressed this partially with HBM3 and HBM3e, but XCENA claims their architecture delivers fundamentally different memory-compute ratios optimized specifically for inference rather than training.&lt;/p&gt;

&lt;p&gt;The implications for next-generation AI infrastructure are significant. If XCENA's bet pays off, we could see a bifurcation in the chip market: GPU clusters for training, memory-optimized silicon for serving. This would particularly benefit enterprises deploying large language models at scale, where inference costs dominate operational budgets. The $135M gives XCENA runway to tape out production chips, though they'll face the sobering reality that challenging NVIDIA's ecosystem moat requires more than better specs—it requires convincing hyperscalers to take a risk on unproven silicon.&lt;/p&gt;




&lt;h2&gt;
  
  
  Anthropic Releases Claude Opus 4.8 with Enhanced Code Self-Correction
&lt;/h2&gt;

&lt;p&gt;Anthropic &lt;a href="https://www.anthropic.com/news/claude-opus-4-8" rel="noopener noreferrer"&gt;released Claude Opus 4.8&lt;/a&gt; this week, with the headline improvement being a roughly 4x reduction in the rate at which the model lets code flaws pass unremarked compared to its predecessor, Opus 4.7. The model is available immediately via API as &lt;code&gt;claude-opus-4-8&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;This matters because self-correction capability is arguably the single most important trait for autonomous coding agents. A model that confidently ships buggy code creates technical debt at machine speed; one that catches its own mistakes before commit becomes genuinely useful for unsupervised work. Anthropic's internal evaluations show improvements across syntax errors, logic bugs, and security vulnerabilities, though the company notes the gains are most pronounced in languages with strong type systems.&lt;/p&gt;

&lt;p&gt;Perhaps more intriguing is the &lt;a href="https://www.anthropic.com/news/claude-opus-4-8" rel="noopener noreferrer"&gt;Project Glasswing preview&lt;/a&gt;, which enables select organizations to use Claude Mythos—Anthropic's specialized security-focused model—for cybersecurity work including vulnerability assessment and threat modeling. Access is restricted and requires application, suggesting Anthropic is being cautious about dual-use concerns. The combination signals Anthropic's broader strategic push: making Claude not just capable but &lt;em&gt;reliable&lt;/em&gt; enough for high-stakes autonomous deployment where errors have real consequences.&lt;/p&gt;




&lt;h2&gt;
  
  
  Agentic Programming Updates
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://futureagi.com/blog/multi-agent-systems-2025" rel="noopener noreferrer"&gt;Microsoft Agent Framework&lt;/a&gt; is now officially positioned as the successor to AutoGen, consolidating async multi-agent patterns into a production-ready stack. The framework emphasizes typed message passing, structured agent lifecycles, and native Azure integration—Microsoft's clear bid to own enterprise agent infrastructure.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/Zijian-Ni/awesome-ai-agents-2026" rel="noopener noreferrer"&gt;LlamaIndex shipped Google Agents API integration&lt;/a&gt; this week, including access to sandboxed Linux environments for agents that need to execute code safely. Alongside it, they released ParseBench, an OCR benchmark specifically designed for evaluating how well agents can extract structured data from documents—a capability that's increasingly critical for enterprise automation.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://github.com/topics/best-ai-tools-2026" rel="noopener noreferrer"&gt;Genkit middleware system&lt;/a&gt; arrived with composable hooks for retries, model fallbacks, tool approval gates, and skill injection. This middleware pattern—borrowed from web frameworks—lets developers declaratively specify policies rather than scattering retry logic throughout agent code.&lt;/p&gt;
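
&lt;p&gt;To make the middleware idea concrete, here is a generic Python sketch of composable retry and fallback wrappers. It does not use or mirror Genkit's actual API; &lt;code&gt;with_retries&lt;/code&gt;, &lt;code&gt;with_fallback&lt;/code&gt;, and the two model functions are hypothetical stand-ins.&lt;/p&gt;

```python
import time


def with_retries(call, attempts=3, base_delay=0.01):
    """Wrap `call` so ConnectionError is retried with exponential backoff."""
    def wrapped(*args, **kwargs):
        for attempt in range(attempts):
            try:
                return call(*args, **kwargs)
            except ConnectionError:
                if attempt == attempts - 1:
                    raise  # out of attempts, let the next layer decide
                time.sleep(base_delay * 2 ** attempt)
    return wrapped


def with_fallback(call, fallback):
    """Wrap `call` so a ConnectionError is answered by `fallback` instead."""
    def wrapped(*args, **kwargs):
        try:
            return call(*args, **kwargs)
        except ConnectionError:
            return fallback(*args, **kwargs)
    return wrapped


calls = {"primary": 0}


def primary_model(prompt):
    # Hypothetical model call that fails on its first two invocations.
    calls["primary"] += 1
    if 3 > calls["primary"]:
        raise ConnectionError("rate limited")
    return f"primary: {prompt}"


def backup_model(prompt):
    return f"backup: {prompt}"


# Policies are declared once, outside the agent logic.
model = with_fallback(with_retries(primary_model), backup_model)
result = model("hello")
print(result, calls["primary"])  # primary: hello 3
```

&lt;p&gt;Swapping the order of the wrappers changes the policy (fall back first, then retry the whole pair), which is the kind of decision the middleware pattern keeps in one place.&lt;/p&gt;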

&lt;p&gt;&lt;a href="https://huggingface.co/blog/daya-shankar/agentic-ai-trends-2026" rel="noopener noreferrer"&gt;MCP Apps are emerging as a 2026 pattern&lt;/a&gt;: tools that return rich interactive UIs (dashboards, forms, visualizations) directly within agent chat interfaces. This collapses the distinction between "agent gives you information" and "agent gives you an app."&lt;/p&gt;

&lt;p&gt;Finally, &lt;a href="https://aimultiple.com/agentic-ai-trends" rel="noopener noreferrer"&gt;multi-agent orchestration is shifting from experimental to enterprise mainstream&lt;/a&gt;, with UiPath and IBM both publishing formal guidance on deploying agent swarms in production. The era of single-agent demos is definitively over.&lt;/p&gt;




&lt;h2&gt;
  
  
  GitHub Copilot's New Token-Based Billing Sparks Developer Backlash
&lt;/h2&gt;

&lt;p&gt;GitHub's move to a &lt;a href="https://techcrunch.com/category/artificial-intelligence" rel="noopener noreferrer"&gt;token-metered pricing model for Copilot&lt;/a&gt; has ignited significant developer frustration, with complaints centering on unpredictable costs and the cognitive overhead of monitoring usage. The shift away from flat monthly subscriptions—previously $10/month for individuals and $19/month for business—represents a fundamental change in how AI coding assistants are sold.&lt;/p&gt;

&lt;p&gt;The backlash is driven by practical concerns. Developers report that token consumption varies wildly based on coding style, project complexity, and how aggressively they use chat features versus inline completions. A heavy Copilot user might see bills 3-5x higher than the old flat rate, while occasional users could theoretically pay less. The uncertainty is the problem: engineers hate variable costs for tools they use continuously.&lt;/p&gt;

&lt;p&gt;Competing tools are positioning against the change. &lt;a href="https://github.com/orgs/community/discussions/187143" rel="noopener noreferrer"&gt;Cursor, Continue, and Roo Code&lt;/a&gt; are all emphasizing their pricing models—some flat-rate, some with generous free tiers, some offering local-model options that eliminate API costs entirely. The strategic question for GitHub is whether enterprise procurement departments, who value predictable budgets, will push back hard enough to force a reversal. Microsoft has historically been flexible when enterprise customers revolt, but they also have revenue targets that flat subscriptions weren't meeting.&lt;/p&gt;




&lt;h2&gt;
  
  
  SoftBank Commits €75 Billion for French AI Data Center Infrastructure
&lt;/h2&gt;

&lt;p&gt;SoftBank &lt;a href="https://www.reuters.com/technology/artificial-intelligence" rel="noopener noreferrer"&gt;announced a €75 billion commitment&lt;/a&gt; to build AI data center infrastructure in France, part of a broader European AI buildout that's accelerating across the continent. The investment will span multiple facilities optimized for both training and inference workloads, with construction expected to begin in 2027.&lt;/p&gt;

&lt;p&gt;The deal follows a pattern of Big Tech infrastructure investments targeting an estimated &lt;a href="https://www.reuters.com/technology/artificial-intelligence" rel="noopener noreferrer"&gt;110 GW of power for AI workloads&lt;/a&gt; globally by 2030—roughly equivalent to adding another Germany to global electricity demand. Nuclear power agreements have become the preferred mechanism for securing clean baseload, with Microsoft, Google, and Amazon all signing deals in the past year.&lt;/p&gt;

&lt;p&gt;Environmental activist &lt;a href="https://www.reuters.com/technology/artificial-intelligence" rel="noopener noreferrer"&gt;Erin Brockovich has raised concerns&lt;/a&gt; about data center secrecy and environmental impact, particularly around water usage for cooling and the gap between companies' renewable energy claims and actual grid impact. France's relatively clean nuclear-heavy grid makes it attractive for AI workloads that need to claim low carbon intensity, but local communities are increasingly questioning whether they want these massive facilities in their regions. The €75 billion figure is eye-catching, but the real story is infrastructure: AI capability is increasingly constrained by physical buildout, not algorithmic progress.&lt;/p&gt;




&lt;h2&gt;
  
  
  PolyBench Reveals Only 2 of 7 Top LLMs Can Actually Make Money Trading
&lt;/h2&gt;

&lt;p&gt;A new benchmark called &lt;a href="https://arxiv.org/html/2604.14199v1" rel="noopener noreferrer"&gt;PolyBench&lt;/a&gt; has delivered a humbling result for large language models: when tested against live Polymarket prediction data spanning 38,666 markets, only two of seven state-of-the-art models actually made money. The rest lost despite expressing high confidence in their predictions.&lt;/p&gt;

&lt;p&gt;MiMo-V2-Flash achieved a 17.6% cumulative weighted return, while Gemini-3-Flash managed 6.2%. The remaining five models—including several frontier systems with strong performance on standard benchmarks—ended in the red. What makes this particularly striking is that the losing models often exhibited high stated confidence; they weren't uncertain, they were confidently wrong.&lt;/p&gt;

&lt;p&gt;The benchmark exposes a crucial gap between language fluency and genuine probabilistic reasoning under uncertainty. Prediction markets are adversarial environments where being calibrated matters more than being articulate. The PolyBench paper argues that most LLM evaluation frameworks test whether models can generate plausible text, not whether they can make accurate bets. This has direct implications for financial applications, autonomous agents that need to reason about uncertain outcomes, and any domain where overconfidence is costly. The results suggest we may need fundamentally different training approaches—or at minimum, different fine-tuning objectives—to produce models that know what they don't know.&lt;/p&gt;
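
&lt;p&gt;The gap between confidence and calibration can be shown with a small worked example. The Brier score below is a standard calibration measure chosen here for illustration; it is not claimed to be the metric PolyBench uses, and the forecasts are made-up numbers.&lt;/p&gt;

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between stated probabilities and 0/1 outcomes (lower is better)."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(outcomes)


# Four hypothetical markets: 1 means the event happened, 0 means it did not.
outcomes = [1, 0, 1, 0]

# Both forecasters lean "yes" on markets 1 and 3, but one is always near-certain.
overconfident = [0.95, 0.95, 0.95, 0.95]
hedged = [0.6, 0.4, 0.6, 0.4]

overconfident_score = brier_score(overconfident, outcomes)
hedged_score = brier_score(hedged, outcomes)

print(f"{overconfident_score:.4f}")  # 0.4525
print(f"{hedged_score:.4f}")         # 0.1600
```

&lt;p&gt;The near-certain forecaster is right half the time and still scores far worse, because each confident miss is penalized heavily. That is the behavior a prediction market punishes with real losses.&lt;/p&gt;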




&lt;h2&gt;
  
  
  Meta Reportedly Developing AI Pendant Wearable
&lt;/h2&gt;

&lt;p&gt;Meta is &lt;a href="https://techcrunch.com/category/artificial-intelligence" rel="noopener noreferrer"&gt;exploring an AI-powered pendant device&lt;/a&gt;, according to reports this week, joining an intensifying wearable AI race that Google escalated with &lt;a href="https://www.wired.com/story/everything-google-announced-at-google-io-2026" rel="noopener noreferrer"&gt;Android smart glasses demos at I/O 2026&lt;/a&gt;. The pendant form factor, a microphone-equipped device worn around the neck or clipped to clothing, represents a different bet than glasses: less obtrusive, no camera concerns, but also less capable for visual AI features.&lt;/p&gt;

&lt;p&gt;The strategic logic for Meta is unclear given that Meta AI is already deeply integrated into WhatsApp, Messenger, and Instagram. A pendant would need to offer something those apps can't: always-listening ambient awareness, perhaps, or faster access than pulling out a phone. The privacy implications are immediately obvious, and Meta's brand isn't exactly associated with trust in that domain.&lt;/p&gt;

&lt;p&gt;Google's approach at I/O emphasized glasses with real-time translation, visual search, and navigation overlays—capabilities that genuinely require a camera. A pendant can transcribe and respond to voice but can't see. The question is whether voice-only ambient AI is compelling enough to wear a dedicated device, or whether AirPods and existing smartphone assistants already serve that need. The pendant category has seen multiple high-profile failures; Meta will need to explain what's different this time.&lt;/p&gt;




&lt;h2&gt;
  
  
  Pope Leo XIV Joins Growing Chorus Warning About AI Dangers
&lt;/h2&gt;

&lt;p&gt;The Vatican this week &lt;a href="https://www.reuters.com/technology/artificial-intelligence" rel="noopener noreferrer"&gt;issued formal warnings about artificial intelligence risks&lt;/a&gt;, with Pope Leo XIV joining university graduates and industry voices in what Reuters characterized as an &lt;a href="https://www.reuters.com/commentary/breakingviews/global-markets-breakingviews-2026-05-31" rel="noopener noreferrer"&gt;"AI backlash arrives"&lt;/a&gt; moment. The Vatican's statement emphasized concerns about human dignity, labor displacement, and autonomous systems making consequential decisions without meaningful human oversight.&lt;/p&gt;

&lt;p&gt;The timing is notable. While OpenAI's Sam Altman has &lt;a href="https://www.reuters.com/world/asia-pacific/openais-altman-says-ai-unlikely-lead-jobs-apocalypse-2026-05-26" rel="noopener noreferrer"&gt;maintained that AI is unlikely to lead to a "jobs apocalypse"&lt;/a&gt;, the accumulation of warnings from religious leaders, academics, and affected workers is creating political pressure that wasn't present even a year ago. Governance frameworks remain fragmented; the EU AI Act is still ramping up enforcement, and the US approach remains sector-specific and reactive.&lt;/p&gt;

&lt;p&gt;For practitioners, the most actionable concern is agent accountability. When an autonomous agent takes an action with real-world consequences—makes a trade, sends an email, files a document—who is responsible when it goes wrong? Current legal frameworks have no good answer. The Vatican's intervention won't change that directly, but it signals that the window for self-regulation by the industry is narrowing. Those building agent systems should be thinking about audit trails, human-in-the-loop checkpoints, and interpretable decision logs before regulators mandate them.&lt;/p&gt;




&lt;h2&gt;
  
  
  What to Watch
&lt;/h2&gt;

&lt;p&gt;Next week brings the expected public preview of Microsoft Agent Framework as enterprises begin piloting multi-agent systems in production. The PolyBench results may accelerate research into calibration-focused training—watch for papers on that front at ICML. And the infrastructure story isn't slowing: SoftBank's €75 billion is just one of several massive deals in negotiation, with Japan and Saudi Arabia both reportedly in advanced talks for similar-scale investments.&lt;/p&gt;




&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://techcrunch.com/category/artificial-intelligence" rel="noopener noreferrer"&gt;AI News &amp;amp; Artificial Intelligence | TechCrunch&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.wired.com/story/everything-google-announced-at-google-io-2026" rel="noopener noreferrer"&gt;Everything Announced at Google I/O 2026: Gemini, Search, Smart Glasses | WIRED&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.reuters.com/technology/artificial-intelligence" rel="noopener noreferrer"&gt;Artificial Intelligence - AI News - Reuters&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.reuters.com/commentary/breakingviews/global-markets-breakingviews-2026-05-31" rel="noopener noreferrer"&gt;The Week in Breakingviews: The AI backlash arrives - Reuters&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.reuters.com/world/asia-pacific/openais-altman-says-ai-unlikely-lead-jobs-apocalypse-2026-05-26" rel="noopener noreferrer"&gt;OpenAI's Altman says AI unlikely to lead to 'jobs apocalypse' | Reuters&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://futureagi.com/blog/multi-agent-systems-2025" rel="noopener noreferrer"&gt;Multi-Agent AI Systems 2026: Frameworks Compared - Future AGI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/Zijian-Ni/awesome-ai-agents-2026" rel="noopener noreferrer"&gt;GitHub - Zijian-Ni/awesome-ai-agents-2026&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aimultiple.com/agentic-ai-trends" rel="noopener noreferrer"&gt;10+ Agentic AI Trends and Examples for 2026 - AIMultiple&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://huggingface.co/blog/daya-shankar/agentic-ai-trends-2026" rel="noopener noreferrer"&gt;Latest Agentic AI Trends to Watch in 2026 - Hugging Face&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/topics/best-ai-tools-2026" rel="noopener noreferrer"&gt;best-ai-tools-2026 · GitHub Topics&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/html/2604.14199v1" rel="noopener noreferrer"&gt;PolyBench: Benchmarking LLM Forecasting and Trading - arXiv&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.anthropic.com/news/claude-opus-4-8" rel="noopener noreferrer"&gt;Introducing Claude Opus 4.8 - Anthropic&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/orgs/community/discussions/187143" rel="noopener noreferrer"&gt;Best AI Tools for Developers in 2026 - GitHub Community&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Enjoyed this briefing? Follow this series for a fresh AI update every week, written for engineers who want to stay ahead.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Follow this publication on Dev.to to get notified of every new article.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Have a story tip or correction? Drop a comment below.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>programming</category>
      <category>technology</category>
    </item>
    <item>
      <title>Primitive Shifts: Workflow Persistence as a First-Class Primitive</title>
      <dc:creator>Richard Dillon</dc:creator>
      <pubDate>Mon, 01 Jun 2026 12:02:24 +0000</pubDate>
      <link>https://dev.to/richard_dillon_b9c238186e/primitive-shifts-workflow-persistence-as-a-first-class-primitive-15c9</link>
      <guid>https://dev.to/richard_dillon_b9c238186e/primitive-shifts-workflow-persistence-as-a-first-class-primitive-15c9</guid>
      <description>&lt;h1&gt;
  
  
  Primitive Shifts: Workflow Persistence as a First-Class Primitive
&lt;/h1&gt;

&lt;p&gt;Every few months, the baseline of how AI systems work quietly moves. Engineers who noticed early weren't smarter — they were just paying attention to the right signals. Last year it was tool-use standardization. The year before, it was context window management. This month, the shift is less visible but arguably more consequential: the execution trace of an agent is becoming the artifact, not the output it produces.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is It?
&lt;/h2&gt;

&lt;p&gt;Workflow persistence is the capability to capture, store, version, and replay complete agent execution traces — including tool calls, intermediate states, decision branches, and recovery checkpoints — as durable, portable artifacts. If that sounds like "just better logging," you're missing the architectural shift.&lt;/p&gt;

&lt;p&gt;The difference is categorical. Traditional agent systems treat execution as ephemeral: you prompt, the agent runs, you get output, the intermediate state evaporates. Workflow persistence inverts this. The agent doesn't just execute tasks — it produces a &lt;a href="https://arxiv.org/html/2605.10907v1" rel="noopener noreferrer"&gt;reusable workflow definition&lt;/a&gt; that can be audited, forked, versioned, and re-executed against different inputs or different models.&lt;/p&gt;

&lt;p&gt;This mirrors a transition we've seen before: the shift from imperative scripts to declarative infrastructure-as-code. Except now it's agent-behavior-as-code, with the agent generating its own specification through execution. Your agent's decision to call a search tool, filter results, then invoke a code interpreter isn't just logged — it becomes a deployable object.&lt;/p&gt;
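
&lt;p&gt;To make "agent-behavior-as-code" concrete, here is a minimal sketch in plain Python, independent of any framework. The names &lt;code&gt;TOOLS&lt;/code&gt;, &lt;code&gt;TraceRecorder&lt;/code&gt;, and &lt;code&gt;replay&lt;/code&gt; are hypothetical and exist only for illustration. The point is that a run can leave behind an ordered, serializable list of tool steps that is later re-executed against a different input.&lt;/p&gt;

```python
import json
from typing import Callable, Dict, List

# Hypothetical tool registry: plain functions standing in for real integrations.
TOOLS: Dict[str, Callable[[str], str]] = {
    "search": lambda text: f"results({text})",
    "summarize": lambda text: f"summary({text})",
}


class TraceRecorder:
    """Records which tools ran, in order, so the sequence can be replayed."""

    def __init__(self) -> None:
        self.steps: List[dict] = []

    def record(self, tool: str, value: str) -> str:
        output = TOOLS[tool](value)
        self.steps.append({"tool": tool, "input": value, "output": output})
        return output

    def to_json(self) -> str:
        # The serialized trace is the durable artifact.
        return json.dumps(self.steps)


def replay(trace_json: str, new_input: str) -> str:
    """Re-run the recorded tool sequence against a different starting input."""
    value = new_input
    for step in json.loads(trace_json):
        value = TOOLS[step["tool"]](value)
    return value


recorder = TraceRecorder()
first = recorder.record("search", "checkpointing")
recorder.record("summarize", first)
artifact = recorder.to_json()

print(replay(artifact, "replay"))  # summary(results(replay))
```

&lt;p&gt;This sketch replays a fixed, linear sequence. A real workflow store would also need to capture branching decisions and tool versions, which are left out here.&lt;/p&gt;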

&lt;p&gt;The convergence is happening across multiple frameworks simultaneously. LangGraph 2.0's checkpoint-resume architecture treats persistence as the &lt;a href="https://www.langchain.com/state-of-agent-engineering" rel="noopener noreferrer"&gt;default foundation, not an opt-in feature&lt;/a&gt;. Anthropic's Managed Agents Memory (currently in public beta) builds persistent cross-session memory directly into the hosted runtime. Research from multiple institutions explicitly frames this as the &lt;a href="https://arxiv.org/html/2605.10907v1" rel="noopener noreferrer"&gt;"AI Workflow Store" concept&lt;/a&gt; — arguing that on-the-fly agents without workflow persistence are architecturally unsound for production use.&lt;/p&gt;

&lt;p&gt;Key properties being standardized: deterministic replay from any checkpoint, branch-aware versioning for what-if exploration, cost and latency attribution per workflow step, and provenance chains linking outputs to specific tool invocations. These aren't nice-to-haves. They're the primitives that make agent systems auditable, debuggable, and reproducible.&lt;/p&gt;
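
&lt;p&gt;Two of these properties, per-step cost attribution and provenance chaining, can be sketched with the standard library alone. The record fields (&lt;code&gt;tool&lt;/code&gt;, &lt;code&gt;latency_ms&lt;/code&gt;, &lt;code&gt;cost_tokens&lt;/code&gt;) and the helper names below are assumptions chosen for illustration, not part of any framework's API.&lt;/p&gt;

```python
import hashlib
import json
from collections import defaultdict
from typing import Dict, List

# Hypothetical trace: one record per tool invocation in a finished run.
trace: List[dict] = [
    {"tool": "search", "input": "q1", "latency_ms": 120, "cost_tokens": 50},
    {"tool": "code_interpreter", "input": "print(1)", "latency_ms": 340, "cost_tokens": 200},
    {"tool": "search", "input": "q2", "latency_ms": 80, "cost_tokens": 40},
]


def attribute_costs(steps: List[dict]) -> Dict[str, Dict[str, int]]:
    """Sum latency and token cost per tool across the whole trace."""
    totals: Dict[str, Dict[str, int]] = defaultdict(
        lambda: {"latency_ms": 0, "cost_tokens": 0}
    )
    for step in steps:
        totals[step["tool"]]["latency_ms"] += step["latency_ms"]
        totals[step["tool"]]["cost_tokens"] += step["cost_tokens"]
    return dict(totals)


def provenance_chain(steps: List[dict]) -> List[str]:
    """Hash each step together with the previous hash, so any edit to an
    earlier step changes every later link in the chain."""
    chain: List[str] = []
    previous = ""
    for step in steps:
        payload = previous + json.dumps(step, sort_keys=True)
        previous = hashlib.sha256(payload.encode()).hexdigest()
        chain.append(previous)
    return chain


costs = attribute_costs(trace)
print(costs["search"])  # {'latency_ms': 200, 'cost_tokens': 90}
print(len(provenance_chain(trace)))  # 3
```

&lt;p&gt;Chaining the hashes is one possible design choice: it makes tampering with an early step detectable from the final hash alone, at the cost of having to recompute the chain whenever a trace is legitimately forked.&lt;/p&gt;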

&lt;h2&gt;
  
  
  Why It's Flying Under the Radar
&lt;/h2&gt;

&lt;p&gt;Most teams still treat agent runs as ephemeral. You prompt, the agent acts, you get output — the execution trace is debugging information, discarded once the task completes. This mental model was inherited from the era of one-shot LLM calls, and it persists even as agents become multi-step, multi-tool, multi-session systems.&lt;/p&gt;

&lt;p&gt;Tooling fragmentation obscures the pattern. LangGraph calls it a "persistence layer." Anthropic calls it "managed memory." The research literature calls it an &lt;a href="https://arxiv.org/html/2605.10907v1" rel="noopener noreferrer"&gt;"AI Workflow Store"&lt;/a&gt;. Framework comparison guides now list "checkpoint-resume recovery" and &lt;a href="https://gist.github.com/manduks/bb0a93c1e0eb21bc718a78ffdcefdc95" rel="noopener noreferrer"&gt;"state management between runs"&lt;/a&gt; as selection criteria, and these weren't even categories twelve months ago. It is the same primitive under different names, with no unified vocabulary that would let engineers recognize the convergence.&lt;/p&gt;

&lt;p&gt;Meanwhile, the current pain is attributed to the wrong causes. Teams blame model inconsistency for irreproducible agent behavior, then spend weeks on prompt engineering when the actual gap is the lack of workflow versioning and deterministic replay. The &lt;a href="https://github.com/vectara/awesome-agent-failures" rel="noopener noreferrer"&gt;documented failure patterns&lt;/a&gt; repeatedly show incidents (database wipes, cascading outages, unrecoverable state corruption) where workflow checkpointing would have turned catastrophic failures into recoverable interruptions.&lt;/p&gt;

&lt;p&gt;The "on-the-fly agent" paradigm — synthesize and execute per-prompt — is still the dominant mental model. &lt;a href="https://arxiv.org/html/2605.29442v1" rel="noopener noreferrer"&gt;Recent research on coding agent failures&lt;/a&gt; shows that context poisoning and prompt variations cause unpredictable divergence in agent behavior. Engineers optimize prompts when they should be versioning workflows. The orchestration layer is becoming the durable artifact, not the model outputs — but you can't see this if you're focused on model selection and prompt tuning.&lt;/p&gt;

&lt;h2&gt;
  
  
  Hands-On: Try It Today
&lt;/h2&gt;

&lt;p&gt;Let's make this concrete. The following example demonstrates a minimal workflow persistence layer using LangGraph's checkpoint architecture. This isn't production code; it's structured to show you the primitives so you can recognize them in your own stack.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# workflow_persistence_demo.py
# Requires: pip install "langgraph&amp;gt;=2.0.0" "langchain-core&amp;gt;=0.2.0"  (quoted so the shell does not treat &amp;gt; as a redirect)
# Demonstrates: checkpoint-resume, workflow serialization, replay-from-state
&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.graph&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;StateGraph&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;END&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.checkpoint.memory&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MemorySaver&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.messages&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;HumanMessage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;AIMessage&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;TypedDict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Annotated&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;hashlib&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;

&lt;span class="c1"&gt;# Define the state schema — this is what gets persisted at each checkpoint
&lt;/span&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;WorkflowState&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TypedDict&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Annotated&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Conversation history&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;tool_calls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Annotated&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Recorded tool invocations with metadata&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;step_count&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;
    &lt;span class="n"&gt;workflow_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;branch_point&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;  &lt;span class="c1"&gt;# For what-if exploration
&lt;/span&gt;
&lt;span class="c1"&gt;# Simulated tools — in production, these would be your actual integrations
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;search_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Simulates a search API call with cost/latency tracking.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;output&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Results for: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;latency_ms&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;120&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cost_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;timestamp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;utcnow&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;isoformat&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;code_interpreter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Simulates code execution with full provenance.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;code_interpreter&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input_hash&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;hashlib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sha256&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;()).&lt;/span&gt;&lt;span class="nf"&gt;hexdigest&lt;/span&gt;&lt;span class="p"&gt;()[:&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;output&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Execution result: success&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;latency_ms&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;340&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cost_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;timestamp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;utcnow&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;isoformat&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Workflow nodes: each node updates state, and LangGraph saves a checkpoint after each step
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;analyze_request&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;WorkflowState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;WorkflowState&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;First step: analyze the incoming request and decide on tools.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;step_count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
    &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Analyzing request, will need search and code execution.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;step&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;step_count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;execute_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;WorkflowState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;WorkflowState&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Execute search tool and record the invocation.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;search_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;workflow persistence patterns&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_calls&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;step_count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
    &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;output&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;step&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;step_count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;provenance&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;  &lt;span class="c1"&gt;# Full tool record attached for provenance
&lt;/span&gt;    &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;execute_code&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;WorkflowState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;WorkflowState&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Execute code and record with input hash for reproducibility.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;code_interpreter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;print(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;analyzing search results&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_calls&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;step_count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
    &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;branch_point&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;post-code-&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;step_count&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# Mark branch point
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;synthesize_output&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;WorkflowState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;WorkflowState&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Final synthesis step — this is where audit trails matter most.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;step_count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
    &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Final output synthesized from tool results.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;step&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;step_count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_provenance&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;tc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input_hash&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input_hash&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;tc&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="n"&gt;tc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; 
                           &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;tc&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_calls&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;

&lt;span class="c1"&gt;# Build the graph with persistence enabled
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;build_persistent_workflow&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Constructs workflow graph with checkpoint-resume architecture.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;graph&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;StateGraph&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;WorkflowState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Add nodes
&lt;/span&gt;    &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;analyze&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;analyze_request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;execute_search&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;code&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;execute_code&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;synthesize&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;synthesize_output&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Define edges — this is the workflow "spec" that gets versioned
&lt;/span&gt;    &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_entry_point&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;analyze&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;analyze&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;code&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;code&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;synthesize&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;synthesize&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;END&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Enable persistence — this is the key primitive
&lt;/span&gt;    &lt;span class="n"&gt;checkpointer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MemorySaver&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;checkpointer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;checkpointer&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;checkpointer&lt;/span&gt;

&lt;span class="c1"&gt;# Demonstration: run, checkpoint, serialize, replay
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;checkpointer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;build_persistent_workflow&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c1"&gt;# Initial state
&lt;/span&gt;    &lt;span class="n"&gt;initial_state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;WorkflowState&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Analyze workflow patterns&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
        &lt;span class="n"&gt;tool_calls&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt;
        &lt;span class="n"&gt;step_count&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;workflow_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;wf-&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;hashlib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sha256&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;utcnow&lt;/span&gt;&lt;span class="p"&gt;()).&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;()).&lt;/span&gt;&lt;span class="nf"&gt;hexdigest&lt;/span&gt;&lt;span class="p"&gt;()[:&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;branch_point&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Run with thread_id for checkpoint tracking
&lt;/span&gt;    &lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;configurable&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;thread_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;demo-thread-1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}}&lt;/span&gt;

    &lt;span class="c1"&gt;# Execute workflow — each node creates a checkpoint
&lt;/span&gt;    &lt;span class="n"&gt;final_state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;initial_state&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Checkpoint: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;keys&lt;/span&gt;&lt;span class="p"&gt;())[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;final_state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;

    &lt;span class="c1"&gt;# Export workflow trace as portable artifact
&lt;/span&gt;    &lt;span class="n"&gt;workflow_artifact&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;workflow_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;initial_state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;workflow_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_calls&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;final_state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;final_state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;keys&lt;/span&gt;&lt;span class="p"&gt;())[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_calls&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;total_cost_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cost_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;tc&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; 
                                 &lt;span class="n"&gt;final_state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;final_state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;keys&lt;/span&gt;&lt;span class="p"&gt;())[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_calls&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;total_latency_ms&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;latency_ms&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;tc&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; 
                                &lt;span class="n"&gt;final_state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;final_state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;keys&lt;/span&gt;&lt;span class="p"&gt;())[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_calls&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;exportable&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;  &lt;span class="c1"&gt;# This artifact can be stored, versioned, replayed
&lt;/span&gt;    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;--- Workflow Artifact (portable, versionable) ---&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;workflow_artifact&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;indent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key insight isn't the code itself — it's what the code eliminates. Every &lt;code&gt;tool_calls&lt;/code&gt; entry carries provenance. Every step creates a checkpoint. The workflow artifact at the end isn't a log; it's a deployable object that can be stored in a workflow store, versioned like code, and replayed against different models to verify consistency. The &lt;a href="https://arxiv.org/html/2605.10907v1" rel="noopener noreferrer"&gt;&lt;code&gt;branch_point&lt;/code&gt; field&lt;/a&gt; enables what-if exploration: clone this workflow, modify the decision at step 3, replay against identical inputs.&lt;/p&gt;
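
&lt;p&gt;To make the what-if idea concrete, here is a minimal sketch of branching a stored artifact. It assumes the artifact is a plain dictionary whose &lt;code&gt;tool_calls&lt;/code&gt; list holds one entry per step; the &lt;code&gt;branch_workflow&lt;/code&gt; helper, the &lt;code&gt;decision&lt;/code&gt; key, and the sample data are hypothetical and not part of LangGraph. Step indices are zero-based.&lt;/p&gt;

```python
import copy


def branch_workflow(artifact, branch_point, new_decision):
    """Clone an artifact and override the decision at one recorded step.

    Steps before branch_point are kept as recorded, the step at
    branch_point gets the new decision, and later steps are dropped so a
    replay can regenerate them.
    """
    steps = artifact["tool_calls"]
    if branch_point not in range(len(steps)):
        raise IndexError(f"no recorded step at index {branch_point}")
    branched = copy.deepcopy(artifact)  # never mutate the stored original
    kept = branched["tool_calls"][: branch_point + 1]
    kept[branch_point]["decision"] = new_decision  # hypothetical key
    branched["tool_calls"] = kept
    branched["branch_point"] = branch_point
    return branched


# Made-up artifact with four recorded steps
original = {
    "workflow_id": "wf-demo",
    "tool_calls": [
        {"tool": "search", "decision": "use_web"},
        {"tool": "search", "decision": "refine_query"},
        {"tool": "code", "decision": "run_script"},
        {"tool": "synthesize", "decision": "long_report"},
    ],
    "branch_point": None,
}
what_if = branch_workflow(original, branch_point=2, new_decision="skip_script")
```

&lt;p&gt;The deep copy is the important design choice: the original artifact stays untouched, so both branches can be replayed and compared side by side.&lt;/p&gt;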

&lt;p&gt;For teams using Claude Code, examine the &lt;a href="https://resources.anthropic.com/hubfs/2026%20Agentic%20Coding%20Trends%20Report.pdf" rel="noopener noreferrer"&gt;five-stage progressive compaction system&lt;/a&gt; — budget reduction, snip, microcompact, context collapse, auto-compact. This is workflow state management in disguise, determining which historical context survives as the agent continues execution.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Means for Your Stack
&lt;/h2&gt;

&lt;p&gt;The architectural implications are substantial, and they cut across concerns that currently live in different parts of your codebase.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Audit and compliance become tractable.&lt;/strong&gt; Every agent decision has a provenance chain. For teams in regulated industries — finance, healthcare, legal — this is transformational. &lt;a href="https://arxiv.org/html/2605.10907v1" rel="noopener noreferrer"&gt;Demonstrating exactly how an output was produced&lt;/a&gt;, which tools were consulted, what data influenced each step: these go from "reconstructed after the fact from scattered logs" to "queryable from the workflow artifact." The compliance team's question "why did the system recommend X?" becomes a database lookup, not a forensic investigation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agent reliability shifts from model tuning to workflow engineering.&lt;/strong&gt; Instead of hoping the model behaves consistently across prompts, you define and version the workflow, then swap models underneath. The workflow is the contract. &lt;a href="https://arxiv.org/html/2604.10599v1" rel="noopener noreferrer"&gt;Recent analysis of agentic systems&lt;/a&gt; emphasizes that this decoupling — stable workflow interface, replaceable model implementation — is what enables genuine production reliability. You're no longer debugging "why did GPT-4 do something different this time?" You're debugging "which version of the workflow was deployed?"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost attribution becomes granular.&lt;/strong&gt; Each workflow step carries its own token, time, and cost metadata. Teams can &lt;a href="https://www.langchain.com/state-of-agent-engineering" rel="noopener noreferrer"&gt;optimize specific bottlenecks&lt;/a&gt; rather than treating agent runs as opaque cost centers. "The agent costs $0.47 per run" becomes "the search-result-filtering step costs $0.23, the synthesis step costs $0.08, the tool-selection step costs $0.16." That granularity enables targeted optimization.&lt;/p&gt;
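
&lt;p&gt;As a rough illustration of per-step attribution, the sketch below groups trace entries by step and totals them. The &lt;code&gt;cost_tokens&lt;/code&gt; and &lt;code&gt;latency_ms&lt;/code&gt; keys mirror the &lt;code&gt;tool_calls&lt;/code&gt; entries used in the walkthrough; the &lt;code&gt;step&lt;/code&gt; key, the &lt;code&gt;attribute_costs&lt;/code&gt; helper, and the sample numbers are assumptions made for illustration.&lt;/p&gt;

```python
from collections import defaultdict


def attribute_costs(tool_calls):
    """Total token cost and latency per step from a list of trace entries."""
    totals = defaultdict(lambda: {"cost_tokens": 0, "latency_ms": 0})
    for call in tool_calls:
        step = call["step"]  # hypothetical key naming the workflow step
        totals[step]["cost_tokens"] += call["cost_tokens"]
        totals[step]["latency_ms"] += call["latency_ms"]
    return dict(totals)


# Made-up sample trace
sample_trace = [
    {"step": "search", "cost_tokens": 1200, "latency_ms": 850},
    {"step": "search", "cost_tokens": 300, "latency_ms": 200},
    {"step": "synthesize", "cost_tokens": 500, "latency_ms": 400},
]

per_step = attribute_costs(sample_trace)
# The step with the highest token spend is the first optimization target
most_expensive = max(per_step, key=lambda s: per_step[s]["cost_tokens"])
```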

&lt;p&gt;&lt;strong&gt;The debugging experience transforms.&lt;/strong&gt; "Why did the agent do X?" becomes a query against the workflow trace. Deterministic replay lets you step through agent reasoning like a debugger: rather than only reading logs of what happened, you re-execute the recorded sequence to reproduce the behavior. The &lt;a href="https://github.com/vectara/awesome-agent-failures" rel="noopener noreferrer"&gt;failure pattern documentation&lt;/a&gt; consistently shows that teams with checkpoint-resume can recover from errors that would be catastrophic for teams without it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Infrastructure Signal
&lt;/h2&gt;

&lt;p&gt;Watch what the frameworks are building into their foundations, not what they're marketing. The signal here is unambiguous.&lt;/p&gt;

&lt;p&gt;LangGraph 2.0 codifies &lt;a href="https://www.langchain.com/state-of-agent-engineering" rel="noopener noreferrer"&gt;"unified agent primitives (Router, Supervisor, Subagent)"&lt;/a&gt; with persistence as the default. This isn't an opt-in feature — it's the architectural foundation. The framework assumes you want checkpoints; you have to actively disable them. That default tells you what the LangChain team expects production systems to need.&lt;/p&gt;

&lt;p&gt;Anthropic is building persistent cross-session memory directly into the hosted agent runtime. The &lt;a href="https://resources.anthropic.com/hubfs/2026%20Agentic%20Coding%20Trends%20Report.pdf" rel="noopener noreferrer"&gt;Claude Managed Agents Memory&lt;/a&gt; public beta treats the workflow trace as a platform service. You don't implement persistence; the platform provides it. That's the kind of infrastructure investment companies make when they expect a primitive to become mandatory.&lt;/p&gt;

&lt;p&gt;The research convergence is explicit. &lt;a href="https://arxiv.org/html/2605.10907v1" rel="noopener noreferrer"&gt;"Engineering Robustness into Personal Agents with the AI Workflow Store"&lt;/a&gt; argues directly that on-the-fly agents without workflow persistence are architecturally unsound for production. The paper isn't hedging — it's stating a position based on observed failure patterns.&lt;/p&gt;

&lt;p&gt;The failure evidence supports the claim. &lt;a href="https://github.com/vectara/awesome-agent-failures" rel="noopener noreferrer"&gt;Documentation of agent failures&lt;/a&gt; repeatedly shows incidents where lack of workflow checkpointing turned recoverable errors into catastrophic ones. Database wipes. Cascading outages. State corruption that couldn't be unwound. These aren't theoretical concerns; they're documented production incidents.&lt;/p&gt;

&lt;p&gt;Framework comparison guides now list &lt;a href="https://gist.github.com/manduks/bb0a93c1e0eb21bc718a78ffdcefdc95" rel="noopener noreferrer"&gt;"checkpoint-resume recovery" and "state management between runs"&lt;/a&gt; as selection criteria. Twelve months ago, these categories didn't exist in framework comparisons. The fact that they're now standard evaluation criteria tells you where the industry expects the baseline to move.&lt;/p&gt;

&lt;h2&gt;
  
  
  Shift Rating
&lt;/h2&gt;

&lt;p&gt;🟢 &lt;strong&gt;Adopt Now&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Teams without workflow persistence are accumulating invisible technical debt. Every "it worked yesterday, why doesn't it work today?" debugging session. Every compliance question that requires manual trace reconstruction. Every agent failure that cascades because there's no checkpoint to recover from. Every cost optimization that's impossible because you can't attribute expense to specific steps.&lt;/p&gt;

&lt;p&gt;The primitives exist in production-ready frameworks today. LangGraph 2.0 is stable. The architectural patterns are &lt;a href="https://arxiv.org/html/2605.10907v1" rel="noopener noreferrer"&gt;documented&lt;/a&gt; and &lt;a href="https://github.com/vectara/awesome-agent-failures" rel="noopener noreferrer"&gt;validated against failure cases&lt;/a&gt;. The question isn't whether this becomes the standard — the question is how much technical debt you accumulate before adopting it.&lt;/p&gt;

&lt;p&gt;The floor has already moved. The question is whether your agents are standing on it.&lt;/p&gt;




&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/html/2605.10907v1" rel="noopener noreferrer"&gt;Engineering Robustness into Personal Agents with the AI Workflow Store&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.langchain.com/state-of-agent-engineering" rel="noopener noreferrer"&gt;State of Agent Engineering - LangChain&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://resources.anthropic.com/hubfs/2026%20Agentic%20Coding%20Trends%20Report.pdf" rel="noopener noreferrer"&gt;2026 Agentic Coding Trends Report - Anthropic&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://gist.github.com/manduks/bb0a93c1e0eb21bc718a78ffdcefdc95" rel="noopener noreferrer"&gt;AI Agent Frameworks Comparison 2026: Complete Guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/vectara/awesome-agent-failures" rel="noopener noreferrer"&gt;GitHub - vectara/awesome-agent-failures: A community curated collection of AI agent failure modes and battle-tested solutions&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/html/2605.29442v1" rel="noopener noreferrer"&gt;How Coding Agents Fail Their Users: A Large-Scale Analysis - arXiv&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/html/2604.10599v1" rel="noopener noreferrer"&gt;Rethinking Software Engineering for Agentic AI Systems - arXiv&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;This is part of &lt;strong&gt;Primitive Shifts&lt;/strong&gt;, a monthly series tracking when new AI building blocks move from novel experiments to infrastructure you'll be expected to know.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Follow the Next MCP Watch series on Dev.to to catch every edition.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Spotted a shift happening in your stack? Drop it in the comments.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>agents</category>
      <category>webdev</category>
    </item>
    <item>
      <title>AI Agent Skills: The Emerging Architecture for Composable, Evolvable Agent Capabilities</title>
      <dc:creator>Richard Dillon</dc:creator>
      <pubDate>Mon, 25 May 2026 12:06:27 +0000</pubDate>
      <link>https://dev.to/richard_dillon_b9c238186e/ai-agent-skills-the-emerging-architecture-for-composable-evolvable-agent-capabilities-23ao</link>
      <guid>https://dev.to/richard_dillon_b9c238186e/ai-agent-skills-the-emerging-architecture-for-composable-evolvable-agent-capabilities-23ao</guid>
      <description>&lt;h1&gt;
  
  
  AI Agent Skills: The Emerging Architecture for Composable, Evolvable Agent Capabilities
&lt;/h1&gt;

&lt;p&gt;The tools abstraction that powered the first wave of production agents is hitting its ceiling. When your agent needs to "review code," it doesn't just call a function—it reads previous review comments from memory, applies learned heuristics about the codebase, adapts its critique style to the author, and improves its approach based on whether past suggestions were accepted. This isn't a stateless function call. It's a &lt;em&gt;skill&lt;/em&gt;. And in the past four months, the entire agent framework ecosystem has converged on this distinction with remarkable speed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction: From Tools to Skills — A Paradigm Shift in Agent Design
&lt;/h2&gt;

&lt;p&gt;The research community's pivot to skills has been dramatic. Since February 2026, we've seen over 20 papers explicitly addressing skill architectures, skill learning, and skill evaluation—a signal that the field has identified a fundamental gap in how we build agents. The &lt;a href="https://arxiv.org/html/2601.12560v1" rel="noopener noreferrer"&gt;agentic AI architectures survey&lt;/a&gt; published in January laid the theoretical groundwork, distinguishing between "reactive tool use" and "proactive capability development." By March, the major frameworks had taken notice.&lt;/p&gt;

&lt;p&gt;The distinction between tools and skills is more than semantic. Tools are stateless function calls: &lt;code&gt;search_web(query) → results&lt;/code&gt;. Skills are learned, versioned, composable capabilities with memory and context: a "web research" skill knows which sources proved reliable in past investigations, adapts search strategies based on domain, and can delegate to sub-skills for fact verification. &lt;a href="https://www.langchain.com/blog/march-2026-langchain-newsletter" rel="noopener noreferrer"&gt;LangChain's March 2026 newsletter&lt;/a&gt; announced their Deep Agents Skills system, explicitly framing it as "the next layer of abstraction above tools." CrewAI followed with self-healing skills in their &lt;a href="https://venturebeat.com/ai/crewai-launches-its-first-multi-agent-builder-speeding-the-way-to-agentic-ai" rel="noopener noreferrer"&gt;enterprise multi-agent builder&lt;/a&gt;. Microsoft Foundry introduced &lt;a href="https://techcommunity.microsoft.com/blog/educatordeveloperblog/advanced-function-calling-and-multi-agent-systems-with-small-language-models-in-/4481180" rel="noopener noreferrer"&gt;skill primitives for multi-agent coordination&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Why the sudden convergence? The emergence of reasoning models—o1, R1, Gemini 2.5—finally gave agents the cognitive horsepower for genuine skill acquisition and composition. Earlier models could &lt;em&gt;use&lt;/em&gt; tools when instructed; reasoning models can &lt;em&gt;learn&lt;/em&gt; when and how to combine capabilities, recognize when a skill is failing, and propose refinements. &lt;a href="https://github.com/weitianxin/Awesome-Agentic-Reasoning" rel="noopener noreferrer"&gt;Research on agentic reasoning&lt;/a&gt; shows these models achieving 40-60% better performance on multi-step tasks when given skill-level abstractions rather than raw tool access.&lt;/p&gt;

&lt;p&gt;My thesis: Skills represent the "package manager" moment for agentic AI. Just as npm made JavaScript code genuinely reusable and composable, skill architectures make agent capabilities genuinely shareable and evolvable. We're moving from "agents that can do things" to "agents that can learn to do things better."&lt;/p&gt;

&lt;h2&gt;
  
  
  The Skill Architecture Stack: Anatomy of a Modern Agent Skill
&lt;/h2&gt;

&lt;p&gt;Understanding the skill architecture requires thinking in three layers: definition, runtime, and lifecycle. Each layer addresses a distinct concern that tools-based approaches left unresolved.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Skill Definition&lt;/strong&gt; encompasses the schema and metadata that describe what a skill does, what it requires, and what it produces. Unlike tool schemas (which specify only function signatures), skill schemas include capability declarations, memory access patterns, and composition rules. &lt;a href="https://www.datatrainersllc.com/blog/langchain-vs-langgraph-agents" rel="noopener noreferrer"&gt;LangChain's approach&lt;/a&gt; defines skills as first-class graph nodes with typed state, while CrewAI binds skills to agent roles with explicit permission scopes. The emerging SkillNet interchange format (referenced in multiple &lt;a href="https://gist.github.com/manduks/bb0a93c1e0eb21bc718a78ffdcefdc95" rel="noopener noreferrer"&gt;2026 framework comparisons&lt;/a&gt;) aims to make skills portable across frameworks, though adoption remains early.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Skill Runtime&lt;/strong&gt; handles execution context and memory access. This is where the tools/skills distinction matters most. A skill runtime provides: (1) access to episodic memory for retrieving relevant past experiences, (2) working memory for multi-step reasoning within the skill, and (3) tool delegation for invoking lower-level capabilities. &lt;a href="https://github.com/microsoft/autogen/discussions/7144" rel="noopener noreferrer"&gt;AutoGen's shared state discussions&lt;/a&gt; reveal the complexity here—agents need fine-grained control over which memories a skill can read versus modify.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Skill Lifecycle&lt;/strong&gt; manages versioning, evaluation, and deprecation. &lt;a href="https://arxiv.org/html/2601.07136v1" rel="noopener noreferrer"&gt;Research on multi-agent system development&lt;/a&gt; found that 34% of production agent failures traced to skill version mismatches or unevaluated skill changes. Modern skill architectures treat skills like software packages: semantic versioning, dependency declarations, and explicit deprecation policies.&lt;/p&gt;

&lt;p&gt;The capability declaration pattern deserves special attention. Drawing from &lt;a href="https://arxiv.org/html/2603.20953v1" rel="noopener noreferrer"&gt;deterministic pre-action authorization research&lt;/a&gt;, skills now declare required permissions upfront. A "code review" skill might declare: &lt;code&gt;requires: [read:repository, read:pull_request, write:comments]&lt;/code&gt;. The runtime enforces these boundaries, preventing skill drift into unauthorized behaviors. This least-privilege approach—termed SkillScope in the authorization literature—is essential for enterprise deployments where audit requirements are strict.&lt;/p&gt;
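
&lt;p&gt;A minimal sketch of how a runtime might enforce such a declaration follows. The permission strings come from the example above; &lt;code&gt;SkillDeclaration&lt;/code&gt;, &lt;code&gt;authorize&lt;/code&gt;, and &lt;code&gt;PermissionDenied&lt;/code&gt; are hypothetical names, not an API from any of the frameworks mentioned.&lt;/p&gt;

```python
from dataclasses import dataclass


class PermissionDenied(Exception):
    """Raised when a skill requests a permission it may not use."""


@dataclass(frozen=True)
class SkillDeclaration:
    name: str
    requires: frozenset  # permissions the skill declared upfront


def authorize(skill, granted, requested):
    """Allow an action only if it is both declared by the skill and granted."""
    if requested not in skill.requires:
        raise PermissionDenied(f"{skill.name} did not declare {requested}")
    if requested not in granted:
        raise PermissionDenied(f"{requested} is not granted to {skill.name}")
    return True


code_review = SkillDeclaration(
    name="code_review",
    requires=frozenset({"read:repository", "read:pull_request", "write:comments"}),
)
# The deployment grants more than the skill declared
granted = {"read:repository", "read:pull_request", "write:comments", "write:repository"}
```

&lt;p&gt;Checking the declaration before the grant is what makes this least-privilege: even though the deployment grants &lt;code&gt;write:repository&lt;/code&gt;, the code review skill cannot use it because it never declared it.&lt;/p&gt;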

&lt;p&gt;Composition primitives enable skills to work together. Skill chaining sequences capabilities (research → summarize → cite). Skill delegation allows one skill to invoke another (a "write report" skill delegating to "generate chart" skill). Skill fallback hierarchies provide graceful degradation (try "semantic search" skill, fall back to "keyword search" skill). These patterns are now supported natively in &lt;a href="https://www.datatrainersllc.com/blog/langchain-vs-langgraph-agents" rel="noopener noreferrer"&gt;LangGraph's agent framework&lt;/a&gt;.&lt;/p&gt;
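
&lt;p&gt;The chaining and fallback patterns can be sketched in plain Python, independent of any framework. Here a skill is assumed to be any callable that takes one value and returns one; &lt;code&gt;chain&lt;/code&gt;, &lt;code&gt;with_fallback&lt;/code&gt;, and the three stand-in skills are hypothetical.&lt;/p&gt;

```python
def chain(*skills):
    """Compose skills left to right: each output feeds the next skill."""
    def run(value):
        for skill in skills:
            value = skill(value)
        return value
    return run


def with_fallback(*skills):
    """Try each skill in order; return the first result that does not raise."""
    def run(value):
        errors = []
        for skill in skills:
            try:
                return skill(value)
            except Exception as exc:  # broad on purpose for this sketch
                errors.append(exc)
        raise RuntimeError(f"all {len(skills)} skills failed: {errors}")
    return run


# Hypothetical stand-in skills
def semantic_search(query):
    raise TimeoutError("vector index unavailable")


def keyword_search(query):
    return [f"doc matching '{query}'"]


def summarize(docs):
    return f"{len(docs)} result(s): " + "; ".join(docs)


# Fallback hierarchy nested inside a chain: search, then summarize
research = chain(with_fallback(semantic_search, keyword_search), summarize)
result = research("checkpointing")
```

&lt;p&gt;Because both combinators return ordinary callables, a fallback hierarchy can sit inside a chain, which is the same nesting that delegation relies on.&lt;/p&gt;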

&lt;h2&gt;
  
  
  Skill Acquisition: How Agents Learn New Capabilities
&lt;/h2&gt;

&lt;p&gt;The most profound shift isn't just &lt;em&gt;having&lt;/em&gt; skills—it's how agents &lt;em&gt;acquire&lt;/em&gt; them. Three distinct pathways have emerged in 2026 research, each with different tradeoffs for production systems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Human-authored skills&lt;/strong&gt; remain the foundation. A developer writes skill code, defines the schema, and registers it with the agent. This approach offers maximum control and reliability but scales poorly. &lt;a href="https://pub.towardsai.net/a-developers-guide-to-agentic-frameworks-in-2026-3f22a492dc3d" rel="noopener noreferrer"&gt;Framework comparison analyses&lt;/a&gt; note that human-authored skills typically require 2-4 hours of engineering time per skill, including testing and documentation. For core business logic, this investment makes sense. For long-tail capabilities, it's prohibitive.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Demonstration-learned skills&lt;/strong&gt; represent the middle ground. The agent observes a human performing a task—watching tool invocations, reading decisions made, noting outcomes—and extracts a reusable skill representation. Research on &lt;a href="https://github.com/masamasa59/ai-agent-papers/blob/main/capability-papers/tool-use.md" rel="noopener noreferrer"&gt;tool use capabilities&lt;/a&gt; shows demonstration learning achieving 70-80% of human-authored skill quality with 10x less human effort. The key insight: demonstrations should capture not just &lt;em&gt;what&lt;/em&gt; was done, but &lt;em&gt;why&lt;/em&gt;—the decision points, the alternatives considered, the success criteria applied.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Self-evolved skills&lt;/strong&gt; push further into autonomy. The agent generates skill candidates, tests them against task outcomes, and refines through reinforcement learning. &lt;a href="https://arxiv.org/html/2604.27859v3" rel="noopener noreferrer"&gt;Research on agentic reinforcement learning&lt;/a&gt; introduced GRPO (Group Relative Policy Optimization) for skill training, providing step-wise rewards for skill invocation decisions rather than just final task success. This enables agents to learn nuanced skill selection: when to use "precise search" vs. "exploratory search," when to delegate vs. handle directly.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://arxiv.org/html/2601.02749v1" rel="noopener noreferrer"&gt;Research on the challenges&lt;/a&gt; along agentic AI's path forward emphasizes that self-evolved skills require robust evaluation infrastructure. Without it, agents can develop confidently wrong skills: capabilities that appear to work in training but fail catastrophically in production. The paper recommends a "skill quarantine" pattern: newly evolved skills run in shadow mode, their outputs logged but not acted upon, until evaluation metrics clear predetermined thresholds.&lt;/p&gt;
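
&lt;p&gt;One way to picture the quarantine pattern is a wrapper that runs the candidate alongside a trusted skill. This is a sketch under assumptions: the &lt;code&gt;QuarantinedSkill&lt;/code&gt; class is hypothetical, and agreement with the trusted skill's output stands in for whatever evaluation metric a team actually uses.&lt;/p&gt;

```python
_FAILED = object()  # sentinel marking a candidate run that raised


class QuarantinedSkill:
    """Run a candidate skill in shadow mode until it clears a threshold."""

    def __init__(self, trusted, candidate, min_runs=20, min_agreement=0.95):
        self.trusted = trusted
        self.candidate = candidate
        self.min_runs = min_runs
        self.min_agreement = min_agreement
        self.log = []  # (input, trusted_output, candidate_output)

    def __call__(self, value):
        trusted_out = self.trusted(value)
        try:
            candidate_out = self.candidate(value)
        except Exception:
            candidate_out = _FAILED  # failures count as disagreement
        self.log.append((value, trusted_out, candidate_out))
        return trusted_out  # only the trusted output is acted upon

    def agreement(self):
        """Fraction of logged runs where candidate matched trusted."""
        if not self.log:
            return 0.0
        matches = sum(1 for _, t, c in self.log if t == c)
        return matches / len(self.log)

    def ready_for_promotion(self):
        enough_runs = len(self.log) >= self.min_runs
        return enough_runs and self.agreement() >= self.min_agreement
```

&lt;p&gt;Callers never see the candidate's output, so a confidently wrong skill can only affect the log, not production behavior.&lt;/p&gt;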

&lt;p&gt;A practical production pattern is emerging: start with human-authored core skills for critical paths, enable demonstration-learning for domain adaptation (letting power users teach the agent their workflows), and restrict self-evolution to well-bounded capability improvements. &lt;a href="https://arxiv.org/html/2512.17896v2" rel="noopener noreferrer"&gt;XAgen's explainability work&lt;/a&gt; provides tools for understanding &lt;em&gt;why&lt;/em&gt; a skill evolved in a particular direction, essential for maintaining trust in self-improving systems.&lt;/p&gt;

&lt;p&gt;The experience compression spectrum offers a useful mental model. Not every learning should become a skill. Some belong as episodic memories (specific instances to retrieve when relevant). Others crystallize into skills (reusable capabilities worth naming and versioning). A few should codify as rules (invariants that must always hold). The &lt;a href="https://arxiv.org/html/2602.10479v1" rel="noopener noreferrer"&gt;AI agent software architecture evolution&lt;/a&gt; paper provides decision heuristics: if you'd invoke the capability &amp;gt;100 times and it requires multi-step reasoning, it's a skill candidate.&lt;/p&gt;
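
&lt;p&gt;Read as a decision procedure, the spectrum might look like the sketch below. Only the 100-invocation threshold and the multi-step criterion come from the heuristic quoted above; the &lt;code&gt;classify_learning&lt;/code&gt; function, its parameter names, and the order of the checks are assumptions.&lt;/p&gt;

```python
def classify_learning(expected_invocations, multi_step, must_always_hold):
    """Place a learning on the memory / skill / rule spectrum."""
    if must_always_hold:
        return "rule"  # invariants are codified, never left to retrieval
    if expected_invocations > 100 and multi_step:
        return "skill"  # reused often and complex enough to name and version
    return "episodic_memory"  # keep as a specific instance to retrieve


examples = {
    "never email customers without approval": classify_learning(10, False, True),
    "triage an incoming bug report": classify_learning(500, True, False),
    "the Q3 outage was caused by a bad migration": classify_learning(3, False, False),
}
```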

&lt;h2&gt;
  
  
  Hands-On: Code Walkthrough
&lt;/h2&gt;

&lt;p&gt;Let's build a skill-enabled research agent using current APIs. This example demonstrates the complete skill lifecycle: definition, registration, invocation, memory integration, and basic self-improvement.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
Skill-enabled research agent using LangGraph and LangChain patterns.
Demonstrates: skill definition, composition, memory coupling, and evaluation.
Requires: langgraph&amp;gt;=0.5.0, langchain-core&amp;gt;=0.3.0, langchain-anthropic&amp;gt;=0.2.0
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;TypedDict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Literal&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dataclasses&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;dataclass&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;field&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.graph&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;StateGraph&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;END&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.prebuilt&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ToolNode&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.messages&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;HumanMessage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;AIMessage&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_anthropic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatAnthropic&lt;/span&gt;

&lt;span class="c1"&gt;# --- Skill Schema Definition ---
# Skills are more than tools: they declare capabilities, memory access, and composition rules
&lt;/span&gt;
&lt;span class="nd"&gt;@dataclass&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;SkillMetadata&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Metadata for skill versioning and governance.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;version&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;  &lt;span class="c1"&gt;# Semver: breaking changes in skills break agents
&lt;/span&gt;    &lt;span class="n"&gt;author&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;required_permissions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# SkillScope-style declarations
&lt;/span&gt;    &lt;span class="n"&gt;memory_access&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Literal&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;none&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;read&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;read_write&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;delegatable&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;  &lt;span class="c1"&gt;# Can this skill invoke other skills?
&lt;/span&gt;
&lt;span class="nd"&gt;@dataclass&lt;/span&gt;  
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;SkillExecutionContext&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Runtime context provided to skill during execution.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;episodic_memory&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# Retrieved relevant past experiences
&lt;/span&gt;    &lt;span class="n"&gt;working_memory&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;  &lt;span class="c1"&gt;# Scratch space for multi-step reasoning
&lt;/span&gt;    &lt;span class="n"&gt;available_skills&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# Skills this skill can delegate to
&lt;/span&gt;    &lt;span class="n"&gt;trace_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;  &lt;span class="c1"&gt;# For evaluation and audit
&lt;/span&gt;
&lt;span class="nd"&gt;@dataclass&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;SkillResult&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Standardized skill output with provenance.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;
    &lt;span class="n"&gt;confidence&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;
    &lt;span class="n"&gt;sources_used&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;delegated_to&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# Which skills were invoked
&lt;/span&gt;    &lt;span class="n"&gt;memory_writes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# What should be persisted
&lt;/span&gt;    &lt;span class="n"&gt;execution_time_ms&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;
    &lt;span class="n"&gt;tokens_used&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;

&lt;span class="c1"&gt;# --- Concrete Skill Implementation ---
# A WebResearchSkill that demonstrates memory coupling and tool delegation
&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;WebResearchSkill&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    A skill for conducting web research with memory and learning.
    Unlike a simple search tool, this skill:
    - Retrieves past research on similar topics from memory
    - Adapts search strategy based on what worked before
    - Persists successful research patterns for future use
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="n"&gt;metadata&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SkillMetadata&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;web_research&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;version&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1.2.0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;author&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;research_team&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;required_permissions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search:web&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;read:memory&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;write:memory&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;memory_access&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;read_write&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;delegatable&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ChatAnthropic&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;search_tool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;callable&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;search_tool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;search_tool&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
        &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
        &lt;span class="n"&gt;depth&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Literal&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;quick&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;standard&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;comprehensive&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;SkillExecutionContext&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;SkillResult&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Execute research with memory-informed strategy selection.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="n"&gt;start_time&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
        &lt;span class="n"&gt;sources&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

        &lt;span class="c1"&gt;# Step 1: Retrieve relevant past research from episodic memory
&lt;/span&gt;        &lt;span class="c1"&gt;# This is what distinguishes skills from tools
&lt;/span&gt;        &lt;span class="n"&gt;past_research&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="n"&gt;mem&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;mem&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;episodic_memory&lt;/span&gt; 
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;mem&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;skill&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;web_research&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; 
            &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_topic_similarity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mem&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mf"&gt;0.7&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;

        &lt;span class="c1"&gt;# Step 2: Adapt strategy based on past outcomes
&lt;/span&gt;        &lt;span class="c1"&gt;# If previous research on similar topics found certain sources reliable,
&lt;/span&gt;        &lt;span class="c1"&gt;# prioritize those sources
&lt;/span&gt;        &lt;span class="n"&gt;reliable_sources&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_extract_reliable_sources&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;past_research&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;search_strategy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_select_strategy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;depth&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;reliable_sources&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Step 3: Execute search with adapted strategy
&lt;/span&gt;        &lt;span class="n"&gt;search_queries&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_generate_queries&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;search_strategy&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;sq&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;search_queries&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sq&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sources&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;reliable_sources&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;extend&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;sources&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;extend&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

        &lt;span class="c1"&gt;# Step 4: Synthesize results using LLM
&lt;/span&gt;        &lt;span class="n"&gt;synthesis_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_build_synthesis_prompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;past_research&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ainvoke&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="nc"&gt;HumanMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;synthesis_prompt&lt;/span&gt;&lt;span class="p"&gt;)])&lt;/span&gt;
        &lt;span class="c1"&gt;# usage_metadata can be None on some responses, so fall back to an empty dict
&lt;/span&gt;        &lt;span class="n"&gt;tokens&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;usage_metadata&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="p"&gt;{}).&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;total_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Step 5: Prepare memory writes for future skill invocations
&lt;/span&gt;        &lt;span class="c1"&gt;# This is skill learning: recording what worked for future use
&lt;/span&gt;        &lt;span class="n"&gt;memory_writes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;skill&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;web_research&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;strategy_used&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;search_strategy&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sources_found_useful&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_identify_useful_sources&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;timestamp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;isoformat&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;outcome_pending&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;  &lt;span class="c1"&gt;# Will be updated based on user feedback
&lt;/span&gt;        &lt;span class="p"&gt;}]&lt;/span&gt;

        &lt;span class="n"&gt;execution_time&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;start_time&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;total_seconds&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;SkillResult&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;synthesis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sources&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;sources&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;]},&lt;/span&gt;
            &lt;span class="n"&gt;confidence&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_compute_confidence&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;sources_used&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;sources&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;delegated_to&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt;
            &lt;span class="n"&gt;memory_writes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;memory_writes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;execution_time_ms&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;execution_time&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;tokens_used&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tokens&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_topic_similarity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;q1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;q2&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Approximate topic similarity via Jaccard word overlap (simplified for this example).&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="c1"&gt;# In production: use embedding similarity
&lt;/span&gt;        &lt;span class="n"&gt;common_words&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;q1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;q2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
        &lt;span class="n"&gt;all_words&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;q1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;q2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;common_words&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;all_words&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;all_words&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_extract_reliable_sources&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;past_research&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Identify sources that proved useful in past research.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;source_scores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;research&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;past_research&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;source&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;research&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sources_found_useful&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[]):&lt;/span&gt;
                &lt;span class="n"&gt;source_scores&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;source_scores&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;source_scores&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;keys&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;source_scores&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;reverse&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)[:&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_select_strategy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;depth&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;reliable_sources&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Select search strategy based on depth and past learning.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;base_strategies&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;quick&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_queries&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_results_per_query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;standard&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_queries&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_results_per_query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;comprehensive&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_queries&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_results_per_query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="n"&gt;strategy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;base_strategies&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;depth&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;strategy&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prioritized_sources&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;reliable_sources&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;strategy&lt;/span&gt;

    &lt;span class="c1"&gt;# Additional helper methods omitted for brevity...
&lt;/span&gt;
&lt;span class="c1"&gt;# --- Skill Registration and Agent Assembly ---
&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;SkillRegistry&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Registry for managing skill versions and dependencies.
    Implements the Fleet SkillAttachment pattern for version constraints.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_skills&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;object&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;  &lt;span class="c1"&gt;# name -&amp;gt; version -&amp;gt; skill
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_active_versions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;  &lt;span class="c1"&gt;# name -&amp;gt; active version
&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;register&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;skill&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;object&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;config_overrides&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Register a skill with optional configuration.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;meta&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;skill&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;meta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_skills&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_skills&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;meta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_skills&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;meta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="n"&gt;meta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;version&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;skill&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;skill&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;config&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;config_overrides&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="p"&gt;{},&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;registered_at&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;isoformat&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="c1"&gt;# Set as active only if no version is active yet; later registrations do not replace it
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;meta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_active_versions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_active_versions&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;meta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;meta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;version&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_skill&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;version&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;object&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Retrieve skill by name, optionally pinning version.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;target_version&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;version&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_active_versions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;target_version&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;target_version&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_skills&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{}):&lt;/span&gt;
            &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Skill &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; not found&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_skills&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="n"&gt;target_version&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;skill&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# --- Skill-Based Agent State ---
&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ResearchAgentState&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TypedDict&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;State for the research agent with skill-aware fields.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;
    &lt;span class="n"&gt;current_task&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;skill_invocations&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# Track which skills were used
&lt;/span&gt;    &lt;span class="n"&gt;episodic_memory&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# Retrieved memories for context
&lt;/span&gt;    &lt;span class="n"&gt;pending_memory_writes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# Memories to persist after completion
&lt;/span&gt;
&lt;span class="c1"&gt;# --- Agent Construction ---
&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;build_research_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;skill_registry&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;SkillRegistry&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ChatAnthropic&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Build a LangGraph agent that selects and invokes skills.
    The agent decides WHICH skill to use; skills handle HOW to execute.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;skill_selector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ResearchAgentState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;ResearchAgentState&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Agent decides which skill to invoke based on task and context.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="n"&gt;task&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;current_task&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;available_skills&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;web_research&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;code_analysis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;summarization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

        &lt;span class="c1"&gt;# LLM decides which skill(s) to invoke
&lt;/span&gt;        &lt;span class="n"&gt;selection_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Given this task: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

Available skills: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;available_skills&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
Recent skill invocations: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;skill_invocations&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

Which skill should be invoked? Respond with JSON: {{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;skill&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;params&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: {{...}}}}&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ainvoke&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="nc"&gt;HumanMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;selection_prompt&lt;/span&gt;&lt;span class="p"&gt;)])&lt;/span&gt;
        &lt;span class="n"&gt;selection&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Get and execute the selected skill
&lt;/span&gt;        &lt;span class="n"&gt;skill&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;skill_registry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_skill&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;selection&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;skill&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SkillExecutionContext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;episodic_memory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;episodic_memory&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="n"&gt;working_memory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt;
            &lt;span class="n"&gt;available_skills&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;available_skills&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;trace_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;trace_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;skill&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;selection&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;params&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Update state with skill results
&lt;/span&gt;        &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;skill_invocations&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;skill&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;selection&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;skill&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;params&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;selection&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;params&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;result_summary&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tokens_used&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;
        &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pending_memory_writes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;extend&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;memory_writes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;AIMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;

    &lt;span class="c1"&gt;# Build the graph
&lt;/span&gt;    &lt;span class="n"&gt;workflow&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;StateGraph&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ResearchAgentState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;select_and_invoke_skill&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;skill_selector&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_entry_point&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;select_and_invoke_skill&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;select_and_invoke_skill&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;END&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compile&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This code demonstrates several key patterns from the skill architecture: typed skill metadata with version and permission declarations, memory coupling where skills read from and write to episodic memory, and the separation between skill selection (agent's job) and skill execution (skill's job). The &lt;code&gt;SkillResult&lt;/code&gt; type ensures every skill invocation produces traceable, auditable output.&lt;/p&gt;

&lt;h2&gt;
  
  
  Evaluation and Governance: Making Skills Production-Ready
&lt;/h2&gt;

&lt;p&gt;Skills without evaluation are liabilities. The 2026 research landscape has produced several benchmarking frameworks that address different aspects of skill quality.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://alicelabs.ai/en/insights/best-ai-agent-frameworks-2026" rel="noopener noreferrer"&gt;Framework evaluations&lt;/a&gt; have converged on five evaluation axes for production skills. &lt;strong&gt;Correctness&lt;/strong&gt; measures whether skill outputs meet acceptance criteria. &lt;strong&gt;Efficiency&lt;/strong&gt; tracks token and compute costs relative to output quality. &lt;strong&gt;Generalization&lt;/strong&gt; tests whether skills transfer to novel inputs within their intended domain. &lt;strong&gt;Composability&lt;/strong&gt; verifies that skills work correctly when chained with others. &lt;strong&gt;Safety&lt;/strong&gt; ensures skills operate within declared permission boundaries.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://arxiv.org/html/2604.16646v1" rel="noopener noreferrer"&gt;Research on agentic frameworks&lt;/a&gt; introduced SkillGenBench, specifically measuring whether agents can create useful new skills. This matters for systems with self-evolution enabled: if your agent proposes skill refinements, you need automated evaluation of those proposals before promotion to production. SkillGenBench tests include held-out task sets, adversarial inputs designed to expose skill boundaries, and composition stress tests.&lt;/p&gt;

&lt;p&gt;For agents with continual learning, skill regression becomes a concern. &lt;a href="https://arxiv.org/html/2601.07136v1" rel="noopener noreferrer"&gt;Multi-agent system studies&lt;/a&gt; found that 23% of skill updates introduced regressions in other skills—a new research skill that searches more thoroughly might break a summarization skill's token budget assumptions. SkillLearnBench provides regression testing protocols: after any skill change, re-evaluate not just that skill but all skills that compose with it.&lt;/p&gt;
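&lt;p&gt;One way to decide what to re-evaluate after a skill change is to walk a composition graph. The sketch below is a minimal illustration, not a SkillLearnBench protocol: the &lt;code&gt;DOWNSTREAM&lt;/code&gt; mapping, the &lt;code&gt;report_writer&lt;/code&gt; skill, and the &lt;code&gt;skills_to_reevaluate&lt;/code&gt; helper are hypothetical names, and it assumes you already know which skills consume which outputs.&lt;/p&gt;

```python
from collections import deque

# Hypothetical composition graph: each skill maps to the skills that consume its output.
DOWNSTREAM = {
    "web_research": ["summarization", "code_analysis"],
    "summarization": ["report_writer"],
    "code_analysis": [],
    "report_writer": [],
}


def skills_to_reevaluate(changed: str, downstream: dict[str, list[str]]) -> list[str]:
    """Return the changed skill plus every skill reachable from it, in breadth-first order."""
    seen = {changed}
    order = [changed]
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for consumer in downstream.get(current, []):
            if consumer not in seen:
                seen.add(consumer)
                order.append(consumer)
                queue.append(consumer)
    return order
```

&lt;p&gt;Calling &lt;code&gt;skills_to_reevaluate("web_research", DOWNSTREAM)&lt;/code&gt; returns the changed skill followed by its direct and transitive consumers, which is the set a regression run would need to cover.&lt;/p&gt;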

&lt;p&gt;Governance primitives are equally important for enterprise deployments. &lt;a href="https://arxiv.org/html/2512.17896v2" rel="noopener noreferrer"&gt;Research on explainability&lt;/a&gt; introduced Counterfactual Trace Auditing: given a skill execution trace, determine what would have happened with different inputs or different skill versions. This supports both debugging ("why did the research skill produce wrong results?") and compliance ("can we prove the skill never accessed unauthorized data?").&lt;/p&gt;

&lt;p&gt;The least-privilege enforcement pattern from &lt;a href="https://arxiv.org/html/2603.20953v1" rel="noopener noreferrer"&gt;authorization research&lt;/a&gt; deserves implementation from day one. Skills declare permissions; runtime enforces them. A skill claiming &lt;code&gt;read:memory&lt;/code&gt; cannot write to memory, even if it contains code attempting to do so. The enforcement layer intercepts all memory and tool access, checking against declared permissions. This prevents both accidental scope creep and adversarial prompt injection attacks that try to escalate skill privileges.&lt;/p&gt;
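&lt;p&gt;A minimal sketch of the declare-then-enforce idea, assuming memory is a plain dictionary and permissions are simple strings. &lt;code&gt;GuardedMemory&lt;/code&gt;, &lt;code&gt;PermissionDenied&lt;/code&gt;, and the &lt;code&gt;write:memory&lt;/code&gt; permission name are illustrative assumptions, not part of any framework API; only &lt;code&gt;read:memory&lt;/code&gt; comes from the text above.&lt;/p&gt;

```python
class PermissionDenied(Exception):
    """Raised when a skill attempts an action it did not declare."""


class GuardedMemory:
    """Wraps a memory store and checks every access against declared permissions."""

    def __init__(self, store: dict, declared: set[str]):
        self._store = store
        # Freeze the declaration so skill code cannot widen it at runtime.
        self._declared = frozenset(declared)

    def _require(self, permission: str) -> None:
        if permission not in self._declared:
            raise PermissionDenied(f"missing permission: {permission}")

    def read(self, key: str):
        self._require("read:memory")
        return self._store.get(key)

    def write(self, key: str, value) -> None:
        # "write:memory" is an assumed permission name for this sketch.
        self._require("write:memory")
        self._store[key] = value
```

&lt;p&gt;A skill handed &lt;code&gt;GuardedMemory(store, declared={"read:memory"})&lt;/code&gt; can call &lt;code&gt;read&lt;/code&gt;, but any call to &lt;code&gt;write&lt;/code&gt; raises &lt;code&gt;PermissionDenied&lt;/code&gt; before the store is touched. A real enforcement layer would wrap tool access the same way.&lt;/p&gt;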

&lt;p&gt;Cost attribution often gets overlooked until bills arrive. Skills should report token usage, and the orchestration layer should aggregate costs per skill per task type. &lt;a href="https://www.langchain.com/blog/nvidia-enterprise" rel="noopener noreferrer"&gt;Enterprise platform discussions&lt;/a&gt; emphasize that skill-level cost visibility enables optimization: if your research skill costs 10x your analysis skill but delivers only 2x the value, that's actionable intelligence.&lt;/p&gt;
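&lt;p&gt;Per-skill, per-task-type aggregation can start as a small in-memory ledger. The sketch below is hypothetical: &lt;code&gt;CostLedger&lt;/code&gt; and its method names are assumptions, and it counts tokens only, leaving pricing and persistence to the orchestration layer.&lt;/p&gt;

```python
from collections import defaultdict


class CostLedger:
    """Aggregates token usage per (skill, task_type) pair."""

    def __init__(self):
        self._tokens: dict[tuple[str, str], int] = defaultdict(int)

    def record(self, skill: str, task_type: str, tokens_used: int) -> None:
        """Add one invocation's token count to the running total."""
        self._tokens[(skill, task_type)] += tokens_used

    def per_skill(self) -> dict[str, int]:
        """Collapse task types to get total tokens per skill."""
        totals: dict[str, int] = defaultdict(int)
        for (skill, _task_type), tokens in self._tokens.items():
            totals[skill] += tokens
        return dict(totals)

    def breakdown(self) -> dict[tuple[str, str], int]:
        """Return a copy of the full (skill, task_type) table."""
        return dict(self._tokens)
```

&lt;p&gt;The orchestration layer would call &lt;code&gt;record&lt;/code&gt; after each skill invocation, for example with the &lt;code&gt;tokens_used&lt;/code&gt; value a skill result reports, and read &lt;code&gt;per_skill()&lt;/code&gt; when comparing skills.&lt;/p&gt;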

&lt;h2&gt;
  
  
  What This Means for Your Stack
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;If you're starting a new agent project&lt;/strong&gt;, choose a framework with first-class skill support. &lt;a href="https://www.langchain.com/blog/march-2026-langchain-newsletter" rel="noopener noreferrer"&gt;LangChain's Deep Agents&lt;/a&gt;, &lt;a href="https://venturebeat.com/ai/crewai-launches-its-first-multi-agent-builder-speeding-the-way-to-agentic-ai" rel="noopener noreferrer"&gt;CrewAI Enterprise&lt;/a&gt;, and &lt;a href="https://techcommunity.microsoft.com/blog/educatordeveloperblog/advanced-function-calling-and-multi-agent-systems-with-small-language-models-in-/4481180" rel="noopener noreferrer"&gt;Microsoft Foundry&lt;/a&gt; all offer skill primitives. Retrofitting skill abstractions onto tool-based agents requires rearchitecting memory access patterns and state management—it's substantially harder than building with skills from the start.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you have existing tool-based agents&lt;/strong&gt;, begin migrating high-value tool chains to skill abstractions incrementally. Start with tools that have implicit memory dependencies: anything that benefits from "remembering" past invocations. A search tool becomes a research skill when it tracks which sources proved reliable. A code generation tool becomes a coding skill when it learns from past review feedback. The &lt;a href="https://arxiv.org/html/2602.10479v1" rel="noopener noreferrer"&gt;evolution of agent architectures&lt;/a&gt; provides migration patterns for this transition.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Skill versioning strategy&lt;/strong&gt; requires treating skills like npm packages. Use semantic versioning: patch versions for bug fixes, minor versions for backward-compatible capability additions, major versions for breaking changes. Maintain lockfiles that pin skill versions per deployment. Establish deprecation policies—how long do you support old skill versions? &lt;a href="https://alicelabs.ai/en/insights/best-ai-agent-frameworks-2026" rel="noopener noreferrer"&gt;Production rankings&lt;/a&gt; show that teams with explicit versioning policies experience 60% fewer production incidents from skill changes.&lt;/p&gt;
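&lt;p&gt;The pinning rule can be sketched in a few lines, assuming plain &lt;code&gt;MAJOR.MINOR.PATCH&lt;/code&gt; version strings with no pre-release tags. The lockfile contents, &lt;code&gt;is_compatible&lt;/code&gt;, and &lt;code&gt;check_deployment&lt;/code&gt; are hypothetical names for illustration.&lt;/p&gt;

```python
def parse_version(version: str) -> tuple[int, int, int]:
    """Split 'MAJOR.MINOR.PATCH' into integers so versions compare numerically."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def is_compatible(pinned: str, candidate: str) -> bool:
    """True when candidate keeps the pinned major version and is not older than the pin."""
    pinned_v, candidate_v = parse_version(pinned), parse_version(candidate)
    return candidate_v[0] == pinned_v[0] and candidate_v >= pinned_v


# Hypothetical lockfile: skill name mapped to the version pinned for this deployment.
LOCKFILE = {"web_research": "1.4.2", "summarization": "2.0.1"}


def check_deployment(lockfile: dict[str, str], available: dict[str, str]) -> list[str]:
    """Return the skills whose available version violates the pin (missing skills included)."""
    return [
        name
        for name, pinned in lockfile.items()
        if not is_compatible(pinned, available.get(name, "0.0.0"))
    ]
```

&lt;p&gt;Running &lt;code&gt;check_deployment&lt;/code&gt; in CI before a rollout flags major-version jumps and downgrades, while minor and patch upgrades pass through.&lt;/p&gt;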

&lt;p&gt;&lt;strong&gt;Evaluation investment&lt;/strong&gt; scales with skill complexity. Skill-based agents require skill-level evaluation, not just end-to-end task success. If your research agent fails, you need to know whether the research skill failed, the synthesis skill failed, or the composition logic failed. Budget for evaluation infrastructure—expect 15-20% of agent development effort to go toward testing and benchmarking.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Security implications&lt;/strong&gt; are substantial. Skills with memory access and tool delegation are powerful attack surfaces. A compromised skill can exfiltrate data through memory writes, escalate privileges through delegation, or persist malicious patterns for future invocations. Implement permission enforcement from day one; adding it later requires auditing every existing skill.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Team structure&lt;/strong&gt; may need adjustment. Skills create natural ownership boundaries. Consider a skill ownership model similar to microservice ownership: designated maintainers, explicit SLOs, documented interfaces. &lt;a href="https://github.com/orgs/community/discussions/187143" rel="noopener noreferrer"&gt;Developer tool discussions&lt;/a&gt; suggest that teams with clear skill ownership see faster iteration and fewer cross-cutting bugs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Timeline expectations&lt;/strong&gt;: Skill architectures are production-ready now, but expect significant API churn through 2026. Abstract your skill interfaces—depend on your own skill protocols, not framework-specific implementations directly. The SkillNet interchange format may stabilize by Q4 2026, at which point portability across frameworks becomes practical.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to Build This Week
&lt;/h2&gt;

&lt;p&gt;Build a &lt;strong&gt;skill-enabled personal research assistant&lt;/strong&gt; that demonstrates the tools-to-skills evolution:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Start with a basic research agent using standard tools (web search, document reading)&lt;/li&gt;
&lt;li&gt;Add a &lt;code&gt;SkillRegistry&lt;/code&gt; and migrate web search to a &lt;code&gt;WebResearchSkill&lt;/code&gt; with memory coupling&lt;/li&gt;
&lt;li&gt;Implement episodic memory that tracks: which sources proved useful, which search strategies worked for different query types, which results the user marked as helpful&lt;/li&gt;
&lt;li&gt;Add skill-level evaluation: track correctness (did the user accept the research?), efficiency (tokens per useful result), and generalization (does the skill work on new topic domains?)&lt;/li&gt;
&lt;li&gt;Implement one round of demonstration learning: record yourself researching a topic, have the agent extract a skill refinement, evaluate whether the refinement improves outcomes&lt;/li&gt;
&lt;/ol&gt;
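
&lt;p&gt;The skill-level evaluation in step 4 might start from something like the sketch below. The &lt;code&gt;Invocation&lt;/code&gt; record and its fields are assumptions made for illustration, not a schema from any framework; adapt them to whatever your agent actually logs.&lt;/p&gt;

```python
# Hypothetical sketch: skill-level metrics from invocation records.
# The record fields are illustrative assumptions, not a framework schema.
from dataclasses import dataclass


@dataclass
class Invocation:
    domain: str          # topic domain of the query
    tokens: int          # tokens spent on this invocation
    useful_results: int  # results the user marked as helpful
    accepted: bool       # whether the user accepted the research


def skill_metrics(records, seen_domains):
    """Compute correctness, efficiency, and generalization for one skill."""
    accepted = sum(1 for r in records if r.accepted)
    useful = sum(r.useful_results for r in records)
    tokens = sum(r.tokens for r in records)
    # Invocations on domains the skill had not handled before.
    new = [r for r in records if r.domain not in seen_domains]
    return {
        "correctness": accepted / len(records) if records else None,
        "tokens_per_useful_result": tokens / useful if useful else None,
        "new_domain_acceptance": (
            sum(1 for r in new if r.accepted) / len(new) if new else None
        ),
    }


records = [
    Invocation("ml", 1000, 4, True),
    Invocation("law", 3000, 1, False),
    Invocation("bio", 2000, 3, True),
]
print(skill_metrics(records, seen_domains={"ml"}))
```

&lt;p&gt;Keeping the three numbers separate makes it easier to tell whether a regression comes from worse answers, wasted tokens, or poor transfer to unfamiliar topics.&lt;/p&gt;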

&lt;p&gt;The complete implementation should take 8-12 hours. By the end, you'll have hands-on experience with skill schemas, memory coupling patterns, and the evaluation infrastructure that makes skills production-ready. More importantly, you'll understand &lt;em&gt;why&lt;/em&gt; the industry is converging on this abstraction—and be ready to apply it to your production systems.&lt;/p&gt;




&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.langchain.com/blog/march-2026-langchain-newsletter" rel="noopener noreferrer"&gt;March 2026: LangChain Newsletter&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.langchain.com/blog/nvidia-enterprise" rel="noopener noreferrer"&gt;LangChain Announces Enterprise Agentic AI Platform Built with NVIDIA&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://pub.towardsai.net/a-developers-guide-to-agentic-frameworks-in-2026-3f22a492dc3d" rel="noopener noreferrer"&gt;A Developer's Guide to Agentic Frameworks in 2026 - Towards AI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.datatrainersllc.com/blog/langchain-vs-langgraph-agents" rel="noopener noreferrer"&gt;LangChain vs LangGraph 2026: Which AI Agent Framework?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://alicelabs.ai/en/insights/best-ai-agent-frameworks-2026" rel="noopener noreferrer"&gt;AI Agent Frameworks 2026: Production-Tested Ranking by Alice Labs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://gist.github.com/manduks/bb0a93c1e0eb21bc718a78ffdcefdc95" rel="noopener noreferrer"&gt;AI Agent Frameworks Comparison 2026: Complete Guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/microsoft/autogen/discussions/7144" rel="noopener noreferrer"&gt;Handling shared state across multi-agent conversations in AutoGen&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://venturebeat.com/ai/crewai-launches-its-first-multi-agent-builder-speeding-the-way-to-agentic-ai" rel="noopener noreferrer"&gt;CrewAI now lets you build fleets of enterprise AI agents | VentureBeat&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/html/2512.17896v2" rel="noopener noreferrer"&gt;XAgen: An Explainability Tool for Identifying and Correcting Failures in Multi-Agent Workflows&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/html/2601.07136v1" rel="noopener noreferrer"&gt;A Large-Scale Study on the Development and Issues of Multi-Agent AI Systems&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/html/2603.20953v1" rel="noopener noreferrer"&gt;Deterministic Pre-Action Authorization for Autonomous AI Agents&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://techcommunity.microsoft.com/blog/educatordeveloperblog/advanced-function-calling-and-multi-agent-systems-with-small-language-models-in-/4481180" rel="noopener noreferrer"&gt;Advanced Function Calling and Multi-Agent Systems with Small Language Models in Foundry Local&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/masamasa59/ai-agent-papers/blob/main/capability-papers/tool-use.md" rel="noopener noreferrer"&gt;ai-agent-papers/capability-papers/tool-use.md at main - GitHub&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/html/2604.16646v1" rel="noopener noreferrer"&gt;Agentic Frameworks for Reasoning Tasks: An Empirical Study&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/weitianxin/Awesome-Agentic-Reasoning" rel="noopener noreferrer"&gt;Awesome Agentic Reasoning Papers - GitHub&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/html/2601.02749v1" rel="noopener noreferrer"&gt;The Path Ahead for Agentic AI: Challenges and Opportunities&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/html/2604.27859v3" rel="noopener noreferrer"&gt;Rethinking Agentic Reinforcement Learning In Large Language Models&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/html/2601.12560v1" rel="noopener noreferrer"&gt;Agentic Artificial Intelligence (AI): Architectures, Taxonomies, and Applications&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/html/2602.10479v1" rel="noopener noreferrer"&gt;The Evolution of Agentic AI Software Architecture - arXiv&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/orgs/community/discussions/187143" rel="noopener noreferrer"&gt;Best AI Tools for Developers in 2026: What Are Your Must-Have...&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;This is part of the &lt;strong&gt;Agentic Engineering Weekly&lt;/strong&gt; series, a deep-dive every Monday into the frameworks, patterns, and techniques shaping the next generation of AI systems.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Follow the Agentic Engineering Weekly series on Dev.to to catch every edition.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Building something agentic? Drop a comment — I'd love to feature reader projects.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>programming</category>
      <category>agents</category>
    </item>
    <item>
      <title>AI Weekly Roundup: Google Reimagines Search, OpenAI Ships Steerable Coding Agents, and Multi-Agent Systems Hit Production</title>
      <dc:creator>Richard Dillon</dc:creator>
      <pubDate>Mon, 25 May 2026 12:05:31 +0000</pubDate>
      <link>https://dev.to/richard_dillon_b9c238186e/ai-weekly-roundup-google-reimagines-search-openai-ships-steerable-coding-agents-and-multi-agent-319f</link>
      <guid>https://dev.to/richard_dillon_b9c238186e/ai-weekly-roundup-google-reimagines-search-openai-ships-steerable-coding-agents-and-multi-agent-319f</guid>
      <description>&lt;h1&gt;
  
  
  AI Weekly Roundup: Google Reimagines Search, OpenAI Ships Steerable Coding Agents, and Multi-Agent Systems Hit Production
&lt;/h1&gt;

&lt;p&gt;The week of May 25, 2026 marks an inflection point in how we interact with AI systems. Google's I/O announcements signal the death of the search box as we've known it for a quarter century, while OpenAI's GPT-5.3-Codex represents the maturation of coding assistants into genuine collaborative agents. Meanwhile, the enterprise world is getting real about what agentic AI means for workforces—and the answers aren't always comfortable.&lt;/p&gt;

&lt;h2&gt;
  
  
  Google Rewrites the Search Playbook with AI Agents at I/O 2026
&lt;/h2&gt;

&lt;p&gt;Google unveiled its most significant Search transformation in over 25 years at I/O 2026, introducing &lt;a href="https://techcrunch.com/2026/05/19/how-to-use-googles-new-ai-agents-to-go-beyond-your-standard-searches" rel="noopener noreferrer"&gt;"information agents"&lt;/a&gt; that fundamentally change the relationship between users and information retrieval. These agents operate continuously in the background, monitoring topics of interest around the clock without requiring repeated manual searches—a shift from reactive querying to proactive intelligence gathering.&lt;/p&gt;

&lt;p&gt;The centerpiece is a redesigned "intelligent search box" that supports &lt;a href="https://techcrunch.com/category/artificial-intelligence" rel="noopener noreferrer"&gt;longer conversational queries&lt;/a&gt; with an AI-powered suggestion system. Rather than optimizing for keywords, users can now express complex information needs in natural language, with the system understanding context and intent across multi-turn interactions.&lt;/p&gt;

&lt;p&gt;This represents Google's clearest articulation yet of the &lt;a href="https://www.firecrawl.dev/blog/agentic-ai-trends" rel="noopener noreferrer"&gt;agentic AI paradigm&lt;/a&gt;: systems that take initiative rather than passively waiting for prompts. The implications extend beyond convenience—information agents could reshape how professionals conduct research, how consumers make purchasing decisions, and how news consumption patterns evolve. Google is betting that users want AI systems working on their behalf even when they're not actively engaged, a significant assumption about user trust and privacy expectations that will face real-world testing in the months ahead.&lt;/p&gt;

&lt;h2&gt;
  
  
  Agentic Programming Updates
&lt;/h2&gt;

&lt;p&gt;Multi-agent architectures have definitively moved from research curiosity to &lt;a href="https://futureagi.com/blog/multi-agent-systems-2025" rel="noopener noreferrer"&gt;production standard&lt;/a&gt;. The dominant pattern emerging involves orchestrator agents coordinating specialized sub-agents working in parallel, each operating within dedicated context windows optimized for their specific tasks. This hierarchical approach addresses the context length limitations and specialization tradeoffs that hampered earlier monolithic agent designs.&lt;/p&gt;

&lt;p&gt;Real-world results are validating the approach. Fountain achieved &lt;a href="https://www.firecrawl.dev/blog/agentic-ai-trends" rel="noopener noreferrer"&gt;50% faster screening&lt;/a&gt; and reduced fulfillment center staffing timelines from weeks to under 72 hours using hierarchical multi-agent orchestration. Perhaps more striking, Zapier deployed &lt;a href="https://www.firecrawl.dev/blog/agentic-ai-trends" rel="noopener noreferrer"&gt;over 800 AI agents internally&lt;/a&gt; with 89% AI adoption across the entire organization—demonstrating that agent proliferation can scale within a single enterprise.&lt;/p&gt;

&lt;p&gt;The framework landscape continues maturing with clearer differentiation: &lt;a href="https://futureagi.com/blog/multi-agent-systems-2025" rel="noopener noreferrer"&gt;LangGraph&lt;/a&gt; dominates graph-based orchestration, &lt;a href="https://futureagi.com/blog/multi-agent-systems-2025" rel="noopener noreferrer"&gt;CrewAI&lt;/a&gt; leads for role-based crew configurations, the &lt;a href="https://github.com/ARUNAGIRINATHAN-K/awesome-ai-agents-2026" rel="noopener noreferrer"&gt;OpenAI Agents SDK&lt;/a&gt; has succeeded Swarm for OpenAI-native development, and &lt;a href="https://futureagi.com/blog/multi-agent-systems-2025" rel="noopener noreferrer"&gt;Microsoft Agent Framework&lt;/a&gt; merges Semantic Kernel and AutoGen capabilities.&lt;/p&gt;

&lt;p&gt;The AAAI 2026 Bridge Program on &lt;a href="https://arxiv.org/html/2511.17332v2" rel="noopener noreferrer"&gt;Advancing LLM-Based Multi-Agent Systems&lt;/a&gt; highlights critical infrastructure gaps: BDI (belief-desire-intention) architectures, standardized communication protocols, and mechanism design principles are essential to make agentic systems transparent and accountable as they move into high-stakes domains.&lt;/p&gt;

&lt;h2&gt;
  
  
  OpenAI Launches GPT-5.3-Codex: From Code Generation to Steerable Coding Agent
&lt;/h2&gt;

&lt;p&gt;OpenAI's &lt;a href="https://help.openai.com/en/articles/9624314-model-release-notes" rel="noopener noreferrer"&gt;GPT-5.3-Codex release&lt;/a&gt; represents the first model to combine the Codex and GPT-5 training stacks, unifying specialized code generation capabilities with advanced reasoning and general-purpose intelligence. The result is approximately 25% faster than predecessors while achieving new benchmark highs across coding evaluations.&lt;/p&gt;

&lt;p&gt;The more significant shift is conceptual. OpenAI is positioning GPT-5.3-Codex not as a code completion tool but as a &lt;a href="https://developers.openai.com/blog/openai-for-developers-2025" rel="noopener noreferrer"&gt;"general-purpose coding agent you can actively steer while it works"&lt;/a&gt;. This framing reflects the broader industry transition from AI as autocomplete to AI as collaborator—systems that maintain context across sessions, understand project-level architecture, and can be directed mid-task without losing thread.&lt;/p&gt;

&lt;p&gt;The practical implications align with patterns documented in the &lt;a href="https://resources.anthropic.com/hubfs/2026%20Agentic%20Coding%20Trends%20Report.pdf" rel="noopener noreferrer"&gt;2026 Agentic Coding Trends Report&lt;/a&gt;: developers increasingly want AI that can handle multi-file refactoring, maintain consistency across codebases, and explain its reasoning when asked. OpenAI is also &lt;a href="https://help.openai.com/en/articles/9624314-model-release-notes" rel="noopener noreferrer"&gt;retiring GPT-4o and legacy models&lt;/a&gt; as of February 2026, forcing migration and signaling confidence in the new architecture. The deprecation timeline gives enterprises six months to adapt their integrations.&lt;/p&gt;

&lt;h2&gt;
  
  
  Jensen Huang Identifies $200 Billion "Brand New" Market for NVIDIA
&lt;/h2&gt;

&lt;p&gt;NVIDIA CEO Jensen Huang said publicly that the company has identified a substantial &lt;a href="https://www.bloomberg.com/news/videos/2026-05-21/the-next-phase-of-artificial-intelligence" rel="noopener noreferrer"&gt;new market opportunity&lt;/a&gt; worth approximately $200 billion. While Huang kept specific details characteristically vague, the announcement follows NVIDIA's established playbook of positioning itself at the center of each emerging phase of AI infrastructure buildout.&lt;/p&gt;

&lt;p&gt;Industry analysts speculate the opportunity relates to &lt;a href="https://www.reuters.com/technology/artificial-intelligence" rel="noopener noreferrer"&gt;agentic AI infrastructure&lt;/a&gt;—the compute, memory, and networking requirements to run persistent agent systems at scale differ substantially from the batch inference workloads that dominated earlier AI deployment. Continuous agent operation demands different latency profiles and memory persistence than traditional model serving.&lt;/p&gt;

&lt;p&gt;The timing coincides with surging demand for AI chips across the industry, with hyperscalers, enterprises, and sovereign AI initiatives all competing for supply. NVIDIA's GPU dominance faces increasing pressure from custom silicon (Google TPUs, Amazon Trainium, Microsoft Maia), but Huang's announcement suggests NVIDIA sees expansion opportunities beyond current competitive battlegrounds. Whether this represents a new hardware architecture, software platform play, or market adjacency remains unclear until the company's next formal disclosure.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sam Altman Extends "Mic Drop" Offer to Every Y Combinator Startup
&lt;/h2&gt;

&lt;p&gt;OpenAI CEO Sam Altman made a significant &lt;a href="https://techcrunch.com/category/artificial-intelligence" rel="noopener noreferrer"&gt;blanket offer&lt;/a&gt; to all Y Combinator portfolio companies, positioning OpenAI as the default AI infrastructure provider for the startup ecosystem's most influential accelerator. The specifics involve substantial API credits and preferential pricing designed to capture developer mindshare at the earliest company stages.&lt;/p&gt;

&lt;p&gt;This represents a strategic play with long-term competitive implications. Startups that build on OpenAI APIs during their formative development create &lt;a href="https://www.firecrawl.dev/blog/agentic-ai-trends" rel="noopener noreferrer"&gt;switching costs&lt;/a&gt; that persist as they scale—prompt engineering, fine-tuning investments, and integration patterns all create lock-in. By subsidizing early adoption, OpenAI trades near-term revenue for future market position.&lt;/p&gt;

&lt;p&gt;The move could reshape competitive dynamics for AI API providers targeting emerging companies. Anthropic, Google, and open-source alternatives must now consider whether to match the offer or differentiate on technical merits alone. For YC companies, the offer removes one barrier to AI-native product development, though founders should consider the concentration risk of deep dependence on any single provider. The timing suggests OpenAI views the enterprise and startup channels as complementary growth vectors requiring distinct go-to-market approaches.&lt;/p&gt;

&lt;h2&gt;
  
  
  Google Launches Antigravity 2.0 with Desktop App and CLI at I/O 2026
&lt;/h2&gt;

&lt;p&gt;Google's &lt;a href="https://github.com/orgs/community/discussions/187143" rel="noopener noreferrer"&gt;Antigravity 2.0 release&lt;/a&gt; at I/O 2026 includes both a desktop application and command-line interface tool, expanding accessibility across different developer workflows. The update addresses feedback that the original web-only interface limited integration with existing development environments and automation pipelines.&lt;/p&gt;

&lt;p&gt;The CLI addition particularly matters for &lt;a href="https://github.com/ShaikhWarsi/free-ai-tools" rel="noopener noreferrer"&gt;developer tooling&lt;/a&gt; integration, enabling Antigravity capabilities within shell scripts, CI/CD pipelines, and editor extensions. This follows the pattern established by GitHub Copilot CLI and similar tools—meeting developers in their existing environments rather than requiring context switches to web interfaces.&lt;/p&gt;

&lt;p&gt;The desktop app provides offline capability and reduced latency for common operations, addressing reliability concerns for developers with inconsistent connectivity or privacy requirements for certain codebases. Combined with Google's agentic AI announcements, Antigravity 2.0 suggests a coherent strategy: intelligent agents for research and planning, practical developer tools for implementation. The framework landscape now includes &lt;a href="https://huggingface.co/blog/huggingface/state-of-os-hf-spring-2026" rel="noopener noreferrer"&gt;comprehensive options&lt;/a&gt; from every major AI provider, with Google's dual-interface approach attempting to minimize adoption friction.&lt;/p&gt;

&lt;h2&gt;
  
  
  SoMe Benchmark: New Standard for Evaluating Social Media AI Agents
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://github.com/LivXue/SoMe" rel="noopener noreferrer"&gt;SoMe benchmark&lt;/a&gt; released at AAAI 2026 provides the first standardized framework for testing LLM-based agents in realistic social media scenarios. As social media automation becomes increasingly prevalent—for content moderation, engagement analysis, and yes, manipulation—the lack of evaluation standards has made comparing systems and identifying risks difficult.&lt;/p&gt;

&lt;p&gt;SoMe evaluates agents across &lt;a href="https://arxiv.org/list/cs.AI/new" rel="noopener noreferrer"&gt;eight key tasks&lt;/a&gt; covering diverse aspects of social media intelligence: content generation, engagement prediction, misinformation detection, sentiment analysis, trend identification, community modeling, influence measurement, and crisis response. The benchmark includes a diverse collection of test scenarios designed to stress-test agents across edge cases and adversarial conditions.&lt;/p&gt;

&lt;p&gt;The timing matters as &lt;a href="https://www.firecrawl.dev/blog/agentic-ai-trends" rel="noopener noreferrer"&gt;enterprises deploy&lt;/a&gt; social media agents for customer service, reputation management, and market intelligence. Without standardized evaluation, organizations have struggled to assess vendor claims or compare in-house solutions against commercial offerings. SoMe also provides researchers with common ground for publishing reproducible results, potentially accelerating progress while also surfacing capability limitations and failure modes that matter for responsible deployment.&lt;/p&gt;

&lt;h2&gt;
  
  
  Banking Sector Confronts AI Workforce Transition
&lt;/h2&gt;

&lt;p&gt;The financial services sector emerged this week as an early battleground for AI-driven organizational restructuring, with two major banks publicly addressing workforce implications. HSBC's CEO told staff &lt;a href="https://www.reuters.com/technology/artificial-intelligence" rel="noopener noreferrer"&gt;"don't fight AI"&lt;/a&gt; as the bank implements job cuts, while StanChart's CEO apologized for "upset caused" amid similar changes.&lt;/p&gt;

&lt;p&gt;These announcements mark a shift from AI experimentation to &lt;a href="https://www.reuters.com/technology/artificial-intelligence" rel="noopener noreferrer"&gt;operational deployment&lt;/a&gt; with real workforce consequences. Banking offers a preview of broader enterprise patterns: highly compensated knowledge work, extensive documentation for training data, clear metrics for measuring productivity gains, and regulated environments that require careful change management.&lt;/p&gt;

&lt;p&gt;The executive messaging reveals corporate strategies for managing the transition: HSBC's directive frames resistance as futile while positioning adaptation as career protection, whereas StanChart's apology acknowledges the human cost while implying inevitability. Neither approach resolves underlying tensions about pace of change, retraining investments, or social contracts with existing employees.&lt;/p&gt;

&lt;p&gt;For the broader tech industry, banking's experience suggests that agentic AI deployment will require sophisticated organizational change management, not just technical implementation. The &lt;a href="https://futureagi.com/blog/multi-agent-systems-2025" rel="noopener noreferrer"&gt;multi-agent systems&lt;/a&gt; replacing human workflows require human oversight structures that most organizations haven't yet designed.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to Watch
&lt;/h2&gt;

&lt;p&gt;Google's agent rollout will face its first real user feedback in coming weeks—watch for adoption metrics and privacy backlash indicators. OpenAI's legacy model deprecation timeline creates a forcing function for enterprise migration decisions, which could surface production dependencies that aren't yet visible. And as banking workforce impacts become quantified, expect regulatory attention to intensify around AI's labor market effects, potentially shaping how quickly other sectors proceed with similar transformations.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.reuters.com/technology/artificial-intelligence" rel="noopener noreferrer"&gt;AI News | Latest Headlines and Developments | Reuters&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.bloomberg.com/news/videos/2026-05-21/the-next-phase-of-artificial-intelligence" rel="noopener noreferrer"&gt;Watch The Next Phase of Artificial Intelligence - Bloomberg.com&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://techcrunch.com/category/artificial-intelligence" rel="noopener noreferrer"&gt;AI News &amp;amp; Artificial Intelligence | TechCrunch&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://techcrunch.com/2026/05/19/how-to-use-googles-new-ai-agents-to-go-beyond-your-standard-searches" rel="noopener noreferrer"&gt;How to use Google's new AI agents to go beyond your standard searches | TechCrunch&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/list/cs.AI/new" rel="noopener noreferrer"&gt;Artificial Intelligence&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://futureagi.com/blog/multi-agent-systems-2025" rel="noopener noreferrer"&gt;Multi-Agent AI Systems 2026: Frameworks Compared - Future AGI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://resources.anthropic.com/hubfs/2026%20Agentic%20Coding%20Trends%20Report.pdf" rel="noopener noreferrer"&gt;[PDF] 2026 Agentic Coding Trends Report - Anthropic&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.firecrawl.dev/blog/agentic-ai-trends" rel="noopener noreferrer"&gt;Top 11 Agentic AI Trends to Watch in 2026 - Firecrawl&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/ARUNAGIRINATHAN-K/awesome-ai-agents-2026" rel="noopener noreferrer"&gt;GitHub - ARUNAGIRINATHAN-K/awesome-ai-agents-2026: Awesome AI Agents for 2026&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/html/2511.17332v2" rel="noopener noreferrer"&gt;AAAI 2026 Bridge Program on Advancing LLM-Based Multi-Agent Systems&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://huggingface.co/blog/huggingface/state-of-os-hf-spring-2026" rel="noopener noreferrer"&gt;State of Open Source on Hugging Face: Spring 2026&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/LivXue/SoMe" rel="noopener noreferrer"&gt;GitHub - LivXue/SoMe: (AAAI 2026) SoMe: A Realistic Benchmark for LLM-based Social Media Agents&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://help.openai.com/en/articles/9624314-model-release-notes" rel="noopener noreferrer"&gt;Model Release Notes | OpenAI Help Center&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/orgs/community/discussions/187143" rel="noopener noreferrer"&gt;Best AI Tools for Developers in 2026&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/ShaikhWarsi/free-ai-tools" rel="noopener noreferrer"&gt;GitHub - ShaikhWarsi/free-ai-tools&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://developers.openai.com/blog/openai-for-developers-2025" rel="noopener noreferrer"&gt;OpenAI for Developers in 2025&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Enjoyed this briefing? Follow this series for a fresh AI update every week, written for engineers who want to stay ahead.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Follow this publication on Dev.to to get notified of every new article.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Have a story tip or correction? Drop a comment below.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>programming</category>
      <category>technology</category>
    </item>
    <item>
      <title>Primitive Shifts: The Harness-as-Primitive Shift</title>
      <dc:creator>Richard Dillon</dc:creator>
      <pubDate>Mon, 04 May 2026 12:04:08 +0000</pubDate>
      <link>https://dev.to/richard_dillon_b9c238186e/primitive-shifts-the-harness-as-primitive-shift-143j</link>
      <guid>https://dev.to/richard_dillon_b9c238186e/primitive-shifts-the-harness-as-primitive-shift-143j</guid>
      <description>&lt;h1&gt;
  
  
  Primitive Shifts: The Harness-as-Primitive Shift
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;External Verification Loops Are Becoming Non-Negotiable Infrastructure&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Every few months, the baseline of how AI systems work quietly moves. Engineers who noticed early weren't smarter — they were just paying attention to the right signals. The shift from "AI generates, humans review" to "AI generates within executable constraints" is one of those moves. If your mental model still treats verification as something that happens &lt;em&gt;after&lt;/em&gt; AI output, you're already behind.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is It?
&lt;/h2&gt;

&lt;p&gt;A &lt;strong&gt;harness&lt;/strong&gt; is an external verification layer that wraps LLM execution — not prompt engineering, not fine-tuning, but deterministic constraints enforced &lt;em&gt;outside&lt;/em&gt; the model's reasoning loop. The pattern is deceptively simple: the LLM generates output, the harness validates against executable specifications (tests, type checks, physics constraints, domain invariants), a feedback signal loops back, and the LLM iterates until convergence or rejection.&lt;/p&gt;
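
&lt;p&gt;The loop described above can be sketched in a few lines. This is a minimal, model-free illustration: &lt;code&gt;run_harness&lt;/code&gt;, &lt;code&gt;fake_generate&lt;/code&gt;, and &lt;code&gt;validate_add&lt;/code&gt; are hypothetical names, and the generator is a stub standing in for a real LLM call.&lt;/p&gt;

```python
# Hypothetical sketch of a generate / validate / feedback loop.
# generate and validate are stand-ins supplied by the caller; no real model is called.

def run_harness(generate, validate, task, max_attempts=3):
    """Return (output, attempts) on convergence, or (None, attempts) on rejection."""
    feedback = None
    for attempt in range(1, max_attempts + 1):
        output = generate(task, feedback)
        errors = validate(output)
        if not errors:
            return output, attempt
        # Feed the deterministic failure signal back into the next attempt.
        feedback = "; ".join(errors)
    return None, max_attempts


def fake_generate(task, feedback):
    """Stand-in for an LLM call: only gets it right once it has seen feedback."""
    if feedback:
        return "def add(a, b): return a + b"
    return "def add(a, b): return a - b"


def validate_add(source):
    """Executable specification: run the candidate and check an invariant."""
    namespace = {}
    exec(source, namespace)
    if namespace["add"](2, 3) != 5:
        return ["add(2, 3) should equal 5"]
    return []


result, attempts = run_harness(fake_generate, validate_add, "write add(a, b)")
print(attempts, result)
```

&lt;p&gt;The validator, not the generator, decides when the loop ends: the stub's first attempt fails the invariant, the error text becomes feedback, and the second attempt converges.&lt;/p&gt;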

&lt;p&gt;This inverts the 2023-2024 paradigm where validation happened &lt;em&gt;after&lt;/em&gt; AI output reached humans. Now verification is a &lt;strong&gt;runtime primitive&lt;/strong&gt; that gates AI execution before it ever surfaces.&lt;/p&gt;

&lt;p&gt;The research driving adoption is unambiguous: LLMs cannot reliably self-correct intrinsic reasoning failures without external grounding. The &lt;a href="https://arxiv.org/html/2604.17025v2" rel="noopener noreferrer"&gt;Convergent AI Agent Framework (CAAF)&lt;/a&gt; makes this explicit — the "verification gap" is structural, not a capability limitation to be trained away. When an LLM hallucinates incorrect code, no amount of "think step by step" prompting fixes it; only external execution feedback does.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://arxiv.org/html/2604.16399v2" rel="noopener noreferrer"&gt;IACDM methodology&lt;/a&gt; formalizes this as "Interactive Adversarial Convergence," treating the harness as an adversarial validator that pressure-tests outputs. Production systems such as Claude Code, whose built-in safety checkpoints are detailed in &lt;a href="https://arxiv.org/pdf/2604.14228" rel="noopener noreferrer"&gt;recent architectural analyses&lt;/a&gt;, implement variants of this pattern with execution-based verification loops.&lt;/p&gt;

&lt;p&gt;Here's the mental model shift: the harness isn't scaffolding you remove later. It's the actual product, with the LLM as a component inside it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why It's Flying Under the Radar
&lt;/h2&gt;

&lt;p&gt;Engineers see "add tests" and think they already do this — but harness-as-primitive means tests run &lt;em&gt;during&lt;/em&gt; generation, not after merge. Your CI pipeline catches bugs post-commit; a harness catches them mid-generation, before the code ever exists in your repository. This distinction sounds subtle but changes everything about how AI integrates into development workflows.&lt;/p&gt;

&lt;p&gt;The pattern looks like "just good engineering" rather than a new AI primitive, so it doesn't get labeled or discussed as such. Framework marketing emphasizes agent autonomy and capability benchmarks; verification infrastructure is unglamorous plumbing that doesn't demo well.&lt;/p&gt;

&lt;p&gt;Early adopters discovered it through failure. A &lt;a href="https://arxiv.org/html/2604.17025v2" rel="noopener noreferrer"&gt;2025 study by METR&lt;/a&gt; found that experienced open-source developers using frontier AI tools took about 19% longer to finish tasks while believing they had been faster. The verification gap made them confident and wrong: they trusted model output, shipped bugs, and spent debugging time that exceeded any generation speedup.&lt;/p&gt;

&lt;p&gt;Multi-agent architectures get attention at conferences; single-agent-with-harness quietly outperforms in production. Both &lt;a href="https://cdn.openai.com/pdf/openai-ending-the-capability-overhang.pdf" rel="noopener noreferrer"&gt;OpenAI Codex&lt;/a&gt; and Claude Code run single ReAct loops with heavy external verification, not the multi-agent swarms that dominate research papers.&lt;/p&gt;

&lt;p&gt;The shift is happening inside build systems and CI pipelines, not in prompts or model configs. If you're not touching infrastructure, you're not seeing it happen.&lt;/p&gt;

&lt;h2&gt;
  
  
  Hands-On: Try It Today
&lt;/h2&gt;

&lt;p&gt;Here's a minimal harness implementation that wraps a code generation task with pytest verification:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# harness.py - Minimal verification harness for LLM code generation
# Requires: anthropic&amp;gt;=0.34.0, pytest&amp;gt;=8.0.0
# pip install anthropic pytest

import subprocess
import sys
import tempfile
from pathlib import Path
from anthropic import Anthropic

# Configuration
MAX_ITERATIONS = 5
MODEL = "claude-sonnet-4-20250514"
TEST_TIMEOUT_SECONDS = 30
# Markdown fence marker, built programmatically so this listing survives markdown rendering
FENCE = "`" * 3

def run_tests(code: str, test_code: str, work_dir: Path) -&amp;gt; tuple[bool, str]:
    """Execute pytest against generated code, return (passed, output)."""
    # Write the implementation file
    impl_path = work_dir / "implementation.py"
    impl_path.write_text(code)

    # Write the test file
    test_path = work_dir / "test_implementation.py"
    test_path.write_text(test_code)

    # Run pytest with the current interpreter and captured output
    try:
        result = subprocess.run(
            [sys.executable, "-m", "pytest", str(test_path), "-v", "--tb=short"],
            capture_output=True,
            text=True,
            cwd=work_dir,
            timeout=TEST_TIMEOUT_SECONDS,  # Hard timeout prevents infinite loops
        )
    except subprocess.TimeoutExpired:
        # A hang counts as a failure that can be reported back to the model
        return False, f"Tests timed out after {TEST_TIMEOUT_SECONDS} seconds"

    passed = result.returncode == 0
    output = result.stdout + result.stderr
    return passed, output

def strip_fences(text: str) -&amp;gt; str:
    """Remove a wrapping markdown code fence if the model included one anyway."""
    lines = text.strip().split("\n")
    if lines[0].startswith(FENCE):
        lines = lines[1:]
        # Only drop the last line when it really is a closing fence
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]
    return "\n".join(lines)

def generate_with_harness(
    client: Anthropic,
    task_description: str,
    test_code: str,
    initial_code: str = "",
) -&amp;gt; tuple[str, int]:
    """
    Generate code that passes tests, iterating until success or budget exhaustion.
    Returns (final_code, iterations_used).
    """

    current_code = initial_code
    last_test_output = ""

    with tempfile.TemporaryDirectory() as temp_dir:
        work_dir = Path(temp_dir)

        # If starting code was supplied, test it first so the fix prompt has real output
        if current_code:
            passed, last_test_output = run_tests(current_code, test_code, work_dir)
            if passed:
                return current_code, 0

        for iteration in range(1, MAX_ITERATIONS + 1):
            # No code yet: generate from scratch
            # Otherwise: fix based on test failures
            if current_code == "":
                prompt = f"""Write Python code to solve this task:

{task_description}

The code will be tested against these tests:
{FENCE}python
{test_code}
{FENCE}

Output ONLY the implementation code, no markdown fencing."""
            else:
                prompt = f"""The following code failed tests:

{FENCE}python
{current_code}
{FENCE}

Test output:
{last_test_output}

Fix the code to pass all tests. Output ONLY the fixed implementation code, no markdown fencing."""

            # Generate candidate solution
            response = client.messages.create(
                model=MODEL,
                max_tokens=2048,
                messages=[{"role": "user", "content": prompt}],
            )

            # Strip markdown code fences if the model included them anyway
            current_code = strip_fences(response.content[0].text)

            # Run harness validation
            passed, last_test_output = run_tests(current_code, test_code, work_dir)

            if passed:
                print(f"✓ Tests passed on iteration {iteration}")
                return current_code, iteration
            print(f"✗ Iteration {iteration} failed, retrying...")

    # Budget exhausted
    raise RuntimeError(f"Failed to generate passing code after {MAX_ITERATIONS} iterations")

# Example usage
if __name__ == "__main__":
    client = Anthropic()

    # The harness spec (your tests) IS the requirement
    test_code = """
from implementation import merge_sorted_lists

def test_basic_merge():
    assert merge_sorted_lists([1, 3, 5], [2, 4, 6]) == [1, 2, 3, 4, 5, 6]

def test_empty_lists():
    assert merge_sorted_lists([], [1, 2, 3]) == [1, 2, 3]
    assert merge_sorted_lists([1, 2, 3], []) == [1, 2, 3]

def test_duplicates():
    assert merge_sorted_lists([1, 2, 2], [2, 3]) == [1, 2, 2, 2, 3]

def test_single_elements():
    assert merge_sorted_lists([1], [2]) == [1, 2]
"""

    task = "Implement merge_sorted_lists(list1, list2) that merges two sorted lists into one sorted list."

    code, iterations = generate_with_harness(client, task, test_code)
    print(f"\nGenerated in {iterations} iteration(s):\n{code}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;For TypeScript projects, apply the same pattern with Zod schema validation as the harness:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// harness.ts - Schema validation harness for structured generation
// Requires: zod@3.23.0, @anthropic-ai/sdk@0.30.0
// npm install zod @anthropic-ai/sdk

import Anthropic from "@anthropic-ai/sdk";
import { z } from "zod";

// Define your domain schema - this IS your harness
const OrderSchema = z.object({
  orderId: z.string().uuid(),
  customerId: z.string().min(1),
  items: z.array(
    z.object({
      sku: z.string().regex(/^[A-Z]{3}-\d{4}$/),
      quantity: z.number().int().positive(),
      unitPrice: z.number().positive(),
    })
  ).min(1),
  // Domain invariant: total must equal sum of (quantity * unitPrice)
  total: z.number().positive(),
}).refine(
  (order) =&amp;gt; {
    const calculatedTotal = order.items.reduce(
      (sum, item) =&amp;gt; sum + item.quantity * item.unitPrice,
      0
    );
    return Math.abs(order.total - calculatedTotal) &amp;lt; 0.01;
  },
  { message: "Total must equal sum of item prices" }
);

type Order = z.infer&amp;lt;typeof OrderSchema&amp;gt;;

const MAX_ITERATIONS = 3;

async function generateWithSchemaHarness(
  client: Anthropic,
  prompt: string
): Promise&amp;lt;Order&amp;gt; {
  let lastError = "";

  for (let i = 0; i &amp;lt; MAX_ITERATIONS; i++) {
    const fullPrompt = lastError
      ? `${prompt}\n\nPrevious attempt failed validation: ${lastError}\n\nFix the JSON and try again.`
      : prompt;

    const response = await client.messages.create({
      model: "claude-sonnet-4-20250514",
      max_tokens: 1024,
      messages: [{ role: "user", content: fullPrompt }],
    });

    const text = response.content[0].type === "text"
      ? response.content[0].text
      : "";

    // Extract JSON from response (handle markdown fencing).
    // `{3} matches a three-backtick fence without writing one literally in this listing.
    const jsonMatch = text.match(/`{3}(?:json)?\s*([\s\S]*?)`{3}/) ||
                      text.match(/(\{[\s\S]*\})/);

    if (!jsonMatch) {
      lastError = "No valid JSON found in response";
      continue;
    }

    try {
      const parsed = JSON.parse(jsonMatch[1]);
      // Harness validation - schema + domain invariants
      const validated = OrderSchema.parse(parsed);
      console.log(`✓ Validation passed on iteration ${i + 1}`);
      return validated;
    } catch (e) {
      if (e instanceof z.ZodError) {
        lastError = e.errors.map((err) =&amp;gt;
          `${err.path.join(".")}: ${err.message}`
        ).join("; ");
        console.log(`✗ Iteration ${i + 1}: ${lastError}`);
      } else {
        lastError = `JSON parse error: ${e}`;
      }
    }
  }

  throw new Error(`Failed after ${MAX_ITERATIONS} iterations: ${lastError}`);
}

// Usage
const client = new Anthropic();

generateWithSchemaHarness(
  client,
  `Generate a sample e-commerce order as JSON with:
   - A valid UUID for orderId
   - SKUs in format ABC-1234
   - At least 2 items
   - Correctly calculated total`
).then(console.log).catch(console.error);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key insight from both implementations: your test suite or schema &lt;strong&gt;is&lt;/strong&gt; the specification. &lt;a href="https://arxiv.org/html/2604.16399v2" rel="noopener noreferrer"&gt;BDD/TDD-first workflows&lt;/a&gt; write Gherkin specs or failing tests &lt;em&gt;before&lt;/em&gt; prompting, treating them as the harness signal rather than human review.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Means for Your Stack
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Test coverage becomes AI capability.&lt;/strong&gt; Teams with comprehensive test suites get dramatically better AI output; teams without them hit a ceiling no prompt engineering crosses. This isn't a metaphor — the harness literally cannot validate what you haven't specified. &lt;a href="https://arxiv.org/pdf/2601.15195" rel="noopener noreferrer"&gt;Research on AI agent failures&lt;/a&gt; shows specification completeness directly correlates with generation success rates.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CI/CD pipelines become AI infrastructure.&lt;/strong&gt; Your existing verification tooling (linters, type checkers, integration tests) is now part of your AI system's runtime, not just your human workflow. The &lt;a href="https://arxiv.org/html/2604.20436v1" rel="noopener noreferrer"&gt;Shift-Up framework&lt;/a&gt; explicitly positions software engineering guardrails as AI-native infrastructure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Vibe coding" produces technical debt faster.&lt;/strong&gt; A &lt;a href="https://arxiv.org/html/2603.28592v2" rel="noopener noreferrer"&gt;large-scale empirical study&lt;/a&gt; found AI-generated code without harness validation accumulated 484,366 distinct issues across 302.6k commits, with code smells accounting for 89.3% of them. The speed advantage of AI generation turns negative if you're generating bugs faster than you fix them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Architecture decision: harness logic belongs in the orchestration layer.&lt;/strong&gt; Separate "what the AI does" from "what constraints it operates under." &lt;a href="https://arxiv.org/html/2604.10599v1" rel="noopener noreferrer"&gt;Recent analysis of agentic systems&lt;/a&gt; argues this separation is essential for maintainability and auditability.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Human review shifts from gatekeeping to harness design.&lt;/strong&gt; Engineers spend time writing better constraints, not reviewing more AI output. The &lt;a href="https://arxiv.org/html/2603.14805v1" rel="noopener noreferrer"&gt;Agent Skills specification&lt;/a&gt; assumes skills come with verification criteria — skills without validators are incomplete primitives.&lt;/p&gt;
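&lt;p&gt;One way to picture the orchestration-layer separation described above, as a minimal sketch and not a prescribed design: the orchestrator owns the retry loop and a list of constraint checks, while the generator is just a callable that receives feedback. The names &lt;code&gt;Check&lt;/code&gt;, &lt;code&gt;orchestrate&lt;/code&gt;, and &lt;code&gt;fake_generator&lt;/code&gt; are hypothetical, and the stub generator stands in for a real model call.&lt;/p&gt;

```python
import ast
from dataclasses import dataclass
from typing import Callable, List, Optional

# A check returns None on success, or an error message to feed back to the generator.
CheckFn = Callable[[str], Optional[str]]


@dataclass
class Check:
    name: str
    run: CheckFn


def parses_as_python(code: str) -> Optional[str]:
    """Constraint: the candidate must be syntactically valid Python."""
    try:
        ast.parse(code)
        return None
    except SyntaxError as exc:
        return f"syntax error: {exc.msg}"


def defines_function(name: str) -> CheckFn:
    """Constraint factory: the candidate must define a top-level function `name`."""
    def check(code: str) -> Optional[str]:
        try:
            tree = ast.parse(code)
        except SyntaxError:
            return None  # left to the syntax check to report
        found = any(isinstance(n, ast.FunctionDef) and n.name == name for n in tree.body)
        return None if found else f"missing function {name}()"
    return check


def orchestrate(generate, checks: List[Check], max_attempts: int = 3):
    """Own the loop and the constraints; the generator only ever sees feedback text."""
    feedback: List[str] = []
    for attempt in range(1, max_attempts + 1):
        candidate = generate(feedback)
        feedback = []
        for check in checks:
            message = check.run(candidate)
            if message is not None:
                feedback.append(f"{check.name}: {message}")
        if not feedback:
            return candidate, attempt
    raise RuntimeError(f"constraints not satisfied after {max_attempts} attempts: {feedback}")


def fake_generator(feedback: List[str]) -> str:
    # Stand-in for a model call: the first draft misses the spec, the second reacts to feedback.
    if not feedback:
        return "def helper():\n    return 1\n"
    return "def merge_sorted_lists(a, b):\n    return sorted(a + b)\n"


checks = [
    Check("syntax", parses_as_python),
    Check("api", defines_function("merge_sorted_lists")),
]
code, attempts = orchestrate(fake_generator, checks)
print(f"accepted after {attempts} attempt(s)")
```

&lt;p&gt;Swapping the model, or adding a linter or type checker as another &lt;code&gt;Check&lt;/code&gt;, would not touch the loop itself, which is the point of keeping constraints outside the generator.&lt;/p&gt;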

&lt;h2&gt;
  
  
  The Infrastructure Signal
&lt;/h2&gt;

&lt;p&gt;The convergent evolution tells the story. &lt;a href="https://arxiv.org/html/2604.17025v2" rel="noopener noreferrer"&gt;CAAF&lt;/a&gt;, &lt;a href="https://arxiv.org/html/2604.16399v2" rel="noopener noreferrer"&gt;IACDM&lt;/a&gt;, the &lt;a href="https://arxiv.org/html/2604.20436v1" rel="noopener noreferrer"&gt;Shift-Up framework&lt;/a&gt;, and Anthropic's internal practices all independently arrived at "external verification as first-class primitive." When multiple research groups solving different problems converge on the same pattern, it's usually load-bearing.&lt;/p&gt;

&lt;p&gt;The tooling investment pattern is revealing. &lt;a href="https://gist.github.com/spikelab/7551c6368e23caa06a4056350f6b2db3" rel="noopener noreferrer"&gt;Letta's 74% LoCoMo score&lt;/a&gt; came from filesystem-based memory with validation rather than sophisticated retrieval, a case of a simple harness beating a complex memory architecture. Platform engineering integration follows: internal developer platforms (IDPs), &lt;a href="https://arxiv.org/html/2602.23397v1" rel="noopener noreferrer"&gt;projected to reach 80% adoption&lt;/a&gt;, are natural homes for harness infrastructure, with "golden paths" functioning as pre-validated execution corridors.&lt;/p&gt;

&lt;p&gt;Benchmark evolution provides another signal. &lt;a href="https://arxiv.org/html/2604.17025v2" rel="noopener noreferrer"&gt;Terminal-Bench, SWE-bench, and similar evaluations&lt;/a&gt; are &lt;em&gt;harness-native&lt;/em&gt; — they measure agent performance inside verification loops, not raw generation quality. When the benchmarks assume harnesses, the production systems will too.&lt;/p&gt;

&lt;p&gt;The quiet deprecation is already visible in the literature. Prompt-only approaches are being called "&lt;a href="https://arxiv.org/html/2604.17025v2" rel="noopener noreferrer"&gt;anti-patterns&lt;/a&gt;" in 2025-2026 publications; "unstructured vibe coding" is explicitly positioned as the thing harnesses fix.&lt;/p&gt;

&lt;h2&gt;
  
  
  Shift Rating
&lt;/h2&gt;

&lt;p&gt;🟢 &lt;strong&gt;Adopt Now&lt;/strong&gt; — Teams without harness infrastructure are already accumulating technical debt faster than they realize. The primitive is production-ready, framework-agnostic, and builds on existing testing/CI investments. The implementation cost is low (you likely have most of the pieces already), and the payoff compounds: better AI output today, less debt tomorrow, and infrastructure that scales as models improve.&lt;/p&gt;

&lt;p&gt;Engineers who internalize "verification is runtime infrastructure, not post-hoc review" will feel the gap close. Those who don't will wonder why their AI tooling plateaued while others kept accelerating.&lt;/p&gt;




&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://gist.github.com/spikelab/7551c6368e23caa06a4056350f6b2db3" rel="noopener noreferrer"&gt;A memory architecture for agentic system · GitHub&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/html/2604.16399v2" rel="noopener noreferrer"&gt;Technical Foundation Document IACDM: Interactive Adversarial Convergence Development Methodology&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/html/2603.28592v2" rel="noopener noreferrer"&gt;A Large-Scale Empirical Study of AI-Generated Code in the Wild&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/pdf/2604.14228" rel="noopener noreferrer"&gt;Dive into Claude Code: The Design Space of Today's and Future AI Coding Agents&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/html/2604.10599v1" rel="noopener noreferrer"&gt;Rethinking Software Engineering for Agentic AI Systems&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/html/2604.20436v1" rel="noopener noreferrer"&gt;Shift-Up: A Framework for Software Engineering Guardrails in AI-native Software Development&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://cdn.openai.com/pdf/openai-ending-the-capability-overhang.pdf" rel="noopener noreferrer"&gt;Ending the Capability Overhang - OpenAI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/pdf/2601.15195" rel="noopener noreferrer"&gt;Where Do AI Coding Agents Fail? An Empirical Study&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/html/2603.14805v1" rel="noopener noreferrer"&gt;Knowledge Activation: AI Skills as the Institutional Knowledge Primitive&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/html/2602.23397v1" rel="noopener noreferrer"&gt;Lifecycle-Integrated Security for AI-Cloud Convergence&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/html/2604.17025v2" rel="noopener noreferrer"&gt;Harness as an Asset: Enforcing Determinism via the Convergent AI Agent Framework&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;This is part of &lt;strong&gt;Primitive Shifts&lt;/strong&gt;, a monthly series tracking when new AI building blocks move from novel experiments to infrastructure you'll be expected to know.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Follow the Next MCP Watch series on Dev.to to catch every edition.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Spotted a shift happening in your stack? Drop it in the comments.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>agents</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Agentic Memory Systems — From Chaotic Context to Learned Control</title>
      <dc:creator>Richard Dillon</dc:creator>
      <pubDate>Mon, 04 May 2026 12:03:37 +0000</pubDate>
      <link>https://dev.to/richard_dillon_b9c238186e/agentic-memory-systems-from-chaotic-context-to-learned-control-183o</link>
      <guid>https://dev.to/richard_dillon_b9c238186e/agentic-memory-systems-from-chaotic-context-to-learned-control-183o</guid>
      <description>&lt;h1&gt;
  
  
  Agentic Memory Systems — From Chaotic Context to Learned Control
&lt;/h1&gt;

&lt;p&gt;Your agent just failed a customer support escalation because it couldn't remember that this same user had already explained their billing issue twice in previous sessions. The context window filled up with tool calls and intermediate reasoning, and the critical historical context got evicted. This isn't a rare edge case—it's the default failure mode for any agent that runs longer than a single conversation turn. The 2024-era solutions of naive RAG retrieval and sliding window compression treat memory as passive storage, but production agents need something fundamentally different: the ability to &lt;em&gt;decide&lt;/em&gt; what to remember.&lt;/p&gt;

&lt;p&gt;The research wave from early 2026 has crystallized around a compelling answer. Papers on &lt;a href="https://arxiv.org/html/2603.07670v1" rel="noopener noreferrer"&gt;agentic memory architectures&lt;/a&gt; and benchmarks like MemoryArena have demonstrated that treating memory operations as learnable actions—not hardcoded heuristics—recovers 15-25% accuracy on multi-session tasks where even the best models were failing. This shift from "memory as database" to "memory as learned skill" represents the most significant architectural evolution in agent design since tool use became standard.&lt;/p&gt;

&lt;p&gt;This article breaks down the four-memory-type architecture emerging as the production standard and shows you how to implement learned memory policies in LangGraph with the new &lt;a href="https://www.langchain.com/blog/announcing-the-langchain-mongodb-partnership-the-ai-agent-stack-that-runs-on-the-database-you-already-trust" rel="noopener noreferrer"&gt;LangChain + MongoDB integration&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Four-Memory-Type Architecture for Agents
&lt;/h2&gt;

&lt;p&gt;The cognitive science literature has long distinguished between different memory systems in humans, and this taxonomy turns out to be remarkably useful for agent design. The &lt;a href="https://arxiv.org/html/2603.07670v1" rel="noopener noreferrer"&gt;survey on memory mechanisms for autonomous agents&lt;/a&gt; identifies four distinct memory types that map directly to different operational needs in production systems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Working memory&lt;/strong&gt; is what the agent is thinking right now: the live reasoning context held in the current LLM call. It's bounded by your context window (200K tokens by default for Claude models, up to 2M with Google's models), and everything flows through it. The critical insight is that working memory isn't just the user's message; it's the curated subset of all other memory types that's been loaded for this specific reasoning step.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Episodic memory&lt;/strong&gt; stores timestamped records of specific interactions and events. When a user asks "what did we discuss last week about the API migration?" the answer lives in episodic memory. Each episode captures not just what was said, but the outcome—did the user seem satisfied? Did the suggested solution work? This outcome tracking is what enables learning.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Semantic memory&lt;/strong&gt; contains consolidated facts and rules extracted from episodes. If a customer support agent handles fifty return requests, the episodes are individual conversations, but the semantic memory extracts "customers mentioning 'damaged in shipping' are eligible for express replacement without requiring photos." This generalization is what prevents agents from repeatedly discovering the same patterns.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Procedural memory&lt;/strong&gt; stores action sequences and workflows as reusable routines. When an agent learns that processing a refund requires checking order status, then verifying payment method, then initiating the return, this becomes procedural knowledge that can be invoked without re-reasoning from first principles.&lt;/p&gt;

&lt;p&gt;The interaction patterns matter as much as the types themselves. Episodic memory consolidates into semantic memory through generalization—after enough similar episodes, a pattern becomes a fact. Procedural and semantic memory load into working memory during task execution, providing the context needed for reasoning. The &lt;a href="https://arxiv.org/pdf/2601.12560" rel="noopener noreferrer"&gt;architectural taxonomies&lt;/a&gt; emerging in the literature consistently show this hierarchical flow: episodes → facts → working context.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Memory Type&lt;/th&gt;
&lt;th&gt;Persistence&lt;/th&gt;
&lt;th&gt;Update Frequency&lt;/th&gt;
&lt;th&gt;Typical Backend&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Working&lt;/td&gt;
&lt;td&gt;Single call&lt;/td&gt;
&lt;td&gt;Every token&lt;/td&gt;
&lt;td&gt;LLM context&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Episodic&lt;/td&gt;
&lt;td&gt;Long-term&lt;/td&gt;
&lt;td&gt;Per interaction&lt;/td&gt;
&lt;td&gt;Document store, MongoDB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Semantic&lt;/td&gt;
&lt;td&gt;Long-term&lt;/td&gt;
&lt;td&gt;Periodic consolidation&lt;/td&gt;
&lt;td&gt;Vector store, graph DB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Procedural&lt;/td&gt;
&lt;td&gt;Long-term&lt;/td&gt;
&lt;td&gt;Rare refinement&lt;/td&gt;
&lt;td&gt;Code/config, document store&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
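&lt;p&gt;To make the four types and their flow concrete, here is a minimal in-memory sketch. Every class and method name (&lt;code&gt;Episode&lt;/code&gt;, &lt;code&gt;Fact&lt;/code&gt;, &lt;code&gt;Procedure&lt;/code&gt;, &lt;code&gt;MemoryStore&lt;/code&gt;, &lt;code&gt;consolidate&lt;/code&gt;, &lt;code&gt;build_working_memory&lt;/code&gt;) is hypothetical, plain lists stand in for the backends in the table, and keyword counting is a deliberately crude stand-in for real consolidation.&lt;/p&gt;

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class Episode:
    """Episodic memory: one timestamped interaction plus its outcome."""
    timestamp: datetime
    summary: str
    outcome: str  # e.g. "resolved" or "unresolved"


@dataclass
class Fact:
    """Semantic memory: a generalization consolidated from episodes."""
    statement: str
    support: int  # number of episodes behind the fact


@dataclass
class Procedure:
    """Procedural memory: a reusable, ordered routine."""
    name: str
    steps: List[str]


@dataclass
class MemoryStore:
    episodes: List[Episode] = field(default_factory=list)
    facts: List[Fact] = field(default_factory=list)
    procedures: List[Procedure] = field(default_factory=list)

    def consolidate(self, keyword: str, statement: str, min_support: int = 3) -> bool:
        """Promote a recurring pattern (here: a keyword match) into a semantic fact."""
        support = sum(1 for e in self.episodes if keyword in e.summary)
        if support >= min_support:
            self.facts.append(Fact(statement, support))
            return True
        return False

    def build_working_memory(self, user_message: str, task: str, recent: int = 2) -> str:
        """Working memory: the curated subset loaded for one reasoning step."""
        lines = [f"FACT: {f.statement}" for f in self.facts]
        lines += [f"PROCEDURE {p.name}: " + " / ".join(p.steps)
                  for p in self.procedures if p.name == task]
        latest = sorted(self.episodes, key=lambda e: e.timestamp)[-recent:]
        lines += [f"EPISODE {e.timestamp:%Y-%m-%d}: {e.summary} ({e.outcome})" for e in latest]
        lines.append(f"USER: {user_message}")
        return "\n".join(lines)


store = MemoryStore()
for day in (1, 2, 3):
    store.episodes.append(
        Episode(datetime(2026, 3, day), "return request: damaged in shipping", "resolved")
    )
store.procedures.append(
    Procedure("refund", ["check order status", "verify payment method", "initiate return"])
)
store.consolidate("damaged in shipping",
                  "Damaged-in-shipping returns qualify for express replacement")
context = store.build_working_memory("My package arrived broken", task="refund")
print(context)
```

&lt;p&gt;The flow matches the hierarchy above: three episodes become one fact, and the prompt for the next step is assembled from facts, the matching procedure, and only the most recent episodes.&lt;/p&gt;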

&lt;h2&gt;
  
  
  From Passive Storage to Learned Memory Policies
&lt;/h2&gt;

&lt;p&gt;The traditional approach to agent memory is entirely heuristic. Summarize every N turns. Retrieve the top-K similar chunks. Compress anything older than M messages. These rules are easy to implement and easy to reason about, but they fail at the edges where production systems actually live.&lt;/p&gt;

&lt;p&gt;Over-summarization loses critical detail. A summary that says "user discussed billing issues" isn't useful when the specific detail was that the user's card was charged twice on March 3rd for transaction ID 4829. Under-retrieval causes agents to repeat mistakes or ask users to re-explain problems they've already described. The heuristics don't know what matters for the current task.&lt;/p&gt;
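&lt;p&gt;The over-summarization failure is easy to reproduce with a toy version of the "summarize every N turns" rule. This is a sketch under stated assumptions: &lt;code&gt;naive_summary&lt;/code&gt; is a hypothetical lossy summarizer that keeps only the first few words of each turn, standing in for an LLM summary that drops specifics.&lt;/p&gt;

```python
from typing import List

SUMMARIZE_EVERY = 3   # the "N" in "summarize every N turns"
SUMMARY_WORDS = 4     # crude stand-in for a lossy summarizer


def naive_summary(turns: List[str]) -> str:
    """Hypothetical lossy summarizer: keeps only the first few words of each turn."""
    return " | ".join(" ".join(t.split()[:SUMMARY_WORDS]) for t in turns)


def heuristic_memory(turns: List[str]) -> List[str]:
    """Fixed rule: every SUMMARIZE_EVERY turns, collapse the buffer into one summary line."""
    memory: List[str] = []
    buffer: List[str] = []
    for turn in turns:
        buffer.append(turn)
        if len(buffer) == SUMMARIZE_EVERY:
            memory.append("SUMMARY: " + naive_summary(buffer))
            buffer = []
    return memory + buffer  # the unsummarized tail is kept verbatim


turns = [
    "User reports a billing issue on their account",
    "User says the card was charged twice on March 3rd, transaction ID 4829",
    "Agent promises to escalate the duplicate charge",
    "User asks for a status update",
]
memory = heuristic_memory(turns)
detail_survived = any("4829" in line for line in memory)
print(memory)
print("transaction ID kept:", detail_survived)
```

&lt;p&gt;The rule fires on schedule regardless of content, so the transaction ID is gone by the time the user asks for a status update.&lt;/p&gt;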

&lt;p&gt;The breakthrough in the &lt;a href="https://arxiv.org/html/2603.07670v1" rel="noopener noreferrer"&gt;agentic memory research&lt;/a&gt; is treating memory operations—store, retrieve, consolidate, forget—as actions in a reinforcement learning framework. Instead of hardcoding "summarize every 10 turns," you train the agent to decide when summarization helps and when it hurts. The training signal comes from downstream task success: did remembering this detail lead to a correct answer? Did consolidating those episodes produce a useful generalization?&lt;/p&gt;
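
&lt;p&gt;As a rough illustration of the framing, the sketch below treats the four memory operations as an action space and scores them by downstream task reward. It is an epsilon-greedy bandit with running-mean value estimates, which is far simpler than the reinforcement learning setups in the research and is not taken from it. &lt;code&gt;BanditMemoryPolicy&lt;/code&gt;, the context label, and the 0/1 reward are assumptions made for the example.&lt;/p&gt;

```python
import random
from collections import defaultdict
from enum import Enum


class MemoryAction(Enum):
    STORE = "store"
    RETRIEVE = "retrieve"
    CONSOLIDATE = "consolidate"
    FORGET = "forget"


class BanditMemoryPolicy:
    """Epsilon-greedy choice over memory actions, scored by observed task reward."""

    def __init__(self, epsilon: float = 0.1, seed: int = 0) -> None:
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.value = defaultdict(float)  # (context, action) -> running mean reward
        self.count = defaultdict(int)    # (context, action) -> number of updates

    def choose(self, context: str) -> MemoryAction:
        # Explore with probability epsilon, otherwise take the best-known action.
        if self.epsilon > self.rng.random():
            return self.rng.choice(list(MemoryAction))
        return max(MemoryAction, key=lambda a: self.value[(context, a)])

    def update(self, context: str, action: MemoryAction, reward: float) -> None:
        # Incremental mean; here reward is 1.0 if the downstream task succeeded, else 0.0.
        key = (context, action)
        self.count[key] += 1
        self.value[key] += (reward - self.value[key]) / self.count[key]


policy = BanditMemoryPolicy(epsilon=0.0)  # no exploration, to keep the demo deterministic
policy.update("long_session", MemoryAction.CONSOLIDATE, 1.0)
policy.update("long_session", MemoryAction.STORE, 0.0)
best = policy.choose("long_session")
print(best)
```

&lt;p&gt;A bandit like this ignores delayed effects entirely, which is exactly the limitation that step-wise credit assignment is meant to address.&lt;/p&gt;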

&lt;p&gt;The &lt;a href="https://github.com/Shichun-Liu/Agent-Memory-Paper-List" rel="noopener noreferrer"&gt;agent memory paper list&lt;/a&gt; catalogs the rapid evolution of these techniques. Step-wise policy gradient methods like GRPO allow fine-grained credit assignment: they estimate which specific memory decision contributed to the final outcome. This is fundamentally different from end-to-end training because memory decisions have delayed effects; something stored now might only prove useful three sessions later.&lt;/p&gt;
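
&lt;p&gt;The core normalization in GRPO is small enough to sketch: each sampled trajectory's reward is compared against the mean and standard deviation of its group. The version below uses the population standard deviation (implementations differ on that detail) and gives every memory decision in a trajectory the same advantage, which is the outcome-level form. Finer step-wise variants would need per-step rewards. The function names and the string action labels are hypothetical.&lt;/p&gt;

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize each reward against the mean and spread of its group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division finite when every trajectory got the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


def credit_memory_decisions(
    trajectories: list[list[str]], rewards: list[float]
) -> list[list[tuple[str, float]]]:
    """Attach each trajectory's advantage to every memory decision it contains."""
    advantages = group_relative_advantages(rewards)
    return [
        [(action, adv) for action in actions]
        for actions, adv in zip(trajectories, advantages)
    ]


# Hypothetical group: two trajectories sampled for the same task, one succeeded.
trajectories = [["store", "retrieve"], ["forget"]]
rewards = [1.0, 0.0]

for decisions in credit_memory_decisions(trajectories, rewards):
    print(decisions)
```

&lt;p&gt;Decisions from the successful trajectory receive a positive advantage and those from the failed one a negative advantage, so the policy update pushes toward the former without needing a separate value model.&lt;/p&gt;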

&lt;p&gt;Benchmark results from MemoryArena illustrate the gap. Models that achieve near-perfect scores on single-session long-context tasks (LoCoMo-style benchmarks) drop to 40-60% accuracy on multi-session tasks with interdependencies. The context window is long enough, but the agent can't figure out what to load from history. Learned memory policies recover 15-25% of this accuracy gap—not by expanding context, but by making smarter decisions about what goes into it.&lt;/p&gt;

&lt;p&gt;The operational gotcha is that learned policies require task-specific fine-tuning. An off-the-shelf model won't magically know what to remember for your customer support workflow versus your code review assistant. Until you've collected enough trajectories to train on, you need explicit memory scaffolding—which brings us to implementation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Hands-On: Code Walkthrough
&lt;/h2&gt;

&lt;p&gt;We'll build a LangGraph agent that maintains episodic and semantic memory across sessions using MongoDB as the backend. This architecture leverages the &lt;a href="https://www.langchain.com/blog/announcing-the-langchain-mongodb-partnership-the-ai-agent-stack-that-runs-on-the-database-you-already-trust" rel="noopener noreferrer"&gt;LangChain + MongoDB integration&lt;/a&gt; announced for production agent deployments. The goal is a working memory system you can deploy today with heuristic policies, structured for easy upgrade to learned policies later.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timedelta&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Literal&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Field&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.graph&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;StateGraph&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;START&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;END&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.checkpoint.mongodb&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MongoDBSaver&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_anthropic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatAnthropic&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.messages&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;HumanMessage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;AIMessage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SystemMessage&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pymongo&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MongoClient&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt;

&lt;span class="c1"&gt;# Step 1: Define memory schemas with Pydantic
# These schemas determine what we track in each memory type
&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Episode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;A single interaction event with full context and outcome.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;default_factory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uuid4&lt;/span&gt;&lt;span class="p"&gt;()))&lt;/span&gt;
    &lt;span class="n"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;default_factory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;utcnow&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;  &lt;span class="c1"&gt;# Natural language summary of what happened
&lt;/span&gt;    &lt;span class="n"&gt;entities&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# Extracted entities (names, IDs, topics)
&lt;/span&gt;    &lt;span class="n"&gt;user_message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;  &lt;span class="c1"&gt;# The original user input
&lt;/span&gt;    &lt;span class="n"&gt;agent_response&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;  &lt;span class="c1"&gt;# What the agent said
&lt;/span&gt;    &lt;span class="n"&gt;outcome&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;  &lt;span class="c1"&gt;# success, failure, unknown
&lt;/span&gt;    &lt;span class="n"&gt;outcome_signal&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;  &lt;span class="c1"&gt;# Numeric reward for RL training
&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;SemanticFact&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;A consolidated fact extracted from one or more episodes.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;default_factory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uuid4&lt;/span&gt;&lt;span class="p"&gt;()))&lt;/span&gt;
    &lt;span class="n"&gt;statement&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;  &lt;span class="c1"&gt;# The actual fact, e.g., "User prefers email over phone"
&lt;/span&gt;    &lt;span class="n"&gt;confidence&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ge&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;le&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# How certain we are
&lt;/span&gt;    &lt;span class="n"&gt;source_episode_ids&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# Provenance for debugging
&lt;/span&gt;    &lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;default_factory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;utcnow&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;last_used&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;  &lt;span class="c1"&gt;# For LRU-style eviction
&lt;/span&gt;    &lt;span class="n"&gt;use_count&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;  &lt;span class="c1"&gt;# Track utility for learned policies
&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;MemoryState&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;The state object passed through the LangGraph nodes.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;  &lt;span class="c1"&gt;# Conversation history (working memory)
&lt;/span&gt;    &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;current_episode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Episode&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="n"&gt;retrieved_episodes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Episode&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="n"&gt;retrieved_facts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;SemanticFact&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="n"&gt;memory_action&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;  &lt;span class="c1"&gt;# What the controller decided
&lt;/span&gt;    &lt;span class="n"&gt;consolidation_pending&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;

&lt;span class="c1"&gt;# Step 2: Memory storage layer using MongoDB
# Separate collections for episodes and semantic facts
&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;MemoryStore&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mongo_uri&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;db_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent_memory&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MongoClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mongo_uri&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;db_name&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;episodes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;episodes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;semantic_facts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;semantic_facts&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="c1"&gt;# Create indexes for efficient queries
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;episodes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_index&lt;/span&gt;&lt;span class="p"&gt;([(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;timestamp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)])&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;episodes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_index&lt;/span&gt;&lt;span class="p"&gt;([(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;entities&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)])&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;semantic_facts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_index&lt;/span&gt;&lt;span class="p"&gt;([(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;confidence&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)])&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;store_episode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;episode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Episode&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Persist an episode to MongoDB.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;doc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;episode&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;model_dump&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;episodes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;insert_one&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;episode&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;query_episodes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
        &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
        &lt;span class="n"&gt;entities&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;days_back&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Episode&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Retrieve relevant episodes for a user.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;timestamp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;$gte&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;utcnow&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nf"&gt;timedelta&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;days&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;days_back&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;entities&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;entities&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;$in&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;entities&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="n"&gt;cursor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;episodes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;sort&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;timestamp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;Episode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;count_recent_episodes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;entity&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Count episodes mentioning an entity—used for consolidation trigger.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;episodes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;count_documents&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;entities&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;entity&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;timestamp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;$gte&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;utcnow&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nf"&gt;timedelta&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;days&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;store_semantic_fact&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fact&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;SemanticFact&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Persist a semantic fact to MongoDB.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;doc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;fact&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;model_dump&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;semantic_facts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;insert_one&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;fact&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;query_semantic_facts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
        &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
        &lt;span class="n"&gt;min_confidence&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;SemanticFact&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Retrieve high-confidence facts for context loading.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;cursor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;semantic_facts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;confidence&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;$gte&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;min_confidence&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="nf"&gt;sort&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;confidence&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;SemanticFact&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# Step 3: Define the memory controller node
# This is where heuristic policy lives—replace with learned policy later
&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;memory_controller&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;MemoryState&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;MemoryStore&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;MemoryState&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Decide which memory operation to perform based on current state.

    Heuristic policy (v1):
    - Always retrieve relevant episodes and facts before reasoning
    - Store episode after each user turn
    - Trigger consolidation when 5+ episodes share an entity
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="c1"&gt;# Extract entities from the last user message (simplified)
&lt;/span&gt;    &lt;span class="n"&gt;last_message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;
    &lt;span class="c1"&gt;# In production, use NER or LLM extraction here
&lt;/span&gt;    &lt;span class="n"&gt;entities&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;extract_entities_simple&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;last_message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Retrieve relevant context
&lt;/span&gt;    &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;retrieved_episodes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query_episodes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;entities&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;entities&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;retrieved_facts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query_semantic_facts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;min_confidence&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Check if consolidation should trigger
&lt;/span&gt;    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;entity&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;entities&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;count_recent_episodes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;entity&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;consolidation_pending&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;
            &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;memory_action&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;consolidate_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;entity&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="k"&gt;break&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="c1"&gt;# for/else: runs only when the loop ends without hitting break
&lt;/span&gt;        &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;memory_action&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;retrieve_only&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;extract_entities_simple&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Placeholder entity extraction—use spaCy or LLM in production.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="c1"&gt;# Very simplified: extract capitalized words and common patterns
&lt;/span&gt;    &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;
    &lt;span class="n"&gt;words&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;findall&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;\b[A-Z][a-z]+\b&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
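    &lt;span class="c1"&gt;# dict.fromkeys deduplicates while keeping first-seen order (a set would not)
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fromkeys&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;words&lt;/span&gt;&lt;span class="p"&gt;))[:&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;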

&lt;span class="c1"&gt;# Step 4: Build the reasoning node that uses memory context
&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;reasoning_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;MemoryState&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ChatAnthropic&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;MemoryState&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Main reasoning with memory-augmented context.
    Loads retrieved episodes and facts into working memory.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="c1"&gt;# Build system prompt with memory context
&lt;/span&gt;    &lt;span class="n"&gt;memory_context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;build_memory_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;retrieved_episodes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;retrieved_facts&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;system_msg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SystemMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;You are a helpful assistant with access to memory of past interactions.

RELEVANT HISTORY:
&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;memory_context&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

Use this context to provide personalized, consistent responses. 
Reference past interactions when relevant.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Call the LLM with augmented context
&lt;/span&gt;    &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;system_msg&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Create episode record for this interaction
&lt;/span&gt;    &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;current_episode&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Episode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;User asked about: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;entities&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;extract_entities_simple&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;user_message&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;agent_response&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Add response to conversation
&lt;/span&gt;    &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;AIMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;build_memory_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;episodes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Episode&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;facts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;SemanticFact&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Format retrieved memories for inclusion in prompt.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;parts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;facts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;KNOWN FACTS ABOUT THIS USER:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;fact&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;facts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;- &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;fact&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;statement&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; (confidence: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;fact&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;confidence&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;episodes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;RECENT RELEVANT INTERACTIONS:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;ep&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;episodes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;date_str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ep&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strftime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;%Y-%m-%d&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;- [&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;date_str&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;] &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;ep&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;parts&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;No relevant history found.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# Step 5: Consolidation node—extracts semantic facts from episodes
&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;consolidation_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;MemoryState&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;MemoryStore&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ChatAnthropic&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;MemoryState&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Consolidate multiple episodes into semantic facts.
    This is where episodic → semantic generalization happens.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;consolidation_pending&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;

    &lt;span class="c1"&gt;# Get episodes to consolidate
&lt;/span&gt;    &lt;span class="n"&gt;entity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;memory_action&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;consolidate_&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;episodes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query_episodes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;entities&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;entity&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;episodes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;consolidation_pending&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;

    &lt;span class="c1"&gt;# Use LLM to extract generalizable facts
&lt;/span&gt;    &lt;span class="n"&gt;episode_summaries&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;- &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;ep&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strftime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;%Y-%m-%d&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;ep&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;ep&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;episodes&lt;/span&gt;
    &lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="n"&gt;extraction_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Based on these past interactions, extract 1-3 general facts about the user&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s preferences, patterns, or needs:

INTERACTIONS:
&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;episode_summaries&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

Output each fact on its own line, starting with &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;FACT: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;
Only include facts that appear consistently across multiple interactions.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="nc"&gt;HumanMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;extraction_prompt&lt;/span&gt;&lt;span class="p"&gt;)])&lt;/span&gt;

    &lt;span class="c1"&gt;# Parse and store extracted facts
&lt;/span&gt;    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;startswith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;FACT:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="c1"&gt;# Strip only the leading marker; replace() would also remove "FACT:" inside the fact text
&lt;/span&gt;            &lt;span class="n"&gt;fact_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;removeprefix&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;FACT:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="n"&gt;fact&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SemanticFact&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;statement&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;fact_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;confidence&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Adjust based on episode count
&lt;/span&gt;                &lt;span class="n"&gt;source_episode_ids&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ep&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;ep&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;episodes&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;store_semantic_fact&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fact&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;consolidation_pending&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;

&lt;span class="c1"&gt;# Step 6: Wire everything into a LangGraph StateGraph
&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;build_memory_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mongo_uri&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;anthropic_api_key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Construct the full memory-enabled agent graph.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="c1"&gt;# Initialize components
&lt;/span&gt;    &lt;span class="n"&gt;store&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MemoryStore&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mongo_uri&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ChatAnthropic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-sonnet-4-20250514&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;anthropic_api_key&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
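    &lt;span class="c1"&gt;# In current langgraph-checkpoint-mongodb releases, from_conn_string is a context
    # manager (use it in a `with` block), so build the saver from a client directly.
&lt;/span&gt;    &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pymongo&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MongoClient&lt;/span&gt;
    &lt;span class="n"&gt;checkpointer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MongoDBSaver&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;MongoClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mongo_uri&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;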

    &lt;span class="c1"&gt;# Define the graph
&lt;/span&gt;    &lt;span class="n"&gt;graph&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;StateGraph&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;MemoryState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Add nodes with bound dependencies
&lt;/span&gt;    &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;memory_controller&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;memory_controller&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reasoning&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;reasoning_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;store_episode&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;store_and_return&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;consolidation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;consolidation_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="c1"&gt;# Define edges
&lt;/span&gt;    &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;START&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;memory_controller&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;memory_controller&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reasoning&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reasoning&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;store_episode&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Conditional edge: consolidate if pending, else end
&lt;/span&gt;    &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_conditional_edges&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;store_episode&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;consolidation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;consolidation_pending&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;end&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;consolidation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;consolidation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;end&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;END&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;consolidation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;END&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;checkpointer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;checkpointer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;store_and_return&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;MemoryState&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;MemoryStore&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;MemoryState&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Persist the current episode and return state.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;current_episode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;store_episode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;current_episode&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;

&lt;span class="c1"&gt;# Usage example with LangSmith tracing
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langsmith&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;traceable&lt;/span&gt;

&lt;span class="nd"&gt;@traceable&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;memory_agent_turn&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;memory_enabled&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_agent_turn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;thread_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Execute a single turn with full tracing.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;configurable&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;thread_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;thread_id&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;metadata&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;initial_state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MemoryState&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;HumanMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;)],&lt;/span&gt;
        &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;initial_state&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
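    &lt;span class="c1"&gt;# invoke() returns the final state as a dict of channel values, so index by key
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;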
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The extension point for learned policies is the &lt;code&gt;memory_controller&lt;/code&gt; function. Replace the heuristic rules with a fine-tuned classifier that takes the current state and predicts the optimal memory action. The &lt;a href="https://huggingface.co/blog/aufklarer/ai-trends-2026-test-time-reasoning-reflective-agen" rel="noopener noreferrer"&gt;GRPO training approach&lt;/a&gt; mentioned in the research uses trajectories where you label which memory decisions led to successful task completion.&lt;/p&gt;
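
&lt;p&gt;One way to keep that swap cheap is to hide the decision behind a small policy interface. The sketch below is illustrative only: &lt;code&gt;TurnFeatures&lt;/code&gt;, &lt;code&gt;MemoryPolicy&lt;/code&gt;, &lt;code&gt;HeuristicPolicy&lt;/code&gt;, &lt;code&gt;LearnedPolicy&lt;/code&gt;, and the three action labels are hypothetical names, not part of the walkthrough code or of LangGraph. It assumes the controller can reduce the current state to a few features and that a trained classifier is available as a plain callable.&lt;/p&gt;

```python
from dataclasses import dataclass
from typing import Callable, Protocol

# Hypothetical action labels; the real controller may use a different set.
ACTIONS = {"retrieve", "consolidate", "skip"}


@dataclass
class TurnFeatures:
    """Hypothetical features derived from the current graph state."""
    message_count: int
    mentions_past: bool  # e.g. the user says "like last time"


class MemoryPolicy(Protocol):
    def decide(self, features: TurnFeatures) -> str: ...


class HeuristicPolicy:
    """Rule-based stand-in for hand-written controller heuristics."""

    def decide(self, features: TurnFeatures) -> str:
        if features.mentions_past:
            return "retrieve"
        if features.message_count >= 10:
            return "consolidate"
        return "skip"


class LearnedPolicy:
    """Wraps any classifier callable that maps features to an action label."""

    def __init__(self, classifier: Callable[[TurnFeatures], str], fallback: MemoryPolicy):
        self.classifier = classifier
        self.fallback = fallback

    def decide(self, features: TurnFeatures) -> str:
        action = self.classifier(features)
        # Fall back to the heuristics if the model emits an unknown label.
        if action not in ACTIONS:
            return self.fallback.decide(features)
        return action
```

&lt;p&gt;With this shape, the node body only calls &lt;code&gt;policy.decide(...)&lt;/code&gt;, so moving from heuristics to a fine-tuned model becomes a constructor change rather than a graph change, and the fallback keeps the agent routable while the classifier is still immature.&lt;/p&gt;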

&lt;h2&gt;
  
  
  Benchmarking Your Agent's Memory: MemoryArena and MemBench
&lt;/h2&gt;

&lt;p&gt;Standard benchmarks for long-context models miss the critical challenge in production agents. LoCoMo, LongBench, and similar evaluations test single-session performance—can the model find a needle in a haystack within one context window? But your production agent runs across dozens of sessions over weeks or months. The &lt;a href="https://arxiv.org/html/2603.07670v1" rel="noopener noreferrer"&gt;survey on memory evaluation&lt;/a&gt; identifies this gap as a primary reason deployed agents underperform their benchmark scores.&lt;/p&gt;

&lt;p&gt;MemoryArena addresses this with four domains specifically designed for multi-session evaluation: customer support (returning users with ongoing issues), project management (tasks that span days with status updates), personal assistant (preference learning over time), and collaborative coding (incremental feature development). Tasks span 5-20 sessions with explicit interdependencies—session 7 might require information from session 2 that wasn't relevant in sessions 3-6.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://arxiv.org/pdf/2601.12560" rel="noopener noreferrer"&gt;agentic AI architectures survey&lt;/a&gt; highlights five dimensions for memory evaluation that you should track in your own systems:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Retention accuracy&lt;/strong&gt;: Does the agent remember critical facts after they leave the context window?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Retrieval precision&lt;/strong&gt;: When memory is loaded, is it actually relevant to the current query?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Consolidation quality&lt;/strong&gt;: Do extracted semantic facts accurately generalize from episodes?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Interference resistance&lt;/strong&gt;: Does learning new information corrupt existing memories?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Forgetting appropriateness&lt;/strong&gt;: Does the agent correctly discard outdated or superseded information?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For practical measurement in LangSmith, instrument your agent with these metrics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Memory hit rate&lt;/strong&gt;: Of retrieved memories, what percentage appeared in the final response or reasoning trace? Track this with metadata tags on your traces.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Consolidation ratio&lt;/strong&gt;: Episodes created per semantic fact extracted. As a rough heuristic, a ratio near 5:1 (five episodes per fact) suggests the consolidator is generalizing across episodes, while a ratio near 2:1 may mean it is overfitting to individual instances.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory bloat&lt;/strong&gt;: Total storage growth per active user per week. Unbounded growth signals missing TTL policies or over-storage.&lt;/li&gt;
&lt;/ul&gt;
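
&lt;p&gt;The three metrics above can be computed offline from exported trace data. This is a minimal sketch under stated assumptions: &lt;code&gt;TurnLog&lt;/code&gt; is a hypothetical record shape (not a LangSmith type), and substring matching is only a crude proxy for whether a retrieved memory was actually used.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class TurnLog:
    """Hypothetical per-turn record assembled from trace metadata."""
    retrieved: list[str]  # memory snippets loaded for the turn
    response: str         # final agent response text


def memory_hit_rate(turns: list[TurnLog]) -> float:
    """Fraction of retrieved memories whose text appears in the response."""
    retrieved = sum(len(t.retrieved) for t in turns)
    if retrieved == 0:
        return 0.0
    hits = sum(
        1 for t in turns for m in t.retrieved if m.lower() in t.response.lower()
    )
    return hits / retrieved


def consolidation_ratio(episodes_created: int, facts_extracted: int) -> float:
    """Episodes per extracted fact; infinity when no facts exist yet."""
    if facts_extracted == 0:
        return float("inf")
    return episodes_created / facts_extracted


def weekly_bloat(bytes_start: int, bytes_end: int, active_users: int) -> float:
    """Storage growth in bytes per active user over one week."""
    return (bytes_end - bytes_start) / max(active_users, 1)
```

&lt;p&gt;A stricter hit-rate definition would compare embeddings or ask a judge model instead of matching substrings; the point here is only that each metric reduces to a few counters you can attach to traces.&lt;/p&gt;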

&lt;p&gt;To create your own MemoryArena-style evaluation, export multi-session conversation logs from your production system, annotate them with ground-truth "should remember" and "should retrieve" labels, and compare agent performance with memory enabled versus disabled. The &lt;a href="https://github.com/ARUNAGIRINATHAN-K/awesome-ai-agents" rel="noopener noreferrer"&gt;Awesome AI Agents&lt;/a&gt; collection includes several evaluation harnesses you can adapt.&lt;/p&gt;
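
&lt;p&gt;The enabled-versus-disabled comparison can be scored with very little code once sessions are annotated. The sketch below assumes each session is a dict with hypothetical keys &lt;code&gt;should_remember&lt;/code&gt;, &lt;code&gt;with_memory&lt;/code&gt;, and &lt;code&gt;without_memory&lt;/code&gt;; the scoring rule (case-insensitive substring recall) is a simplification chosen for illustration, not a MemoryArena metric.&lt;/p&gt;

```python
from statistics import mean


def required_fact_recall(response: str, should_remember: list[str]) -> float:
    """Share of ground-truth facts that surface in the response."""
    if not should_remember:
        return 1.0
    text = response.lower()
    found = sum(fact.lower() in text for fact in should_remember)
    return found / len(should_remember)


def compare_runs(sessions: list[dict]) -> dict:
    """Average recall with memory enabled versus disabled, plus the lift."""
    on = mean(
        required_fact_recall(s["with_memory"], s["should_remember"]) for s in sessions
    )
    off = mean(
        required_fact_recall(s["without_memory"], s["should_remember"]) for s in sessions
    )
    return {"with_memory": on, "without_memory": off, "lift": on - off}
```

&lt;p&gt;Reporting the lift alongside both raw scores matters: a high score with memory enabled says little if the memory-free baseline scores almost as well on the same sessions.&lt;/p&gt;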

&lt;h2&gt;
  
  
  What This Means for Your Stack
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Immediate adoption (this week)&lt;/strong&gt;: Add structured episodic logging to your existing agents. Even without learned policies, queryable history improves debugging when something goes wrong and increases user trust when the agent demonstrates continuity. The code above gives you a working MongoDB-backed episodic store you can deploy today.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Medium-term investment (next quarter)&lt;/strong&gt;: Implement the four-memory-type separation. Use MongoDB or PostgreSQL with JSON columns for episodic storage—the &lt;a href="https://www.langchain.com/blog/announcing-the-langchain-mongodb-partnership-the-ai-agent-stack-that-runs-on-the-database-you-already-trust" rel="noopener noreferrer"&gt;LangChain + MongoDB partnership&lt;/a&gt; provides native integration. Add a vector store (Pinecone, Weaviate, or MongoDB Atlas Vector Search) for semantic retrieval. The investment pays off in personalization quality and reduced user friction.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Advanced path (6+ months)&lt;/strong&gt;: Fine-tune a memory controller on your domain using GRPO or DPO. This requires collecting trajectories with labeled outcomes—which memory decisions led to task success? The &lt;a href="https://arxiv.org/html/2602.23720v1" rel="noopener noreferrer"&gt;emerging frameworks like Auton&lt;/a&gt; provide scaffolding for this training loop, but expect to invest in custom data collection infrastructure.&lt;/p&gt;

&lt;p&gt;One critical architecture decision: should memory consolidation run inline (during the conversation) or as a background job? Inline consolidation adds latency—100-500ms for an LLM call to extract facts—but keeps memory fresh. Background batch processing adds staleness (facts extracted hours after the relevant episodes) but maintains conversation responsiveness. For most applications, background consolidation with aggressive episode retrieval is the right trade-off.&lt;/p&gt;
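
&lt;p&gt;The background option can be sketched with a queue and a worker thread. Everything named here is hypothetical: &lt;code&gt;extract_facts&lt;/code&gt; stands in for the LLM extraction call, and the in-process queue is an assumption made to keep the example self-contained. A production system would more likely use a durable job queue so pending work survives restarts.&lt;/p&gt;

```python
import queue
import threading

pending = queue.Queue()  # episode ids waiting for consolidation
consolidated = []        # stand-in for the semantic fact store


def extract_facts(episode_id: str) -> str:
    """Placeholder for the LLM call that turns an episode into facts."""
    return f"facts-for-{episode_id}"


def consolidation_worker() -> None:
    """Drain the queue off the request path; None is the shutdown signal."""
    while True:
        episode_id = pending.get()
        if episode_id is None:
            break
        consolidated.append(extract_facts(episode_id))


def on_turn_complete(episode_id: str) -> None:
    """Called inline: enqueueing is cheap, so the user never waits on the LLM."""
    pending.put(episode_id)


worker = threading.Thread(target=consolidation_worker, daemon=True)
worker.start()
for eid in ("ep-1", "ep-2"):
    on_turn_complete(eid)
pending.put(None)  # stop the worker once the backlog is processed
worker.join()
```

&lt;p&gt;The staleness cost shows up as the gap between &lt;code&gt;on_turn_complete&lt;/code&gt; and the moment the worker appends to the fact store, which is why retrieval should still search raw episodes in the meantime.&lt;/p&gt;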

&lt;p&gt;Operational considerations you'll hit in production: memory storage grows unboundedly without intervention. Implement TTL policies (archive episodes older than 90 days to cold storage), user-scoped isolation (critical for multi-tenant systems), and GDPR-compliant deletion hooks (when a user requests data deletion, you need to cascade through episodes, facts, and any derived embeddings).&lt;/p&gt;
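
&lt;p&gt;The archival and deletion rules above can be prototyped before touching the database. The following sketch uses a hypothetical &lt;code&gt;InMemoryStore&lt;/code&gt; with plain dicts in place of real collections; the field names are assumptions, and the 90-day default simply mirrors the figure in the text.&lt;/p&gt;

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class InMemoryStore:
    """Hypothetical stand-in for the episode, fact, and embedding stores."""
    episodes: dict = field(default_factory=dict)    # id -> {"user_id", "created_at"}
    facts: dict = field(default_factory=dict)       # id -> {"user_id", "source_episode"}
    embeddings: dict = field(default_factory=dict)  # fact id -> vector
    archive: dict = field(default_factory=dict)     # cold storage for old episodes

    def archive_expired(self, now: datetime, ttl_days: int = 90) -> int:
        """Move episodes older than the TTL into cold storage."""
        cutoff = now - timedelta(days=ttl_days)
        expired = [
            eid for eid, ep in self.episodes.items() if cutoff > ep["created_at"]
        ]
        for eid in expired:
            self.archive[eid] = self.episodes.pop(eid)
        return len(expired)

    def delete_user(self, user_id: str) -> None:
        """Cascade a deletion request through every store, archive included."""
        for coll in (self.episodes, self.archive):
            for eid in [k for k, v in coll.items() if v["user_id"] == user_id]:
                del coll[eid]
        for fid in [k for k, v in self.facts.items() if v["user_id"] == user_id]:
            del self.facts[fid]
            self.embeddings.pop(fid, None)  # derived data goes too
```

&lt;p&gt;Every record carries a &lt;code&gt;user_id&lt;/code&gt;, which is what makes both tenant isolation and the cascade possible. Note that the archive must be covered by the deletion path as well, since cold storage is an easy place for user data to be forgotten.&lt;/p&gt;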

&lt;p&gt;The &lt;a href="https://www.langchain.com/blog/agentic-engineering-redefining-software-engineering" rel="noopener noreferrer"&gt;agentic engineering practices&lt;/a&gt; emerging in production teams emphasize that memory systems are infrastructure, not features. Budget for them accordingly—monitoring, alerting on memory bloat, and regular audits of consolidation quality.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to Build This Week
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Project: Memory-Enabled Support Agent with Consolidation Dashboard&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Build a customer support agent that remembers user preferences and issue history across sessions, with a Streamlit dashboard showing memory operations in real-time.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Deploy the MongoDB-backed memory system from the code walkthrough&lt;/li&gt;
&lt;li&gt;Create a simple support chat interface (Gradio or Streamlit)&lt;/li&gt;
&lt;li&gt;Simulate 10 multi-turn conversations with 3 different "users," each discussing a recurring topic (billing, technical issues, feature requests)&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Build a dashboard that displays:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Episode timeline per user&lt;/li&gt;
&lt;li&gt;Extracted semantic facts with source episode links&lt;/li&gt;
&lt;li&gt;Memory hit rate per conversation (did retrieved memories appear in responses?)&lt;/li&gt;
&lt;li&gt;Consolidation triggers (when and why facts were extracted)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Run a comparison: disable memory retrieval for half the conversations and measure how often the agent asks users to repeat information they've already provided&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The success metric is demonstrating that memory-enabled conversations require fewer clarifying questions and produce more personalized responses. Post your results with LangSmith trace links—the community needs more real-world data on memory system performance.&lt;/p&gt;




&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.langchain.com/blog/announcing-the-langchain-mongodb-partnership-the-ai-agent-stack-that-runs-on-the-database-you-already-trust" rel="noopener noreferrer"&gt;Announcing the LangChain + MongoDB Partnership: The AI Agent Stack That Runs On The Database You Already Trust&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/html/2603.07670v1" rel="noopener noreferrer"&gt;Memory for Autonomous LLM Agents: Mechanisms, Evaluation, and Integration&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/Shichun-Liu/Agent-Memory-Paper-List" rel="noopener noreferrer"&gt;The paper list of "Memory in the Age of AI Agents: A Survey"&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://huggingface.co/blog/aufklarer/ai-trends-2026-test-time-reasoning-reflective-agen" rel="noopener noreferrer"&gt;AI Trends 2026: Test-Time Reasoning and the Rise of Reflective Agents&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/pdf/2601.12560" rel="noopener noreferrer"&gt;Agentic Artificial Intelligence (AI): Architectures, Taxonomies, and Future Directions&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/html/2602.23720v1" rel="noopener noreferrer"&gt;The Auton Agentic AI Framework: A Declarative Architecture&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/ARUNAGIRINATHAN-K/awesome-ai-agents" rel="noopener noreferrer"&gt;Awesome AI Agents for 2026&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.langchain.com/blog/agentic-engineering-redefining-software-engineering" rel="noopener noreferrer"&gt;How Swarms of AI Agents Are Redefining Software Engineering&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;This is part of the &lt;strong&gt;Agentic Engineering Weekly&lt;/strong&gt; series, a deep dive every Monday into the frameworks, patterns, and techniques shaping the next generation of AI systems.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Follow the Agentic Engineering Weekly series on Dev.to to catch every edition.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Building something agentic? Drop a comment — I'd love to feature reader projects.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>programming</category>
      <category>agents</category>
    </item>
    <item>
      <title>AI Infrastructure Strains Under Demand as OpenAI Ships GPT-5.5 and Multi-Agent Systems Go Mainstream</title>
      <dc:creator>Richard Dillon</dc:creator>
      <pubDate>Mon, 04 May 2026 12:03:05 +0000</pubDate>
      <link>https://dev.to/richard_dillon_b9c238186e/ai-infrastructure-strains-under-demand-as-openai-ships-gpt-55-and-multi-agent-systems-go-mainstream-3k35</link>
      <guid>https://dev.to/richard_dillon_b9c238186e/ai-infrastructure-strains-under-demand-as-openai-ships-gpt-55-and-multi-agent-systems-go-mainstream-3k35</guid>
      <description>&lt;h1&gt;
  
  
  AI Infrastructure Strains Under Demand as OpenAI Ships GPT-5.5 and Multi-Agent Systems Go Mainstream
&lt;/h1&gt;

&lt;p&gt;The AI industry is experiencing a fascinating inflection point this week: while chipmakers struggle to meet insatiable demand and Goldman Sachs sounds alarms about long-term market disruption, the technology itself continues its relentless march forward. OpenAI's GPT-5.5 brings enhanced agentic capabilities, interpretable architectures are emerging from stealth, and multi-agent systems are finally transitioning from research curiosity to production necessity. The infrastructure can barely keep up—and that tension is reshaping both the semiconductor industry and investment strategies.&lt;/p&gt;

&lt;h2&gt;
  
  
  Intel CPU Demand Surges as AI Boom Reaches Central Processors
&lt;/h2&gt;

&lt;p&gt;Intel is having a moment. The company's stock &lt;a href="https://www.reuters.com/business/intel-set-record-high-ai-driven-cpu-demand-powers-upbeat-forecast-2026-04-24/" rel="noopener noreferrer"&gt;hit record highs&lt;/a&gt; this week as AI service providers drove unprecedented demand for traditional CPUs, signaling a significant shift in how the industry thinks about AI infrastructure requirements.&lt;/p&gt;

&lt;p&gt;The numbers tell a compelling story: Q1 demand was so strong that Intel sold through chips originally reserved for other purposes, a remarkable turnaround for a company that spent years watching NVIDIA dominate AI compute headlines. This isn't Intel suddenly competing in the GPU space—it's the AI workload profile evolving to require more heterogeneous compute.&lt;/p&gt;

&lt;p&gt;The surge makes architectural sense. As AI deployments move from training-focused research environments to inference-heavy production systems, the computational mix changes. Retrieval-augmented generation pipelines, vector database queries, orchestration layers for multi-agent systems, and pre/post-processing stages all lean heavily on CPU performance. A single AI service might use GPUs for model inference while relying on dozens of CPU cores for everything surrounding that inference.&lt;/p&gt;

&lt;p&gt;This follows Intel's &lt;a href="https://www.reuters.com/business/google-puts-ai-agents-heart-its-enterprise-money-making-push-2026-04-22/" rel="noopener noreferrer"&gt;partnership with Google&lt;/a&gt; on AI-optimized CPUs announced earlier this year, suggesting the demand spike isn't purely organic but reflects strategic positioning that's now paying dividends. The question is whether Intel can sustain this momentum as competitors adapt.&lt;/p&gt;

&lt;h2&gt;
  
  
  Samsung Chip Profits Jump 50x Amid AI-Driven Supply Crunch
&lt;/h2&gt;

&lt;p&gt;If Intel's surge represents demand reaching CPUs, Samsung's numbers represent the raw magnitude of that demand across the entire semiconductor stack. The company's semiconductor profits jumped &lt;a href="https://www.reuters.com/technology/artificial-intelligence/" rel="noopener noreferrer"&gt;nearly 50-fold&lt;/a&gt; on AI chip demand—a staggering figure that underscores just how supply-constrained the industry remains.&lt;/p&gt;

&lt;p&gt;More concerning for AI builders: Samsung executives warned the supply shortage will &lt;a href="https://www.reuters.com/technology/artificial-intelligence/" rel="noopener noreferrer"&gt;worsen through 2027&lt;/a&gt;. That's not a quarter or two of tightness—it's a multi-year structural constraint that will force hard prioritization decisions about which AI projects get built and which wait for silicon.&lt;/p&gt;

&lt;p&gt;The bottleneck extends beyond any single manufacturer. Cerebras is also &lt;a href="https://techcrunch.com/category/artificial-intelligence/" rel="noopener noreferrer"&gt;targeting AI chip market expansion&lt;/a&gt;, and every major hyperscaler has custom silicon programs in various stages of deployment. Yet demand continues to outpace supply additions.&lt;/p&gt;

&lt;p&gt;For engineering teams, this has practical implications. Reserved capacity agreements, longer hardware procurement timelines, and more aggressive optimization to extract maximum utility from existing infrastructure are becoming standard practice. The companies that locked in capacity contracts 18 months ago are looking prescient; those assuming spot availability are scrambling. Cloud costs reflect this reality, with GPU instance prices remaining stubbornly high despite efficiency improvements in model inference.&lt;/p&gt;

&lt;h2&gt;
  
  
  Goldman Sachs Warns AI Disruption Threatens Long-Term US Equity Valuations
&lt;/h2&gt;

&lt;p&gt;While chipmakers celebrate demand, Goldman Sachs is &lt;a href="https://www.reuters.com/business/ai-fears-drive-us-stock-investors-rethink-long-term-growth-bets-says-goldman-2026-04-28/" rel="noopener noreferrer"&gt;raising concerns&lt;/a&gt; about what that AI adoption means for the broader market. The investment bank's analysis suggests AI's potential to disrupt existing business models creates unprecedented uncertainty in long-term equity valuations.&lt;/p&gt;

&lt;p&gt;The argument isn't that AI is bad for the economy—quite the opposite. It's that traditional valuation frameworks assume reasonable continuity in competitive dynamics, and AI capabilities are advancing fast enough to invalidate those assumptions. A company's moat today might be worthless if an AI system can replicate its core competency tomorrow.&lt;/p&gt;

&lt;p&gt;This creates a valuation puzzle. How do you price a professional services firm when GPT-5.5 can handle increasing portions of its workflows? What's the appropriate multiple for a software company whose product might be replaced by an AI agent? Goldman's analysts argue investors are &lt;a href="https://www.reuters.com/business/ai-fears-drive-us-stock-investors-rethink-long-term-growth-bets-says-goldman-2026-04-28/" rel="noopener noreferrer"&gt;rethinking traditional valuation approaches&lt;/a&gt; for companies with significant AI exposure—both positive and negative.&lt;/p&gt;

&lt;p&gt;Reuters' parallel &lt;a href="https://www.reuters.com/technology/artificial-intelligence/" rel="noopener noreferrer"&gt;analysis of AI business model reliability&lt;/a&gt; adds context: many AI-native companies themselves have unproven unit economics, making the disruption a two-way uncertainty. The market is effectively pricing both disruption risk for incumbents and execution risk for disruptors simultaneously.&lt;/p&gt;

&lt;h2&gt;
  
  
  OpenAI Unveils GPT-5.5 with Enhanced Cyber Capabilities and Expanded Access
&lt;/h2&gt;

&lt;p&gt;OpenAI's &lt;a href="https://openai.com/index/introducing-gpt-5-5/" rel="noopener noreferrer"&gt;GPT-5.5 release&lt;/a&gt; this week represents the company's most significant push into agentic and cyber-specific capabilities. The model scored &lt;a href="https://openai.com/index/introducing-gpt-5-5/" rel="noopener noreferrer"&gt;84.9% on the GDPval benchmark&lt;/a&gt;, which tests agent performance across 44 distinct occupations—a notable jump that positions it as the current leader in generalist agent capability.&lt;/p&gt;

&lt;p&gt;The cyber focus deserves particular attention. Building on the GPT-5.2 security framework, GPT-5.5 introduces &lt;a href="https://openai.com/index/introducing-gpt-5-5/" rel="noopener noreferrer"&gt;cyber-specific safeguards&lt;/a&gt; designed to prevent misuse while enabling legitimate security research and defense applications. This includes improved jailbreak resistance for security-adjacent prompts and better detection of social engineering attempts that try to extract offensive capabilities.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://openai.com/index/introducing-gpt-5-5/" rel="noopener noreferrer"&gt;Trusted Access for Cyber program&lt;/a&gt; expands access to advanced cybersecurity capabilities for vetted organizations. Critical infrastructure defenders can apply for what OpenAI calls "cyber-permissive model access"—essentially a less restricted version of the model for organizations that can demonstrate legitimate defensive needs and accept strict usage requirements.&lt;/p&gt;

&lt;p&gt;This tiered access approach represents OpenAI's attempt to thread the needle between capability and responsibility. The most powerful features are gated behind verification processes, while the broadly available model maintains stronger guardrails. Whether this satisfies critics who want either more restriction or more openness remains to be seen.&lt;/p&gt;

&lt;h2&gt;
  
  
  Guide Labs Introduces Interpretable LLM Architecture
&lt;/h2&gt;

&lt;p&gt;In a space dominated by scale-focused competition, &lt;a href="https://techcrunch.com/2026/02/23/guide-labs-debuts-a-new-kind-of-interpretable-llm/" rel="noopener noreferrer"&gt;Guide Labs debuted&lt;/a&gt; a fundamentally different approach: an LLM architecture built from the ground up for transparency and explainability. The startup's design prioritizes interpretability as a first-class architectural concern rather than a post-hoc analysis layer.&lt;/p&gt;

&lt;p&gt;The timing is strategic. Enterprise buyers increasingly demand AI systems they can audit, understand, and explain to regulators. The EU AI Act's requirements for high-risk applications are pushing organizations toward solutions that offer more than black-box predictions with confidence scores. Guide Labs is betting that some enterprises will accept capability tradeoffs for genuine interpretability.&lt;/p&gt;

&lt;p&gt;The architecture reportedly uses &lt;a href="https://techcrunch.com/2026/02/23/guide-labs-debuts-a-new-kind-of-interpretable-llm/" rel="noopener noreferrer"&gt;hybrid approaches&lt;/a&gt; that combine neural components with more structured, inspectable reasoning modules. Specifics remain limited because the company is still in controlled access, but early descriptions suggest something closer to neurosymbolic systems than pure transformer scaling.&lt;/p&gt;

&lt;p&gt;This represents an emerging trend toward &lt;a href="https://techcrunch.com/2026/02/23/guide-labs-debuts-a-new-kind-of-interpretable-llm/" rel="noopener noreferrer"&gt;architectures balancing capability with transparency&lt;/a&gt;. The massive foundation model players are unlikely to pivot away from scale, but a market segment is developing for interpretable alternatives in regulated industries. Healthcare, finance, and government applications where audit requirements are non-negotiable may find Guide Labs' approach compelling regardless of raw benchmark performance.&lt;/p&gt;

&lt;h2&gt;
  
  
  Agentic Programming Updates
&lt;/h2&gt;

&lt;p&gt;The shift to multi-agent systems has officially moved from experimental to expected. UiPath's 2026 report declares that &lt;a href="https://huggingface.co/blog/daya-shankar/agentic-ai-trends-2026" rel="noopener noreferrer"&gt;"solo agents are out"&lt;/a&gt;—a stark signal that enterprise automation is embracing coordination complexity as the default approach rather than an advanced option.&lt;/p&gt;

&lt;p&gt;New coordination patterns are crystallizing around practical problems. &lt;a href="https://huggingface.co/blog/Svngoku/agentic-coding-trends-2026" rel="noopener noreferrer"&gt;Task graphs, shared vs. isolated context management, and merge strategies&lt;/a&gt; for handling simultaneous agent commits are becoming standard architectural considerations. The parallel to distributed systems design is intentional and useful: many patterns from microservices and distributed databases translate surprisingly well to multi-agent orchestration.&lt;/p&gt;
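
&lt;p&gt;To make the task-graph idea concrete, here is a minimal sketch using Python's standard-library &lt;code&gt;graphlib&lt;/code&gt;. It is not taken from the linked guide: the task names and the &lt;code&gt;execution_waves&lt;/code&gt; helper are hypothetical, and real orchestrators layer context management and merge handling on top of this ordering step.&lt;/p&gt;

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each key is an agent task, each value is the set of
# tasks that must finish before it can start. Names are illustrative only.
TASK_GRAPH = {
    "plan": set(),
    "write_api": {"plan"},
    "write_tests": {"plan"},
    "merge": {"write_api", "write_tests"},
}


def execution_waves(graph):
    """Group tasks into waves; tasks in the same wave can run in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted for deterministic output
        waves.append(ready)
        sorter.done(*ready)  # mark the wave finished to unlock dependents
    return waves


WAVES = execution_waves(TASK_GRAPH)
print(WAVES)
```

&lt;p&gt;In this toy graph the two writing tasks land in the same wave, which is exactly where a merge strategy for simultaneous agent commits would be needed.&lt;/p&gt;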

&lt;p&gt;The framework landscape is consolidating around developer experience. &lt;a href="https://github.com/ARUNAGIRINATHAN-K/awesome-ai-agents" rel="noopener noreferrer"&gt;PydanticAI offers a FastAPI-style approach&lt;/a&gt; that will feel immediately familiar to Python developers—type hints, dependency injection, and minimal boilerplate. &lt;a href="https://github.com/ARUNAGIRINATHAN-K/awesome-ai-agents" rel="noopener noreferrer"&gt;Modus takes a different path&lt;/a&gt; with serverless WebAssembly agents that promise minimal cold starts, targeting use cases where latency sensitivity outweighs raw capability.&lt;/p&gt;

&lt;p&gt;The academic community is formalizing best practices. The &lt;a href="https://arxiv.org/html/2511.17332" rel="noopener noreferrer"&gt;AAAI 2026 Bridge Program&lt;/a&gt; highlighted the need for mechanism design principles in multi-agent systems—specifically around modeling preferences, incentives, and interaction rules. This matters because agents that work perfectly in isolation can produce adversarial or degenerate behavior when combined without careful incentive alignment.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://huggingface.co/blog/Svngoku/agentic-coding-trends-2026" rel="noopener noreferrer"&gt;Durable agent jobs&lt;/a&gt; enabling long-running workflows with state persistence across sessions are addressing one of the thorniest practical challenges. And &lt;a href="https://github.com/caramaschiHG/awesome-ai-agents-2026" rel="noopener noreferrer"&gt;Open-AutoGLM&lt;/a&gt; has emerged as a credible open-source option for mobile device automation, reducing dependency on proprietary mobile agent frameworks.&lt;/p&gt;

&lt;h2&gt;
  
  
  Education Sector Embraces Multi-Agent AI Architecture
&lt;/h2&gt;

&lt;p&gt;The education sector is providing an interesting case study in multi-agent deployment at scale. The &lt;a href="https://arxiv.org/html/2604.16566v1" rel="noopener noreferrer"&gt;Agentic Unified Student Support System (AUSS)&lt;/a&gt; demonstrates what happens when you apply multi-agent architecture to a traditionally fragmented problem space.&lt;/p&gt;

&lt;p&gt;AUSS integrates three tiers of specialized agents: student-level for personalized support, educator-level for teaching assistance, and institutional-level for administrative optimization. The reported metrics are impressive: &lt;a href="https://arxiv.org/html/2604.16566v1" rel="noopener noreferrer"&gt;92.4% recommendation accuracy, 94.1% grading efficiency, and 89.5% F1-score&lt;/a&gt; on dropout prediction. The last figure stands out because dropout prediction is a notoriously noisy classification problem, though these are the authors' own reported results.&lt;/p&gt;

&lt;p&gt;The technical stack is notably heterogeneous. The system &lt;a href="https://arxiv.org/html/2604.16566v1" rel="noopener noreferrer"&gt;combines LLMs, reinforcement learning, predictive analytics, and rule-based reasoning&lt;/a&gt; rather than forcing everything through a single model architecture. This hybrid approach allows different agent types to use the most appropriate technique for their specific task while sharing information through unified interfaces.&lt;/p&gt;

&lt;p&gt;The design directly addresses what the researchers identify as &lt;a href="https://arxiv.org/html/2604.16566v1" rel="noopener noreferrer"&gt;fragmentation in existing AI educational tools&lt;/a&gt;. Previous approaches treated tutoring, assessment, and administration as separate AI problems with separate systems. AUSS demonstrates that meaningful improvements come from agents that share context—a student's learning patterns inform grading feedback which influences dropout risk assessment in a continuous loop.&lt;/p&gt;

&lt;h2&gt;
  
  
  DeepTest 2026 Competition Benchmarks LLM Safety in Automotive Systems
&lt;/h2&gt;

&lt;p&gt;As AI systems deploy in safety-critical domains, testing methodologies struggle to keep pace. The &lt;a href="https://arxiv.org/html/2604.12615v1" rel="noopener noreferrer"&gt;DeepTest 2026 competition&lt;/a&gt; tackled this directly, challenging four teams to test in-car voice assistant safety using LLM-based test generators.&lt;/p&gt;

&lt;p&gt;The competing tools—&lt;a href="https://arxiv.org/html/2604.12615v1" rel="noopener noreferrer"&gt;ATLAS, Exida Test Generator, Warnless, and CRISP&lt;/a&gt;—represent different approaches to generating adversarial inputs for automotive AI testing. The goal isn't to break the systems for its own sake but to find failure modes before they occur in production with real drivers.&lt;/p&gt;

&lt;p&gt;The competition used &lt;a href="https://arxiv.org/html/2604.12615v1" rel="noopener noreferrer"&gt;GPT-4o-Mini as an evaluation oracle&lt;/a&gt;, which achieved an F1-score of 0.824 at a cost of $0.20 per 1,000 requests. This pragmatic choice reflects the reality that human evaluation doesn't scale for automated testing pipelines, but current models can serve as reasonable proxies for detecting safety-relevant failures.&lt;/p&gt;
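
&lt;p&gt;For readers who want to sanity-check numbers like these, the arithmetic is short. The sketch below uses the standard F1 definition and the quoted rate of $0.20 per 1,000 requests. The precision and recall inputs and the 50,000-request workload are made-up illustrations, not figures from the competition report.&lt;/p&gt;

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (the standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def oracle_cost_usd(num_requests, usd_per_1000=0.20):
    """Cost of judging num_requests outputs at a flat per-1000-request rate."""
    return num_requests / 1000 * usd_per_1000


# Hypothetical precision and recall, chosen only to illustrate the formula.
EXAMPLE_F1 = round(f1_score(0.90, 0.75), 3)

# 50,000 judged test cases is an assumed workload, not a number from the paper.
EXAMPLE_COST = round(oracle_cost_usd(50_000), 2)

print(EXAMPLE_F1, EXAMPLE_COST)
```

&lt;p&gt;At that rate even a large automated test campaign costs a few dollars to judge, which is the practical argument for an LLM oracle over human review.&lt;/p&gt;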

&lt;p&gt;The competition highlights a &lt;a href="https://arxiv.org/html/2604.12615v1" rel="noopener noreferrer"&gt;growing focus on safety testing methodologies&lt;/a&gt; for deployed AI systems. Automotive is just one domain—similar challenges exist in healthcare, finance, and any application where AI errors have serious consequences. The tools developed here will likely influence testing approaches across industries as regulatory requirements for AI safety assurance mature.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to Watch
&lt;/h2&gt;

&lt;p&gt;The infrastructure constraints won't resolve quickly, so expect continued pressure on AI project timelines and costs through 2027. OpenAI's tiered access model for GPT-5.5 may become the template for capability governance industry-wide. And as multi-agent systems hit production, the failure modes will get interesting—watch for the first major incident involving emergent multi-agent behavior that nobody explicitly designed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.reuters.com/business/ai-fears-drive-us-stock-investors-rethink-long-term-growth-bets-says-goldman-2026-04-28/" rel="noopener noreferrer"&gt;AI disruption puts focus on long-term value of US equities, Goldman ...&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.reuters.com/technology/artificial-intelligence/" rel="noopener noreferrer"&gt;AI News | Latest Headlines and Developments | Reuters&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.reuters.com/business/google-puts-ai-agents-heart-its-enterprise-money-making-push-2026-04-22/" rel="noopener noreferrer"&gt;Google puts AI agents at heart of its enterprise money-making push&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.reuters.com/business/intel-set-record-high-ai-driven-cpu-demand-powers-upbeat-forecast-2026-04-24/" rel="noopener noreferrer"&gt;Intel soars on signs AI boom for CPUs is here - Reuters&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://techcrunch.com/category/artificial-intelligence/" rel="noopener noreferrer"&gt;AI News &amp;amp; Artificial Intelligence | TechCrunch&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/ARUNAGIRINATHAN-K/awesome-ai-agents" rel="noopener noreferrer"&gt;Awesome AI Agents for 2026 - GitHub&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://huggingface.co/blog/daya-shankar/agentic-ai-trends-2026" rel="noopener noreferrer"&gt;Latest Agentic AI Trends to Watch in 2026: Market Shifts, Adoption ...&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/html/2511.17332" rel="noopener noreferrer"&gt;AAAI 2026 Bridge Program on Advancing LLM-Based Multi-Agent ...&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/html/2604.16566v1" rel="noopener noreferrer"&gt;Agentic AI for Education: A Unified Multi-Agent Framework for ... - arXiv&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://huggingface.co/blog/Svngoku/agentic-coding-trends-2026" rel="noopener noreferrer"&gt;2026 Agentic Coding Trends - Implementation Guide (Technical)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/html/2604.12615v1" rel="noopener noreferrer"&gt;DeepTest Tool Competition 2026: Benchmarking an LLM-Based ...&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://techcrunch.com/2026/02/23/guide-labs-debuts-a-new-kind-of-interpretable-llm/" rel="noopener noreferrer"&gt;Guide Labs debuts a new kind of interpretable LLM | TechCrunch&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://openai.com/index/introducing-gpt-5-5/" rel="noopener noreferrer"&gt;Introducing GPT-5.5 - OpenAI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/caramaschiHG/awesome-ai-agents-2026" rel="noopener noreferrer"&gt;caramaschiHG/awesome-ai-agents-2026: The most comprehensive ...&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;


</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>programming</category>
      <category>technology</category>
    </item>
  </channel>
</rss>
