<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: SchrodingCatAI</title>
    <description>The latest articles on DEV Community by SchrodingCatAI (@schrodingcatai).</description>
    <link>https://dev.to/schrodingcatai</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3972541%2Fb35c0fd1-a379-407c-8f14-6f828c5541e7.png</url>
      <title>DEV Community: SchrodingCatAI</title>
      <link>https://dev.to/schrodingcatai</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/schrodingcatai"/>
    <language>en</language>
    <item>
      <title>[Deep Analysis] Microsoft Copilot Cowork, DeepSeek, and the Enterprise AI Agent Stack</title>
      <dc:creator>SchrodingCatAI</dc:creator>
      <pubDate>Fri, 19 Jun 2026 14:31:24 +0000</pubDate>
      <link>https://dev.to/schrodingcatai/deep-analysis-microsoft-copilot-cowork-deepseek-and-the-enterprise-ai-agent-stack-5f4l</link>
      <guid>https://dev.to/schrodingcatai/deep-analysis-microsoft-copilot-cowork-deepseek-and-the-enterprise-ai-agent-stack-5f4l</guid>
      <description>&lt;h2&gt;
  
  
  Abstract
&lt;/h2&gt;

&lt;p&gt;Microsoft Copilot Cowork is moving from simple AI assistance toward cloud-based enterprise agents that can execute long-running work, call tools, retrieve company data, and operate across files. This article explains its architecture, usage-based billing logic, multi-model strategy, DeepSeek implications, Web IQ grounding layer, and a practical Python example for building an agent-style task cost estimator.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Background: Why Copilot Cowork Matters
&lt;/h2&gt;

&lt;p&gt;Microsoft Copilot Cowork is not a normal chatbot interface. Traditional Copilot chat is designed around short interactions: ask a question, summarize a meeting, draft an email, or rewrite a document. Copilot Cowork targets a different workload category: long-running enterprise tasks.&lt;/p&gt;

&lt;p&gt;In this model, the AI system can receive a business objective, decompose it into steps, retrieve relevant context, call tools, inspect files, generate outputs, validate intermediate results, and continue running in the cloud until the work is complete.&lt;/p&gt;

&lt;p&gt;Typical enterprise scenarios include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Comparing thousands of files across product versions&lt;/li&gt;
&lt;li&gt;Editing spreadsheets and generating dependency charts&lt;/li&gt;
&lt;li&gt;Analyzing sales pipeline risks&lt;/li&gt;
&lt;li&gt;Pulling information from internal business systems&lt;/li&gt;
&lt;li&gt;Generating reports from structured and unstructured data&lt;/li&gt;
&lt;li&gt;Running repeatable workflows with audit and compliance requirements&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Microsoft has stated that Copilot Cowork is generally available worldwide after a preview period in its Frontier program. During that preview, more than half of the Fortune 500 reportedly used it. This adoption signal is important because enterprise AI is shifting from “answer generation” to “task execution.”&lt;/p&gt;

&lt;p&gt;The more interesting part is the reported possibility that DeepSeek may become an optional model inside Microsoft’s enterprise Copilot ecosystem. If accurate, this would show that enterprise AI platforms are becoming multi-model, cost-sensitive, and geopolitically complex.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Core Principles: How Enterprise AI Agents Work
&lt;/h2&gt;

&lt;h3&gt;
  
  
  2.1 From Prompt-Response to Agentic Execution
&lt;/h3&gt;

&lt;p&gt;A normal LLM request usually follows a simple pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User prompt -&amp;gt; Model inference -&amp;gt; Final answer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;An agentic workflow is more complex:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Task objective
-&amp;gt; Planning
-&amp;gt; Context retrieval
-&amp;gt; Tool selection
-&amp;gt; Model calls
-&amp;gt; File operations
-&amp;gt; Verification
-&amp;gt; Output generation
-&amp;gt; Optional retry or correction
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This architecture is more powerful, but it also consumes more compute. A single business task may trigger dozens of model calls, multiple retrieval operations, and several tool executions.&lt;/p&gt;
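&lt;p&gt;To make the fan-out concrete, here is a minimal sketch of the loop above with counters attached. It does not call any real model or tool; &lt;code&gt;RunStats&lt;/code&gt;, &lt;code&gt;run_task&lt;/code&gt;, the step dictionaries, and the assumption of one model call per step are all hypothetical and exist only to show how one objective turns into many billable operations.&lt;/p&gt;

```python
from dataclasses import dataclass, field


@dataclass
class RunStats:
    """Counts how much work a single task triggered."""
    model_calls: int = 0
    retrievals: int = 0
    tool_calls: int = 0
    log: list = field(default_factory=list)


def run_task(objective: str, steps: list, stats: RunStats, max_retries: int = 1) -> RunStats:
    """Walk a pre-planned list of steps and record what each one consumes.

    Each step is a dict such as {"kind": "retrieve"} and may carry
    "fails": N to simulate N failed attempts before success.
    """
    stats.model_calls += 1  # assumed: one planning call for the whole objective
    stats.log.append(f"plan: {objective}")
    for step in steps:
        failures_left = step.get("fails", 0)
        attempts = 0
        while True:
            attempts += 1
            kind = step["kind"]
            if kind == "retrieve":
                stats.retrievals += 1
            elif kind == "tool":
                stats.tool_calls += 1
            stats.model_calls += 1  # assumed: the model interprets every step result
            if failures_left == 0 or attempts > max_retries:
                break
            failures_left -= 1
        stats.log.append(f"{kind} x{attempts}")
    stats.model_calls += 1  # assumed: one final verification call
    return stats


stats = run_task(
    "Compare two releases",
    [{"kind": "retrieve"}, {"kind": "tool", "fails": 1}, {"kind": "generate"}],
    RunStats(),
)
print(stats.model_calls, stats.retrievals, stats.tool_calls)  # prints: 6 1 2
```

&lt;p&gt;Even this three-step toy task produces six model calls, because planning, verification, and a single retry each add their own call on top of the visible steps.&lt;/p&gt;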

&lt;h3&gt;
  
  
  2.2 Why Usage-Based Billing Becomes Necessary
&lt;/h3&gt;

&lt;p&gt;Flat-rate unlimited use is difficult to sustain for agentic AI because the most productive users generate the heaviest compute load. A user who runs hundreds of tasks per week may create significant inference, retrieval, orchestration, and runtime costs.&lt;/p&gt;

&lt;p&gt;Copilot Cowork therefore uses a usage-based model measured in Copilot credits. Task price depends on factors such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Model usage&lt;/li&gt;
&lt;li&gt;Context size&lt;/li&gt;
&lt;li&gt;Retrieval workload&lt;/li&gt;
&lt;li&gt;Tool calls&lt;/li&gt;
&lt;li&gt;Runtime duration&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At general availability, Microsoft described pay-as-you-go pricing based on Copilot credits, with committed usage options for customers that want discounts in exchange for predictable volume.&lt;/p&gt;
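&lt;p&gt;The trade-off between the two options can be sketched with simple arithmetic. The contract shape below (committed credits paid in full, overage billed at the pay-as-you-go rate) and both rates are assumptions for illustration, not published terms; &lt;code&gt;monthly_cost&lt;/code&gt; is a hypothetical helper.&lt;/p&gt;

```python
def monthly_cost(credits_used: float, payg_rate: float,
                 committed_credits: float = 0.0, committed_rate: float = 0.0) -> float:
    """Committed credits are paid for in full; overage is billed at the pay-as-you-go rate."""
    overage = max(credits_used - committed_credits, 0.0)
    return round(committed_credits * committed_rate + overage * payg_rate, 2)


# Hypothetical rates, chosen only to make the arithmetic easy to follow.
PAYG_RATE = 0.01        # assumed price per credit
COMMITTED_RATE = 0.008  # assumed discounted price per committed credit

for used in (20_000, 50_000, 80_000):
    payg = monthly_cost(used, PAYG_RATE)
    committed = monthly_cost(used, PAYG_RATE, committed_credits=50_000, committed_rate=COMMITTED_RATE)
    print(used, payg, committed)
```

&lt;p&gt;Under these assumed numbers, a commitment only pays off once actual usage comes close to the committed volume: at 20,000 credits the commitment costs twice as much as pay-as-you-go, while at 50,000 and above it is cheaper.&lt;/p&gt;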

&lt;h3&gt;
  
  
  2.3 Why Multi-Model Routing Is Becoming Strategic
&lt;/h3&gt;

&lt;p&gt;Enterprise agents do not need the most expensive model for every step. A practical system may use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A strong reasoning model for planning and validation&lt;/li&gt;
&lt;li&gt;A cheaper model for classification or extraction&lt;/li&gt;
&lt;li&gt;A coding model for script generation&lt;/li&gt;
&lt;li&gt;A multimodal model for images, charts, or document screenshots&lt;/li&gt;
&lt;li&gt;A retrieval-optimized layer for fresh external information&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is where a model such as DeepSeek becomes relevant. If it provides competitive reasoning or coding performance at lower cost, it can become attractive for high-volume agent workflows.&lt;/p&gt;
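&lt;p&gt;A routing layer of this kind can be as small as a lookup table. The sketch below is illustrative only: the tier names, the step kinds, and the relative costs are invented, and a real router would also consider context length, latency, and compliance constraints.&lt;/p&gt;

```python
# Hypothetical tier names and relative prices; real model names and rates will differ.
MODEL_TIERS = {
    "frontier": {"relative_cost": 10.0},
    "coding": {"relative_cost": 4.0},
    "multimodal": {"relative_cost": 6.0},
    "small": {"relative_cost": 1.0},
}

STEP_TO_TIER = {
    "plan": "frontier",
    "validate": "frontier",
    "classify": "small",
    "extract": "small",
    "generate_script": "coding",
    "read_chart": "multimodal",
}


def route(step_kind: str) -> str:
    """Pick a tier for a step; unknown step kinds fall back to the strongest tier."""
    return STEP_TO_TIER.get(step_kind, "frontier")


def relative_cost(steps: list) -> float:
    """Sum the relative cost of running each step on its routed tier."""
    return sum(MODEL_TIERS[route(s)]["relative_cost"] for s in steps)


workflow = ["plan"] + ["extract"] * 20 + ["generate_script", "validate"]
routed = relative_cost(workflow)
all_frontier = len(workflow) * MODEL_TIERS["frontier"]["relative_cost"]
print(routed, all_frontier)  # prints: 44.0 230.0
```

&lt;p&gt;With these made-up weights, routing the 20 extraction steps to the small tier cuts the relative cost of the workflow from 230 to 44, which is why a cheaper but capable model matters most for the high-volume steps.&lt;/p&gt;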

&lt;h2&gt;
  
  
  3. Practical Demo: Building a Python Agent Task Cost Estimator
&lt;/h2&gt;

&lt;p&gt;The following example implements a simple estimator for agentic task cost. It uses an LLM call to classify a task and then calculates estimated credits from context size, retrieval steps, tool calls, and runtime.&lt;/p&gt;

&lt;p&gt;For the API example, we use Xuedingmao AI at &lt;code&gt;xuedingmao.com&lt;/code&gt;, model &lt;code&gt;claude-opus-4-8&lt;/code&gt;. This model is suitable for complex reasoning, long-context processing, code generation, and debugging scenarios.&lt;/p&gt;

&lt;p&gt;Before running the script, set your API key:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;XUEDINGMAO_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"your_api_key_here"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;

&lt;span class="n"&gt;BASE_URL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://xuedingmao.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;API_ENDPOINT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/v1/messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;MODEL_NAME&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-opus-4-8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;API_KEY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;XUEDINGMAO_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;API_KEY&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;RuntimeError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Please set the XUEDINGMAO_API_KEY environment variable.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;classify_agent_task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task_description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Build a structured prompt for enterprise agent task classification.
&lt;/span&gt;    &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
You are an enterprise AI agent architect.
Classify the following task into a JSON object with these fields:
task_type, complexity (low, medium, or high), expected_context_kb, retrieval_steps, tool_calls, runtime_minutes.
The last four fields must be plain numbers.

Task:
&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;task_description&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

Return JSON only.
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="c1"&gt;# Prepare request headers for the /v1/messages API.
&lt;/span&gt;    &lt;span class="n"&gt;headers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Authorization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bearer &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;API_KEY&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Content-Type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;application/json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;# Prepare the model request body.
&lt;/span&gt;    &lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;MODEL_NAME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;# Send the request to the model provider.
&lt;/span&gt;    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;BASE_URL&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;API_ENDPOINT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Raise an error if the API returns a failed status code.
&lt;/span&gt;    &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;raise_for_status&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c1"&gt;# Parse the response body.
&lt;/span&gt;    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c1"&gt;# Extract text from a common messages-style response format.
&lt;/span&gt;    &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="c1"&gt;# Convert the JSON text returned by the model into a Python dictionary.
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;estimate_copilot_credits&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Assign a base credit cost according to task complexity.
&lt;/span&gt;    &lt;span class="n"&gt;complexity_weight&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;low&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;medium&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;high&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;
    &lt;span class="p"&gt;}.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;complexity&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;medium&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Estimate context cost from expected context size.
&lt;/span&gt;    &lt;span class="n"&gt;context_cost&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;expected_context_kb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;0.02&lt;/span&gt;

    &lt;span class="c1"&gt;# Estimate retrieval cost from search or knowledge-base lookup steps.
&lt;/span&gt;    &lt;span class="n"&gt;retrieval_cost&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;retrieval_steps&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;1.5&lt;/span&gt;

    &lt;span class="c1"&gt;# Estimate tool cost from spreadsheet, file, browser, or database actions.
&lt;/span&gt;    &lt;span class="n"&gt;tool_cost&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_calls&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;2.0&lt;/span&gt;

    &lt;span class="c1"&gt;# Estimate runtime cost from cloud execution duration.
&lt;/span&gt;    &lt;span class="n"&gt;runtime_cost&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;runtime_minutes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;0.8&lt;/span&gt;

    &lt;span class="c1"&gt;# Sum all estimated credit components.
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;complexity_weight&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;context_cost&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;retrieval_cost&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;tool_cost&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;runtime_cost&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;task&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
Compare 3,800 product configuration files across two releases,
identify breaking changes, generate a ranked risk report,
and create a dependency flow summary for the engineering team.
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="n"&gt;task_profile&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;classify_agent_task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;estimated_credits&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;estimate_copilot_credits&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task_profile&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;estimated_usd&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;estimated_credits&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;0.01&lt;/span&gt;

    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Task profile:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task_profile&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;indent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Estimated Copilot credits: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;estimated_credits&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Estimated cost at $0.01 per credit: $&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;estimated_usd&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This example is intentionally simple, and its weights and the $0.01-per-credit rate should be treated as placeholders rather than published pricing. It still reflects a real engineering concern: agent tasks must be observable, measurable, and budget-aware. In production, the estimator should be fed by actual logs, model call counts, retrieval traces, and tool execution metrics rather than by a model’s guess.&lt;/p&gt;
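&lt;p&gt;As a rough sketch of that production direction, the profile can be derived from recorded trace events instead of an LLM classification. The event schema here (&lt;code&gt;type&lt;/code&gt;, &lt;code&gt;ts&lt;/code&gt; in seconds, &lt;code&gt;context_bytes&lt;/code&gt;) and the function name &lt;code&gt;profile_from_trace&lt;/code&gt; are hypothetical; a real system would read whatever its tracing layer actually emits.&lt;/p&gt;

```python
from collections import Counter


def profile_from_trace(events: list) -> dict:
    """Build an estimator-style profile from recorded events instead of a model's guess."""
    counts = Counter(e["type"] for e in events)
    context_bytes = sum(e.get("context_bytes", 0) for e in events)
    timestamps = [e["ts"] for e in events]
    runtime_seconds = max(timestamps) - min(timestamps) if timestamps else 0
    return {
        "model_calls": counts["model_call"],
        "retrieval_steps": counts["retrieval"],
        "tool_calls": counts["tool_call"],
        "expected_context_kb": round(context_bytes / 1024, 1),
        "runtime_minutes": round(runtime_seconds / 60, 2),
    }


# Made-up trace for one short task; timestamps are seconds since task start.
trace = [
    {"type": "model_call", "ts": 0, "context_bytes": 40960},
    {"type": "retrieval", "ts": 12},
    {"type": "tool_call", "ts": 30},
    {"type": "model_call", "ts": 95, "context_bytes": 61440},
    {"type": "tool_call", "ts": 180},
]
print(profile_from_trace(trace))
```

&lt;p&gt;The resulting dictionary uses the same field names as the estimator’s input, so it can be passed to &lt;code&gt;estimate_copilot_credits&lt;/code&gt; directly; because it carries no &lt;code&gt;complexity&lt;/code&gt; field, the estimator falls back to its medium default.&lt;/p&gt;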

&lt;h2&gt;
  
  
  4. Tool and Technology Selection
&lt;/h2&gt;

&lt;h3&gt;
  
  
  4.1 Microsoft-Side Components
&lt;/h3&gt;

&lt;p&gt;A complete enterprise agent platform usually needs more than an LLM. Microsoft’s strategy appears to combine several layers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Copilot Cowork: long-running cloud agent execution&lt;/li&gt;
&lt;li&gt;Work IQ: enterprise context and Microsoft 365 data grounding&lt;/li&gt;
&lt;li&gt;Web IQ: Bing-powered fresh web grounding for agents&lt;/li&gt;
&lt;li&gt;Microsoft 365 security: identity, permissions, compliance, and governance&lt;/li&gt;
&lt;li&gt;Admin controls: budget limits, user access, audit logs, and spending visibility&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Web IQ is especially important because agent search differs from human search. Humans expect links, snippets, rankings, images, and ads. Agents need concise, fresh, machine-readable information with low latency and minimal token waste.&lt;/p&gt;

&lt;p&gt;Microsoft claims Web IQ has been re-architected end to end, from indexing through ranking, for agent workflows and can return fresh data across pages, news, images, and videos. The practical value is strongest when an agent needs repeated search calls during complex tasks.&lt;/p&gt;

&lt;h3&gt;
  
  
  4.2 Development Platform Selection
&lt;/h3&gt;

&lt;p&gt;For independent testing or custom AI application development, a unified model access layer is useful. Xuedingmao AI (&lt;code&gt;xuedingmao.com&lt;/code&gt;) can be used as a technical development platform because it aggregates 500+ mainstream models, including GPT-5.5, Claude 4.8, and Gemini 3.1 Pro.&lt;/p&gt;

&lt;p&gt;From an engineering perspective, the main value is interface consistency. A unified OpenAI-compatible access pattern reduces the integration cost of switching between models, benchmarking latency, testing reasoning quality, and validating production prompts. Stable API behavior and fast response time are also important for batch testing, agent prototyping, and multi-model routing experiments.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Key Considerations and Common Pitfalls
&lt;/h2&gt;

&lt;h3&gt;
  
  
  5.1 Cost Explosion
&lt;/h3&gt;

&lt;p&gt;Agent workflows can become expensive when task decomposition is uncontrolled. Developers should track:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Number of model calls per task&lt;/li&gt;
&lt;li&gt;Average input and output token size&lt;/li&gt;
&lt;li&gt;Retrieval frequency&lt;/li&gt;
&lt;li&gt;Tool execution count&lt;/li&gt;
&lt;li&gt;Retry and self-correction loops&lt;/li&gt;
&lt;li&gt;Runtime duration&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A practical optimization is to use smaller models for low-risk subtasks and reserve frontier models for planning, reasoning, and final validation.&lt;/p&gt;
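&lt;p&gt;Tracking alone does not stop a runaway task, so it helps to enforce a ceiling as well. The following is a minimal sketch of a per-task guard; &lt;code&gt;CreditBudget&lt;/code&gt;, &lt;code&gt;BudgetExceeded&lt;/code&gt;, and the limits used are hypothetical and not part of any Copilot API.&lt;/p&gt;

```python
class BudgetExceeded(RuntimeError):
    """Raised when a task would cross its credit or retry limit."""


class CreditBudget:
    """Tracks spend for one task and stops it once a ceiling is crossed."""

    def __init__(self, max_credits: float, max_retries: int):
        self.max_credits = max_credits
        self.max_retries = max_retries
        self.spent = 0.0
        self.retries = 0

    def charge(self, credits: float, is_retry: bool = False) -> None:
        """Record a charge, or raise before the charge is applied."""
        if is_retry:
            self.retries += 1
            if self.retries > self.max_retries:
                raise BudgetExceeded("retry limit reached")
        if self.spent + credits > self.max_credits:
            raise BudgetExceeded("credit ceiling reached")
        self.spent += credits


budget = CreditBudget(max_credits=10.0, max_retries=2)
budget.charge(4.0)
budget.charge(4.0, is_retry=True)
try:
    budget.charge(4.0)  # would bring the total to 12.0, above the ceiling
except BudgetExceeded as exc:
    print("stopped:", exc, "spent:", budget.spent)
```

&lt;p&gt;Checking the limit before applying the charge means the orchestrator can refuse the next step instead of discovering the overspend afterwards.&lt;/p&gt;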

&lt;h3&gt;
  
  
  5.2 Security and Data Boundary Control
&lt;/h3&gt;

&lt;p&gt;Enterprise agents often access sensitive company data. Before enabling autonomous workflows, teams should define:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;User permission inheritance&lt;/li&gt;
&lt;li&gt;File access boundaries&lt;/li&gt;
&lt;li&gt;Audit log retention&lt;/li&gt;
&lt;li&gt;Data loss prevention rules&lt;/li&gt;
&lt;li&gt;Tool execution approval policies&lt;/li&gt;
&lt;li&gt;External web access restrictions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The agent should not gain broader permissions than the user or service account that operates it.&lt;/p&gt;
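&lt;p&gt;One way to express this rule in code is to treat the agent’s permissions as a subset check against the operating identity. The scope strings and function names below are invented for illustration; a real deployment would rely on the identity platform’s own permission model rather than plain string sets.&lt;/p&gt;

```python
def effective_permissions(user_scopes: set, requested_scopes: set) -> set:
    """An agent may hold only scopes that its operating user already has."""
    return user_scopes.intersection(requested_scopes)


def authorize(user_scopes: set, requested_scopes: set) -> set:
    """Return granted scopes, or raise if the agent asks for more than the user holds."""
    extra = requested_scopes - user_scopes
    if extra:
        raise PermissionError(f"agent requested scopes the user lacks: {sorted(extra)}")
    return effective_permissions(user_scopes, requested_scopes)


# Hypothetical scope names.
user = {"files:read", "files:write", "crm:read"}
print(sorted(authorize(user, {"files:read", "crm:read"})))
try:
    authorize(user, {"files:read", "hr:read"})
except PermissionError as exc:
    print(exc)
```

&lt;p&gt;Failing loudly on an over-broad request, rather than silently trimming it, leaves a clear signal in the audit log that a workflow was configured with more access than its operator has.&lt;/p&gt;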

&lt;h3&gt;
  
  
  5.3 Model Output and Model Improvement Boundaries
&lt;/h3&gt;

&lt;p&gt;When enterprises use external or third-party models, governance must clarify how outputs, logs, prompts, and synthetic data may be used. The boundary between normal product use and model improvement can become blurred, especially when model outputs are reused for coding, evaluation, customer service, internal tools, or research.&lt;/p&gt;

&lt;h3&gt;
  
  
  5.4 Search Is Not Always the Bottleneck
&lt;/h3&gt;

&lt;p&gt;Web grounding latency matters, but many agent workflows spend more time on LLM inference, tool orchestration, memory handling, reasoning, and output generation. Developers should profile the full workflow rather than optimizing only search calls.&lt;/p&gt;
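&lt;p&gt;A lightweight way to profile the full workflow is to time each stage and compare the totals. This sketch uses only the standard library; the stage names are arbitrary and the &lt;code&gt;time.sleep&lt;/code&gt; calls are stand-ins for real search, inference, and tool work, so the durations carry no meaning beyond the demonstration.&lt;/p&gt;

```python
import time
from collections import defaultdict
from contextlib import contextmanager

stage_seconds = defaultdict(float)


@contextmanager
def timed(stage: str):
    """Accumulate wall-clock time per workflow stage."""
    start = time.perf_counter()
    try:
        yield
    finally:
        stage_seconds[stage] += time.perf_counter() - start


def slowest_stage() -> str:
    """Return the stage with the largest accumulated time."""
    return max(stage_seconds, key=stage_seconds.get)


# Sleeps stand in for real work; the durations are made up.
with timed("search"):
    time.sleep(0.01)
with timed("inference"):
    time.sleep(0.05)
with timed("tool"):
    time.sleep(0.02)

for stage, seconds in sorted(stage_seconds.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{stage}: {seconds:.3f}s")
```

&lt;p&gt;Wrapping every stage in the same context manager keeps the measurement comparable across stages, which is what makes it possible to tell whether search or inference actually dominates a given task.&lt;/p&gt;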

&lt;h2&gt;
  
  
  6. Summary
&lt;/h2&gt;

&lt;p&gt;Copilot Cowork represents a major shift in enterprise AI: from chat assistance to cloud-executed agent workflows. Its usage-based billing model reflects the economic reality of agentic AI, where valuable tasks may involve many model calls, retrieval steps, tool executions, and long runtimes.&lt;/p&gt;

&lt;p&gt;The reported DeepSeek integration is significant because it points toward a practical multi-model future. Enterprises will not choose models only by brand; they will compare reasoning quality, latency, cost, compliance, availability, and integration fit.&lt;/p&gt;

&lt;p&gt;Web IQ further shows that Microsoft wants to control the full agent stack: models, search, enterprise memory, tools, billing, security, and cloud runtime. For developers, the lesson is clear: successful AI agents require more than prompt engineering. They need architecture, observability, cost control, security design, and model routing strategy.&lt;/p&gt;

&lt;p&gt;#AI #LargeLanguageModels #Python #MachineLearning #AI_Agents #MicrosoftCopilot #TechnicalPractice&lt;/p&gt;

</description>
    </item>
    <item>
      <title>[Technical Guide] Z-Code and GLM 5.2: Practical Workflow for AI Coding Agents</title>
      <dc:creator>SchrodingCatAI</dc:creator>
      <pubDate>Thu, 18 Jun 2026 13:35:27 +0000</pubDate>
      <link>https://dev.to/schrodingcatai/technical-guide-z-code-and-glm-52-practical-workflow-for-ai-coding-agents-2p2o</link>
      <guid>https://dev.to/schrodingcatai/technical-guide-z-code-and-glm-52-practical-workflow-for-ai-coding-agents-2p2o</guid>
      <description>&lt;h2&gt;
  
  
  Abstract
&lt;/h2&gt;

&lt;p&gt;Z-Code is an AI coding agent built around GLM 5.2, offering generous token limits, project generation, preview, debugging, skills, MCP integration, and remote task triggering. This article explains its core mechanism, practical workflow, benchmark meaning, tool selection, and a Python API example for integrating large-model coding assistance into real development scenarios.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Background: Why AI Coding Agents Matter
&lt;/h2&gt;

&lt;p&gt;AI coding tools are moving from simple code completion toward autonomous engineering agents. Developers no longer only ask for a function or a regex; they expect an agent to understand a project, modify multiple files, run checks, inspect errors, and iterate across a longer task chain.&lt;/p&gt;

&lt;p&gt;This is where Z-Code becomes interesting. Z.ai recently released GLM 5.2 and, alongside it, introduced Z-Code, a coding-agent product that targets the same kind of workflow as OpenAI Codex but is optimized for the GLM model family.&lt;/p&gt;

&lt;p&gt;The practical value is clear:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Developers can create or modify projects through natural language.&lt;/li&gt;
&lt;li&gt;The agent can generate frontend previews and iterate on selected UI elements.&lt;/li&gt;
&lt;li&gt;Skills, plugins, MCP servers, and command integrations extend its working environment.&lt;/li&gt;
&lt;li&gt;The free tier reportedly provides a large daily token allowance, making it attractive for daily experimentation.&lt;/li&gt;
&lt;li&gt;GLM 5.2 shows strong benchmark performance in coding, tool use, and long-horizon engineering tasks.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For individual developers, this lowers the cost of prototyping. For teams, it creates a new way to evaluate model-driven development workflows before adopting them in production.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Core Principles: How Z-Code Works
&lt;/h2&gt;

&lt;h3&gt;
  
  
  2.1 Agent-Oriented Coding Instead of Single-Turn Generation
&lt;/h3&gt;

&lt;p&gt;Traditional LLM coding often follows a single request-response pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User prompt -&amp;gt; Model output -&amp;gt; Developer manually applies code
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A coding agent adds orchestration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Task prompt -&amp;gt; Project context -&amp;gt; File changes -&amp;gt; Preview/debug -&amp;gt; Iteration
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Z-Code follows this second pattern. After creating a project and submitting a prompt, the system begins working on the task, generates code, and exposes preview and editing controls.&lt;/p&gt;

&lt;p&gt;This means the model is not only producing snippets. It is operating as a task executor with awareness of project structure, UI output, and user feedback.&lt;/p&gt;
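
&lt;p&gt;The orchestration pattern above can be sketched in a few lines of Python. This is a minimal illustration of a propose, apply, check loop, not Z-Code's actual implementation: &lt;code&gt;AgentState&lt;/code&gt;, &lt;code&gt;run_agent_loop&lt;/code&gt;, and the two stub functions are hypothetical names, and the model call is replaced by a stub so the sketch runs offline.&lt;/p&gt;

```python
from dataclasses import dataclass, field


@dataclass
class AgentState:
    """Hypothetical container for what the loop tracks between iterations."""
    task: str
    files: dict = field(default_factory=dict)    # path mapped to file content
    history: list = field(default_factory=list)  # one log line per iteration


def run_agent_loop(state, propose_changes, check_result, max_iterations=3):
    """Repeat propose, apply, check until the check passes or the budget runs out.

    propose_changes(state) returns a dict of path to new content.
    check_result(state) returns a list of problems; an empty list means success.
    Both are injected so the loop can be exercised without a real model.
    """
    for iteration in range(1, max_iterations + 1):
        changes = propose_changes(state)
        state.files.update(changes)               # apply file changes
        problems = check_result(state)            # stands in for preview/debug
        state.history.append(f"iteration {iteration}: {len(problems)} problem(s)")
        if not problems:
            return True
        # Feed the problems back so the next proposal can react to them.
        state.task = state.task + "\nFix: " + "; ".join(problems)
    return False


# Stub model: the first attempt forgets the title, the second one adds it.
def fake_propose(state):
    if "Fix:" in state.task:
        return {"index.html": "title and body"}
    return {"index.html": "body"}


def fake_check(state):
    return [] if "title" in state.files.get("index.html", "") else ["missing title"]


state = AgentState(task="Create a landing page")
done = run_agent_loop(state, fake_propose, fake_check)
print(done, state.history)
```

&lt;p&gt;The key design choice is that feedback from the check step is appended to the task, which is what separates an agent loop from repeated single-turn generation.&lt;/p&gt;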

&lt;h3&gt;
  
  
  2.2 GLM 5.2 as the Model Foundation
&lt;/h3&gt;

&lt;p&gt;GLM 5.2 is notable because it is competitive across multiple engineering benchmarks. Based on the launch material, the model is only a few points behind top closed-source models in some coding evaluations, while improving sharply over GLM 5.1.&lt;/p&gt;

&lt;p&gt;Examples mentioned include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;SWE-bench Pro: GLM 5.2 reaches 62.1, compared with 58.4 for GLM 5.1.&lt;/li&gt;
&lt;li&gt;Frontier SWA: GLM 5.2 scores 74, close to Opus at 75 and above GPT at 72.&lt;/li&gt;
&lt;li&gt;Post-Train Bench: GLM 5.2 scores 34.3, ahead of GPT at 28.4 and behind Opus at 37.2.&lt;/li&gt;
&lt;li&gt;SWE-Marathon: GLM 5.2 scores 13, above GLM 5.1 at 1 and GPT at 12, though still behind Opus at 26.&lt;/li&gt;
&lt;li&gt;MCP Atlas: GLM 5.2 scores 76.8, close to Opus at 77.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The important point is not that GLM 5.2 wins every benchmark. It does not. The important point is that it performs strongly across coding, long-context reasoning, terminal-like tasks, and tool usage. These are exactly the capabilities required by modern coding agents.&lt;/p&gt;
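
&lt;p&gt;To make the comparison concrete, the quoted scores can be turned into per-benchmark gaps. The sketch below uses only the numbers listed above; the &lt;code&gt;SCORES&lt;/code&gt; table layout and the &lt;code&gt;gap_to&lt;/code&gt; helper are illustrative names, and benchmarks without a score for the reference model are skipped.&lt;/p&gt;

```python
# Scores as quoted in the list above (from the launch material).
SCORES = {
    "SWE-bench Pro":    {"GLM 5.2": 62.1, "GLM 5.1": 58.4},
    "Frontier SWA":     {"GLM 5.2": 74.0, "Opus": 75.0, "GPT": 72.0},
    "Post-Train Bench": {"GLM 5.2": 34.3, "Opus": 37.2, "GPT": 28.4},
    "SWE-Marathon":     {"GLM 5.2": 13.0, "GLM 5.1": 1.0, "Opus": 26.0, "GPT": 12.0},
    "MCP Atlas":        {"GLM 5.2": 76.8, "Opus": 77.0},
}


def gap_to(reference, scores=SCORES, model="GLM 5.2"):
    """Map each benchmark to (model score minus reference score).

    Benchmarks where the reference model has no quoted score are left out.
    """
    return {
        bench: round(row[model] - row[reference], 1)
        for bench, row in scores.items()
        if reference in row
    }


for bench, gap in gap_to("Opus").items():
    print(f"{bench}: {gap:+.1f} vs Opus")
```

&lt;p&gt;On these numbers, GLM 5.2 trails Opus by 0.2 to 2.9 points on three of the four shared benchmarks and by 13 points on SWE-Marathon, which is why the long-horizon result deserves separate attention.&lt;/p&gt;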

&lt;h3&gt;
  
  
  2.3 Long Context and Long-Horizon Execution
&lt;/h3&gt;

&lt;p&gt;The strongest results come from long-horizon benchmarks, where tasks can last hours and involve complex engineering work such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Building compilers&lt;/li&gt;
&lt;li&gt;Optimizing kernels&lt;/li&gt;
&lt;li&gt;Implementing production-grade services&lt;/li&gt;
&lt;li&gt;Managing machine learning experiments&lt;/li&gt;
&lt;li&gt;Improving smaller models through post-training&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These tests used large context windows, including up to one million tokens in some long-horizon settings. This matters because real projects are not isolated snippets. They involve requirements, existing files, logs, dependencies, tests, and incremental decisions.&lt;/p&gt;

&lt;p&gt;A model that can maintain context over long workflows is more useful than one that only writes isolated functions well.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Practical Demonstration: Using an AI Coding Workflow
&lt;/h2&gt;

&lt;h3&gt;
  
  
  3.1 Basic Z-Code Workflow
&lt;/h3&gt;

&lt;p&gt;A typical Z-Code workflow can be summarized as follows:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create a new task or project.&lt;/li&gt;
&lt;li&gt;Enter a natural-language development prompt.&lt;/li&gt;
&lt;li&gt;Let the agent generate or modify the project.&lt;/li&gt;
&lt;li&gt;Open the preview panel to inspect the result.&lt;/li&gt;
&lt;li&gt;Select a UI element from preview and ask for targeted changes.&lt;/li&gt;
&lt;li&gt;Use developer tools to inspect console logs.&lt;/li&gt;
&lt;li&gt;Continue iteration until the output is acceptable.&lt;/li&gt;
&lt;li&gt;Open the project in a local editor for manual review, Git operations, and final cleanup.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;A practical prompt might be:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Create a responsive task dashboard with a sidebar, project list, task filters,
status counters, and a compact analytics section. Use clean component structure
and keep the layout suitable for daily operations.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After generation, Z-Code can show the preview in the right-side panel. If a chart, button, or table section needs adjustment, the user can refer to that specific preview element from the chat box and request a change.&lt;/p&gt;

&lt;h3&gt;
  
  
  3.2 Python Example: Calling a Large Model for Code Review
&lt;/h3&gt;

&lt;p&gt;In many production workflows, developers still need API-level access to automate review, testing, or documentation. The following example uses Xuedingmao AI at &lt;code&gt;xuedingmao.com&lt;/code&gt;, with the &lt;code&gt;claude-opus-4-8&lt;/code&gt; model. This model is suitable for complex reasoning, long-text analysis, code generation, and error correction.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;

&lt;span class="n"&gt;BASE_URL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://xuedingmao.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;API_ENDPOINT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/v1/messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;MODEL_NAME&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-opus-4-8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;call_model_for_code_review&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;source_code&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Send source code to a large model and request a concise engineering review.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="n"&gt;api_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;XUEDINGMAO_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;RuntimeError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Please set the XUEDINGMAO_API_KEY environment variable.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;BASE_URL&lt;/span&gt;&lt;span class="si"&gt;}{&lt;/span&gt;&lt;span class="n"&gt;API_ENDPOINT&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="n"&gt;headers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Authorization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bearer &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Content-Type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;application/json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;MODEL_NAME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Review the following Python code. Focus on correctness, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;security risks, maintainability, and testability. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Return practical suggestions only.&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                    &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;```python&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;source_code&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;```&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;raise_for_status&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c1"&gt;# Messages-style responses carry a list of content blocks; join the text blocks.
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

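    &lt;span class="c1"&gt;# Unexpected response shape: return the raw payload so the caller can inspect it.
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;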

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;demo_code&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
def divide(a, b):
    return a / b

print(divide(10, 0))
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="n"&gt;review_result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;call_model_for_code_review&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;demo_code&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;review_result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Save the script as &lt;code&gt;review_code.py&lt;/code&gt;, then configure the API key and run it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;XUEDINGMAO_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"your_api_key_here"&lt;/span&gt;
python review_code.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This pattern can be extended to support automated pull request review, documentation generation, unit test creation, and error log analysis.&lt;/p&gt;

&lt;h3&gt;
  
  
  3.3 API Workflow Extension
&lt;/h3&gt;

&lt;p&gt;A simple automation pipeline can look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Read changed files -&amp;gt; Build review prompt -&amp;gt; Call model API -&amp;gt; Parse response -&amp;gt; Save review report
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For example, a team can connect this script to CI and automatically produce model-assisted review notes after each commit. The model should not replace human review, but it can quickly surface missing edge cases, unclear naming, weak tests, and risky assumptions.&lt;/p&gt;
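
&lt;p&gt;The pipeline stages above can be sketched as follows. This is a minimal outline under stated assumptions: &lt;code&gt;build_review_prompt&lt;/code&gt; and &lt;code&gt;run_review_pipeline&lt;/code&gt; are hypothetical helper names, the model call is injected as a plain function (in practice it could wrap &lt;code&gt;call_model_for_code_review&lt;/code&gt; from section 3.2), and the list of changed files is passed in rather than discovered. In CI that list could come from &lt;code&gt;git diff --name-only&lt;/code&gt;.&lt;/p&gt;

```python
import tempfile
from pathlib import Path


def build_review_prompt(files):
    """Concatenate changed files into one review prompt. files maps path to source text."""
    parts = ["Review the following changed files. Return practical suggestions only."]
    for path, source in sorted(files.items()):
        parts.append(f"\n--- {path} ---\n{source}")
    return "\n".join(parts)


def run_review_pipeline(changed_paths, call_model, report_path):
    """Read changed files, build the prompt, call the model, and save the report."""
    files = {str(p): Path(p).read_text(encoding="utf-8") for p in changed_paths}
    prompt = build_review_prompt(files)
    review = call_model(prompt)                   # injected model call
    report = "# Model-assisted review\n\n" + review.strip() + "\n"
    Path(report_path).write_text(report, encoding="utf-8")
    return report


# Demo with a stub model so the sketch runs offline.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "calc.py"
    src.write_text("def divide(a, b):\n    return a / b\n", encoding="utf-8")
    out = Path(tmp) / "review.md"
    stub = lambda prompt: "Handle b == 0 in divide()." if "divide" in prompt else "No findings."
    report = run_review_pipeline([src], stub, out)
    saved = out.read_text(encoding="utf-8")
    print(saved)
```

&lt;p&gt;Injecting the model call keeps the pipeline testable without network access and makes it easy to swap models later.&lt;/p&gt;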

&lt;h2&gt;
  
  
  4. Tool and Technical Resource Selection
&lt;/h2&gt;

&lt;h3&gt;
  
  
  4.1 When to Use Z-Code
&lt;/h3&gt;

&lt;p&gt;Z-Code is suitable when the task is project-oriented and visual iteration matters. Typical scenarios include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Rapid frontend prototyping&lt;/li&gt;
&lt;li&gt;Generating small tools or internal dashboards&lt;/li&gt;
&lt;li&gt;Iterating on UI elements through preview&lt;/li&gt;
&lt;li&gt;Exploring GLM 5.2 coding capability&lt;/li&gt;
&lt;li&gt;Testing agent-style workflows with large token limits&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The interface includes task creation, search, skills, MCP server configuration, plugins, commands, quota visibility, preview, and developer tools. These features make it closer to a lightweight cloud coding agent than a plain chatbot.&lt;/p&gt;

&lt;h3&gt;
  
  
  4.2 When to Use Direct API Integration
&lt;/h3&gt;

&lt;p&gt;API integration is better when the workflow needs to be embedded into existing systems, such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CI/CD review automation&lt;/li&gt;
&lt;li&gt;Codebase documentation&lt;/li&gt;
&lt;li&gt;Batch refactoring suggestions&lt;/li&gt;
&lt;li&gt;Test case generation&lt;/li&gt;
&lt;li&gt;Internal developer tools&lt;/li&gt;
&lt;li&gt;Multi-model comparison&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For this type of work, Xuedingmao AI can be used as a unified model access layer. From a technical selection perspective, it is useful because it aggregates many mainstream models, including GPT-5.5, Claude 4.8, Gemini 3.1 Pro, and other frontier models. It also provides an OpenAI-compatible interface, which reduces the adaptation cost when switching between models. Note that the example in section 3.2 uses the Messages-style &lt;code&gt;/v1/messages&lt;/code&gt; route, whose response shape differs from OpenAI chat completions.&lt;/p&gt;

&lt;p&gt;For production testing, interface stability and response speed are important. A unified endpoint helps developers evaluate multiple models without rewriting integration logic for each vendor.&lt;/p&gt;
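
&lt;p&gt;A multi-model comparison through one endpoint might look like the sketch below. It assumes a single client function where only the model name changes; &lt;code&gt;compare_models&lt;/code&gt;, &lt;code&gt;fake_client&lt;/code&gt;, and the model names &lt;code&gt;model-a&lt;/code&gt; and &lt;code&gt;model-b&lt;/code&gt; are placeholders, not identifiers from any real platform.&lt;/p&gt;

```python
import time


def compare_models(prompt, model_names, call_model):
    """Send one prompt to several models through the same callable and time each call.

    call_model(model_name, prompt) is injected; with a unified endpoint it would be
    one HTTP client where only the model name changes.
    """
    results = []
    for name in model_names:
        started = time.perf_counter()
        try:
            text = call_model(name, prompt)
            error = None
        except Exception as exc:                  # keep going if one model fails
            text, error = "", str(exc)
        results.append({
            "model": name,
            "seconds": round(time.perf_counter() - started, 3),
            "chars": len(text),
            "error": error,
        })
    return results


# Stub client with placeholder model names, so the sketch runs offline.
def fake_client(model_name, prompt):
    if model_name == "model-b":
        raise RuntimeError("rate limited")
    return f"[{model_name}] reviewed {len(prompt)} chars"


rows = compare_models("def f(): pass", ["model-a", "model-b"], fake_client)
for row in rows:
    print(row)
```

&lt;p&gt;Recording latency and failures per model gives a first view of the stability and response speed mentioned above, before any quality comparison.&lt;/p&gt;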

&lt;h2&gt;
  
  
  5. Notes and Common Pitfalls
&lt;/h2&gt;

&lt;h3&gt;
  
  
  5.1 Z-Code Still Has Missing Engineering Features
&lt;/h3&gt;

&lt;p&gt;Z-Code is promising, but it still lacks several features that day-to-day engineering work relies on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The file diff view appears limited and is not presented as a complete change log.&lt;/li&gt;
&lt;li&gt;There is no full built-in file explorer in the current workflow.&lt;/li&gt;
&lt;li&gt;Worktree management is missing.&lt;/li&gt;
&lt;li&gt;One-click Git initialization is not available.&lt;/li&gt;
&lt;li&gt;The built-in browser preview is useful, but it is not fully agent-controlled in the same way as some competing tools.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Because of these gaps, developers should still open generated projects in a local editor before final delivery. Git diff, test execution, linting, dependency inspection, and security checks remain essential.&lt;/p&gt;

&lt;h3&gt;
  
  
  5.2 Benchmark Scores Need Context
&lt;/h3&gt;

&lt;p&gt;GLM 5.2 performs strongly, but benchmark results depend heavily on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Context window size&lt;/li&gt;
&lt;li&gt;Agent harness design&lt;/li&gt;
&lt;li&gt;Tool access&lt;/li&gt;
&lt;li&gt;Prompt strategy&lt;/li&gt;
&lt;li&gt;Output token limit&lt;/li&gt;
&lt;li&gt;Evaluation environment&lt;/li&gt;
&lt;li&gt;Task sampling&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For example, some long-horizon tests use full one-million-token context and high-effort settings. These are expensive evaluations and may not reflect default consumer settings. Therefore, benchmark scores should guide evaluation, not replace hands-on testing.&lt;/p&gt;

&lt;h3&gt;
  
  
  5.3 Practical Prompting Tips
&lt;/h3&gt;

&lt;p&gt;For better coding-agent results, prompts should include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Target framework or language&lt;/li&gt;
&lt;li&gt;Expected file structure&lt;/li&gt;
&lt;li&gt;UI or API behavior&lt;/li&gt;
&lt;li&gt;Constraints and forbidden approaches&lt;/li&gt;
&lt;li&gt;Test requirements&lt;/li&gt;
&lt;li&gt;Performance or compatibility requirements&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A weak prompt is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Build a dashboard.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A stronger prompt is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Build a React task dashboard for internal project tracking.
Include a sidebar, task table, status filters, priority badges, and responsive layout.
Use reusable components and keep the design compact for daily operations.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The second prompt gives the agent enough structure to make better decisions.&lt;/p&gt;
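
&lt;p&gt;One way to apply the checklist consistently is to assemble prompts from named fields. The helper below is a small sketch; &lt;code&gt;build_agent_prompt&lt;/code&gt; and its parameter names are hypothetical and simply mirror the six checklist items, skipping any that are left empty.&lt;/p&gt;

```python
def build_agent_prompt(goal, framework=None, structure=None, behavior=None,
                       constraints=None, tests=None, requirements=None):
    """Assemble a coding-agent prompt from the checklist fields; empty fields are skipped."""
    sections = [
        ("Target framework or language", framework),
        ("Expected file structure", structure),
        ("UI or API behavior", behavior),
        ("Constraints and forbidden approaches", constraints),
        ("Test requirements", tests),
        ("Performance or compatibility requirements", requirements),
    ]
    lines = [goal.strip()]
    for label, value in sections:
        if value:
            lines.append(f"{label}: {value.strip()}")
    return "\n".join(lines)


prompt = build_agent_prompt(
    "Build a task dashboard for internal project tracking.",
    framework="React",
    behavior="Sidebar, task table, status filters, priority badges, responsive layout",
    constraints="Use reusable components; keep the design compact",
)
print(prompt)
```

&lt;p&gt;A template like this also makes missing information visible: an omitted field is a decision the agent will have to make on its own.&lt;/p&gt;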

&lt;h3&gt;
  
  
  5.4 Always Verify Generated Code
&lt;/h3&gt;

&lt;p&gt;AI-generated code should be treated as a draft. Before using it in production, developers should verify:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Runtime behavior&lt;/li&gt;
&lt;li&gt;Dependency versions&lt;/li&gt;
&lt;li&gt;Security-sensitive logic&lt;/li&gt;
&lt;li&gt;Error handling&lt;/li&gt;
&lt;li&gt;Edge cases&lt;/li&gt;
&lt;li&gt;Accessibility&lt;/li&gt;
&lt;li&gt;Test coverage&lt;/li&gt;
&lt;li&gt;License compatibility&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For frontend projects, preview inspection is not enough. Console logs, network requests, responsive layout, and keyboard interaction should also be checked.&lt;/p&gt;
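
&lt;p&gt;Some of these checks can be automated before human review. The sketch below uses Python's standard &lt;code&gt;ast&lt;/code&gt; module to confirm that generated Python parses and to flag two common problems. The &lt;code&gt;quick_checks&lt;/code&gt; name and the &lt;code&gt;RISKY_CALLS&lt;/code&gt; list are illustrative assumptions, and a check like this complements tests, linting, and security review rather than replacing them.&lt;/p&gt;

```python
import ast

RISKY_CALLS = {"eval", "exec"}   # illustrative list, not a complete security policy


def quick_checks(source):
    """Return a list of findings for generated Python source; empty means nothing flagged."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error on line {exc.lineno}: {exc.msg}"]

    findings = []
    for node in ast.walk(tree):
        # A bare "except:" catches everything, including errors that should surface.
        if isinstance(node, ast.ExceptHandler) and node.type is None:
            findings.append(f"line {node.lineno}: bare except hides errors")
        # Direct calls to eval/exec deserve a manual look in security-sensitive code.
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id in RISKY_CALLS):
            findings.append(f"line {node.lineno}: call to {node.func.id}()")
    return findings


generated = "try:\n    value = eval(user_input)\nexcept:\n    value = None\n"
for finding in quick_checks(generated):
    print(finding)
```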

&lt;h2&gt;
  
  
  6. Conclusion
&lt;/h2&gt;

&lt;p&gt;Z-Code is a practical AI coding-agent product built around GLM 5.2. Its main advantages are generous usage limits, project-level generation, preview-based iteration, skills, MCP-related configuration, remote task triggering, and strong alignment with GLM’s coding capability.&lt;/p&gt;

&lt;p&gt;GLM 5.2 is especially notable because it shows meaningful progress in coding benchmarks, long-horizon engineering tasks, and tool-use scenarios. It does not dominate every chart, and models such as Opus still lead in some complex engineering evaluations. However, GLM 5.2 has reached a level where it deserves serious testing by developers building AI-assisted coding workflows.&lt;/p&gt;

&lt;p&gt;For daily use, the best approach is pragmatic: use Z-Code for fast project iteration and visual feedback, then use local editors, Git, tests, and API-based review tools for engineering control. Combined with unified model access platforms such as Xuedingmao AI, developers can build a flexible workflow that supports experimentation, automation, and production-grade validation.&lt;/p&gt;

&lt;p&gt;#AI #LargeLanguageModel #Python #MachineLearning #CodingAgent #GLM #ZCode #TechnicalPractice&lt;/p&gt;

</description>
    </item>
    <item>
      <title>【Technical Deep Dive】NVIDIA NIM Free API + Open Code: A Practical Guide to MiniMax M3, Step-3.7-Flash, and Nemotron-3-Ultra</title>
      <dc:creator>SchrodingCatAI</dc:creator>
      <pubDate>Mon, 15 Jun 2026 14:27:35 +0000</pubDate>
      <link>https://dev.to/schrodingcatai/technical-deep-dive-nvidia-nim-free-api-open-code-a-practical-guide-to-minimax-m3-106c</link>
      <guid>https://dev.to/schrodingcatai/technical-deep-dive-nvidia-nim-free-api-open-code-a-practical-guide-to-minimax-m3-106c</guid>
      <description>&lt;h2&gt;
  
  
  1. Background: Why NVIDIA NIM Deserves More Attention
&lt;/h2&gt;

&lt;p&gt;Most developers looking for free LLM API access default to OpenRouter or Groq. NVIDIA Build (build.nvidia.com/models) is frequently overlooked, yet it quietly hosts one of the most developer-friendly model catalogs available today.&lt;/p&gt;

&lt;p&gt;The core offering is NIM — NVIDIA Inference Microservices. The concept is straightforward: NVIDIA takes open-weight and partnered models, optimizes them for its GPU infrastructure using TensorRT and quantization techniques, and exposes them through stable API endpoints. Developers interact with these endpoints using a familiar OpenAI-compatible interface.&lt;/p&gt;

&lt;p&gt;At the time of writing, the catalog lists 139 models, 77 of which provide free endpoints for development and testing. The rate limits are real, and free tiers are not intended for production traffic — but for experiments, prototyping, and integrating into AI coding tools, this is a genuinely useful resource that deserves broader adoption.&lt;/p&gt;




&lt;h2&gt;
  
  
  2. Core Models: Capabilities and Use Cases
&lt;/h2&gt;

&lt;h3&gt;
  
  
  2.1 MiniMax M3 — Multimodal, Long-Context Creative Coding
&lt;/h3&gt;

&lt;p&gt;MiniMax M3 Preview is a multimodal mixture-of-experts (MoE) vision-language model. Its key differentiator is that it is not purely a text model. It accepts text, images, and video as input and produces text output.&lt;/p&gt;

&lt;p&gt;Key specifications:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Total parameters:&lt;/strong&gt; 456B (MoE architecture)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Active parameters:&lt;/strong&gt; 22B per forward pass&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context length:&lt;/strong&gt; 512K tokens&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Inputs:&lt;/strong&gt; Text, image, video (up to ~30 minutes)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;NVIDIA's model page describes its strengths as long-context reasoning, agentic workflows, creative tasks, long-form video understanding, and extended coding sessions. The 512K context window is particularly relevant for large codebase work where you need the model to hold significant state.&lt;/p&gt;

&lt;p&gt;Practical use case in coding: feed it a UI screenshot, ask it to reason about the layout, suggest improvements, and then use an agent to implement those changes. This kind of vision-to-code pipeline is where MiniMax M3 stands apart from standard text-only coding models.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;License note:&lt;/strong&gt; The model page marks it as non-commercial. It is suitable for personal projects, testing, and research, but verify the license terms before any commercial deployment.&lt;/p&gt;

&lt;h3&gt;
  
  
  2.2 Step-3.7-Flash — Fast, Practical General Coding
&lt;/h3&gt;

&lt;p&gt;Step-3.7-Flash is positioned as a high-speed reasoning model for general coding tasks. When you need quick turnaround on bug fixes, test generation, or standard feature implementation, this is the model to reach for first.&lt;/p&gt;

&lt;p&gt;The "Flash" designation indicates it is optimized for low latency over maximum capability, similar to the design philosophy behind Gemini Flash or Claude Haiku. For the majority of day-to-day coding tasks in an AI-assisted workflow, raw benchmark performance matters less than response speed and instruction-following reliability.&lt;/p&gt;

&lt;h3&gt;
  
  
  2.3 Nemotron-3-Ultra — Deep Reasoning and Long-Context Planning
&lt;/h3&gt;

&lt;p&gt;Nemotron-3-Ultra (nvidia/nemotron-3-ultra-253b-v1) is NVIDIA's own model, built on the Llama architecture and fine-tuned for complex reasoning tasks. This is the model to use when you need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Architecture planning across large codebases&lt;/li&gt;
&lt;li&gt;Multi-step reasoning on ambiguous requirements&lt;/li&gt;
&lt;li&gt;Difficult debugging that requires tracing logic across many files&lt;/li&gt;
&lt;li&gt;Thorough code review with detailed explanations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It is heavier and slower than Step-3.7-Flash, but when the task genuinely requires deep reasoning, the quality difference is noticeable.&lt;/p&gt;




&lt;h2&gt;
  
  
  3. Integration: Connecting NVIDIA NIM to Your Coding Tool
&lt;/h2&gt;

&lt;h3&gt;
  
  
  3.1 Getting an API Key
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Navigate to &lt;a href="https://build.nvidia.com" rel="noopener noreferrer"&gt;build.nvidia.com&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Create an account or sign in&lt;/li&gt;
&lt;li&gt;Go to the model page for any model you want to use&lt;/li&gt;
&lt;li&gt;Generate an API key from the dashboard&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  3.2 OpenAI-Compatible Integration
&lt;/h3&gt;

&lt;p&gt;NVIDIA NIM exposes an OpenAI-compatible endpoint, so tools that support custom OpenAI providers typically need only a base URL, an API key, and a model ID. The base URL is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://integrate.api.nvidia.com/v1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For tools like Continue, Cline, Kiro, or any custom script using the OpenAI SDK:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="c1"&gt;# Configure the client to point at NVIDIA NIM
# instead of api.openai.com
&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://integrate.api.nvidia.com/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;NVIDIA_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;  &lt;span class="c1"&gt;# your NIM API key
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Model IDs must be copied exactly from the NVIDIA Build model page
# Do not guess or abbreviate them
&lt;/span&gt;&lt;span class="n"&gt;MODELS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fast_coding&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;     &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stepfun-ai/step-3.7-flash&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;multimodal&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;      &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;minimax/minimax-01&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;deep_reasoning&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;nvidia/nemotron-3-ultra-253b-v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;call_nim&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model_key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fast_coding&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Call an NVIDIA NIM model using the OpenAI-compatible API.

    Args:
        prompt:    The user prompt to send to the model.
        model_key: Key from the MODELS dict above.
                   &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fast_coding&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;    -&amp;gt; Step-3.7-Flash  (low latency)
                   &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;multimodal&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;     -&amp;gt; MiniMax M3       (vision + long ctx)
                   &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;deep_reasoning&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; -&amp;gt; Nemotron-3-Ultra (complex tasks)

    Returns:
        The model&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s text response as a string.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;model_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;MODELS&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;model_key&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are an expert software engineer. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Provide concise, correct, and well-commented code.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="c1"&gt;# lower temperature for more consistent code output
&lt;/span&gt;        &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4096&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="c1"&gt;# adjust based on expected response length
&lt;/span&gt;    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;


&lt;span class="c1"&gt;# --- Example usage ---
&lt;/span&gt;
&lt;span class="c1"&gt;# Quick bug fix: use the fast model
&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;call_nim&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Fix the off-by-one error in this Python list slice: data[1:n+1]&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;model_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fast_coding&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Step-3.7-Flash response:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Frontend design task: use the multimodal model
&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;call_nim&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;I have a React dashboard with a sidebar nav and a data table. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Suggest layout improvements for mobile responsiveness.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;model_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;multimodal&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;MiniMax M3 response:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Architecture planning: use the deep reasoning model
&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;call_nim&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Design a microservice architecture for a real-time notification system &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;that must handle 100k concurrent users with at-most-once delivery guarantees.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;model_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;deep_reasoning&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Nemotron-3-Ultra response:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3.3 Using the Anthropic SDK as an Alternative
&lt;/h3&gt;

&lt;p&gt;If you prefer a dedicated Anthropic-style client or need structured output features, the same pattern applies: point the client at a custom base URL and supply the matching API key. The minimal example below runs a code review workflow with claude-opus-4-8 through a unified aggregation platform, which is covered in the next section.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;anthropic&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;

&lt;span class="c1"&gt;# Xuedingmao (xuedingmao.com) aggregates 500+ models including
# Claude 4.8, GPT-5.5, and Gemini 3.1 Pro under a single
# OpenAI-compatible interface. This example assumes the gateway also
# accepts Anthropic Messages API requests at the same base URL.
# claude-opus-4-8 performs well on complex reasoning, long-text
# processing, and code generation.
&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;anthropic&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Anthropic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://xuedingmao.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# unified model gateway
&lt;/span&gt;    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;XDM_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;review_code&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;code_snippet&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Use claude-opus-4-8 to perform a structured code review.

    Args:
        code_snippet: The source code string to review.

    Returns:
        A structured review with issues, suggestions, and a corrected version.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-opus-4-8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="c1"&gt;# strong reasoning, ideal for code review
&lt;/span&gt;        &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2048&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Review the following code for bugs, security issues, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;and style problems. Provide:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1. A list of issues found&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2. Specific suggestions for each issue&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;3. A corrected version of the code&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                    &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;```python&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;code_snippet&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;```&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;


&lt;span class="c1"&gt;# Example: review a function with a SQL injection vulnerability
&lt;/span&gt;&lt;span class="n"&gt;sample_code&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
def get_user(username):
    query = f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT * FROM users WHERE name = &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;{username}&lt;/span&gt;&lt;span class="sh"&gt;'"&lt;/span&gt;&lt;span class="s"&gt;
    return db.execute(query)
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

&lt;span class="n"&gt;review&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;review_code&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sample_code&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;review&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  4. Developer Tooling and Platform Selection
&lt;/h2&gt;

&lt;h3&gt;
  
  
  4.1 NVIDIA Build — Direct Access
&lt;/h3&gt;

&lt;p&gt;For developers using Open Code, NVIDIA NIM is available as a built-in provider. You select it from the provider dropdown, paste your API key, and choose a model. No manual configuration required.&lt;/p&gt;

&lt;h3&gt;
  
  
  4.2 Xuedingmao AI — Unified Multi-Model Gateway
&lt;/h3&gt;

&lt;p&gt;When working across multiple tools and models simultaneously, managing separate API keys and base URLs for each provider adds friction. &lt;a href="https://xuedingmao.com" rel="noopener noreferrer"&gt;Xuedingmao AI (xuedingmao.com)&lt;/a&gt; addresses this by aggregating 500+ models — including GPT-5.5, Claude 4.8, Gemini 3.1 Pro, and new releases — behind a single OpenAI-compatible endpoint.&lt;/p&gt;

&lt;p&gt;From a purely technical standpoint, the value is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Single base URL and API key across all integrated models&lt;/li&gt;
&lt;li&gt;New model releases (including frontier models) available at launch&lt;/li&gt;
&lt;li&gt;High endpoint stability, which matters for automated pipelines&lt;/li&gt;
&lt;li&gt;No per-provider interface differences to handle in code&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For production AI development workflows where you are routing requests across multiple models based on task type, a unified gateway simplifies the integration layer considerably.&lt;/p&gt;

&lt;h3&gt;
  
  
  4.3 Self-Hosted NIM
&lt;/h3&gt;

&lt;p&gt;For teams with GPU infrastructure, NVIDIA NIM also ships as Docker containers deployable on-premises. The same model IDs and API interface apply — you simply point your base URL at your local endpoint instead of NVIDIA's cloud. This path is relevant for enterprise deployments with data residency requirements or high-volume workloads where serverless rate limits are a constraint.&lt;/p&gt;
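
&lt;p&gt;As a rough sketch of that switch, the helper below picks the base URL from the environment and falls back to the hosted endpoint. The &lt;code&gt;NIM_BASE_URL&lt;/code&gt; variable name and the &lt;code&gt;http://localhost:8000/v1&lt;/code&gt; address are assumptions made for illustration; use whatever host and port your own container exposes.&lt;/p&gt;

```python
import os

# Hosted endpoint quoted earlier in this article.
HOSTED_NIM_URL = "https://integrate.api.nvidia.com/v1"


def resolve_base_url(env=None) -> str:
    """Return the self-hosted URL if NIM_BASE_URL is set, else the hosted one."""
    env = os.environ if env is None else env
    # NIM_BASE_URL is a hypothetical variable name chosen for this sketch.
    override = env.get("NIM_BASE_URL", "").strip()
    return override.rstrip("/") if override else HOSTED_NIM_URL


if __name__ == "__main__":
    print(resolve_base_url({}))
    # Assumed local address; adjust to your deployment.
    print(resolve_base_url({"NIM_BASE_URL": "http://localhost:8000/v1/"}))
```

&lt;p&gt;The returned string can be passed as &lt;code&gt;base_url&lt;/code&gt; when constructing the client, so the rest of the calling code stays unchanged between environments.&lt;/p&gt;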




&lt;h2&gt;
  
  
  5. Practical Workflow and Known Limitations
&lt;/h2&gt;

&lt;h3&gt;
  
  
  5.1 Recommended Model Routing Strategy
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Task Type&lt;/th&gt;
&lt;th&gt;Recommended Model&lt;/th&gt;
&lt;th&gt;Rationale&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Quick bug fixes, unit tests&lt;/td&gt;
&lt;td&gt;Step-3.7-Flash&lt;/td&gt;
&lt;td&gt;Low latency, solid instruction following&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;UI work, screenshots, design feedback&lt;/td&gt;
&lt;td&gt;MiniMax M3&lt;/td&gt;
&lt;td&gt;Vision input, 512K context&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Architecture, complex reasoning&lt;/td&gt;
&lt;td&gt;Nemotron-3-Ultra&lt;/td&gt;
&lt;td&gt;Deep reasoning, thorough output&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A practical setup: keep a paid model (Claude or GPT) for critical production tasks, and use NVIDIA NIM free endpoints as the default for experiments, prototyping, and iterative development.&lt;/p&gt;
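
&lt;p&gt;The routing table can be approximated in code with a small keyword lookup. This is only a sketch: the keyword sets and the &lt;code&gt;pick_model_key&lt;/code&gt; helper are hypothetical, and the returned keys are assumed to match the &lt;code&gt;MODELS&lt;/code&gt; dictionary from section 3.2.&lt;/p&gt;

```python
import re

# Keyword sets are illustrative assumptions, not part of any NVIDIA API.
ROUTING_RULES = [
    ("multimodal", {"screenshot", "ui", "layout", "mockup"}),
    ("deep_reasoning", {"architecture", "migration", "scalability"}),
]
DEFAULT_MODEL_KEY = "fast_coding"


def pick_model_key(task_description: str) -> str:
    """Return a model key based on whole-word keyword matches."""
    # Tokenize into words so that "ui" does not match inside "quick".
    words = set(re.findall(r"[a-z0-9]+", task_description.lower()))
    for model_key, keywords in ROUTING_RULES:
        if words.intersection(keywords):
            return model_key
    return DEFAULT_MODEL_KEY


if __name__ == "__main__":
    for task in ("Quick bug fix in the parser",
                 "Review this UI screenshot",
                 "Plan the service architecture"):
        print(task, "->", pick_model_key(task))
```

&lt;p&gt;Matching on whole words keeps short keywords from firing on unrelated text. A production router would more likely rely on explicit task labels than on free-text keywords.&lt;/p&gt;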

&lt;h3&gt;
  
  
  5.2 Common Pitfalls
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Model ID accuracy:&lt;/strong&gt; Always copy model IDs directly from the NVIDIA Build model page. Each ID includes an exact publisher prefix and version suffix. For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="s"&gt;nvidia/nemotron-3-ultra-253b-v1&lt;/span&gt;
&lt;span class="s"&gt;minimax/minimax-01&lt;/span&gt;
&lt;span class="s"&gt;stepfun-ai/step-3.7-flash&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Guessing or abbreviating will result in a 404 or model-not-found error.&lt;/p&gt;
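
&lt;p&gt;One way to catch a bad ID before the first request is to compare the configured IDs against a known list. The sketch below assumes you already have that list, for example copied from the model page or, if the endpoint implements model listing, fetched through the OpenAI SDK's &lt;code&gt;client.models.list()&lt;/code&gt;. The &lt;code&gt;check_model_ids&lt;/code&gt; helper is hypothetical.&lt;/p&gt;

```python
import difflib


def check_model_ids(configured: dict, available_ids: list) -> dict:
    """Map each unknown configured ID to its closest available matches."""
    available = set(available_ids)
    problems = {}
    for key, model_id in configured.items():
        if model_id not in available:
            # Suggest near misses so a typo or abbreviation is easy to spot.
            problems[key] = difflib.get_close_matches(
                model_id, available_ids, n=3, cutoff=0.6
            )
    return problems


if __name__ == "__main__":
    available_ids = [
        "nvidia/nemotron-3-ultra-253b-v1",
        "minimax/minimax-01",
        "stepfun-ai/step-3.7-flash",
    ]
    configured = {
        "fast_coding": "stepfun-ai/step-3.7-flash",
        "deep_reasoning": "nemotron-3-ultra",  # abbreviated on purpose
    }
    print(check_model_ids(configured, available_ids))
```

&lt;p&gt;An empty result means every configured ID was found in the list. Any entry in the result points at an ID that would likely fail at request time.&lt;/p&gt;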

&lt;p&gt;&lt;strong&gt;Rate limits on free tiers:&lt;/strong&gt; Free endpoints are throttled. For development workflows with frequent calls, implement exponential backoff:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RateLimitError&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;call_with_retry&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model_key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_retries&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Retry wrapper with exponential backoff for rate limit handling.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;attempt&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_retries&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;call_nim&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model_key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;RateLimitError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# Re-raise once the final attempt has failed&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;attempt&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;max_retries&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;raise&lt;/span&gt;
            &lt;span class="n"&gt;wait&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt; &lt;span class="n"&gt;attempt&lt;/span&gt;  &lt;span class="c1"&gt;# 1s, 2s, 4s, ...&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Rate limited. Retrying in &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;wait&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;s...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;wait&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_retries must be at least 1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;License compliance:&lt;/strong&gt; MiniMax M3 is marked non-commercial on the NVIDIA Build page. Verify licensing for any model before using it in a commercial product.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Benchmark scores vs. practical usability:&lt;/strong&gt; For AI coding workflows, raw benchmark performance is not the primary metric. A model that reliably follows tool-call schemas, avoids unnecessary file modifications, and produces clean diffs is often more valuable in practice than a higher-ranked model that is verbose or unpredictable. Test each model on your actual tasks before committing to it.&lt;/p&gt;




&lt;h2&gt;
  
  
  6. Summary
&lt;/h2&gt;

&lt;p&gt;NVIDIA NIM provides a legitimate path to running frontier-scale models through free development endpoints. The combination of MiniMax M3 (multimodal, 512K context), Step-3.7-Flash (fast general coding), and Nemotron-3-Ultra (deep reasoning) covers the most common AI coding use cases. Because NIM exposes an OpenAI-compatible interface, integration requires little more than swapping the base URL, API key, and model ID in an existing setup.&lt;/p&gt;

&lt;p&gt;The free tier has real rate limits and is not intended for production traffic, but as a development and prototyping resource, it is one of the more generous options currently available. Pair it with a unified platform like Xuedingmao AI for multi-model workflows, and the practical overhead of working across multiple providers drops significantly.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Tags: #AI #LLM #Python #MachineLearning #TechnicalPractice #NVIDIA #NIM #OpenAI-Compatible #AICodeAssistant&lt;/em&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>【Deep Analysis】OpenRouter Fusion API: Multi-Model Compound Intelligence or Misleading Marketing?</title>
      <dc:creator>SchrodingCatAI</dc:creator>
      <pubDate>Sun, 14 Jun 2026 14:46:57 +0000</pubDate>
      <link>https://dev.to/schrodingcatai/deep-analysis-openrouter-fusion-api-multi-model-compound-intelligence-or-misleading-marketing-2e40</link>
      <guid>https://dev.to/schrodingcatai/deep-analysis-openrouter-fusion-api-multi-model-compound-intelligence-or-misleading-marketing-2e40</guid>
      <description>&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; OpenRouter recently launched its Fusion API, claiming it achieves "Fable-level intelligence at half the price" through parallel multi-model dispatching and a judge-model synthesis mechanism. This article dissects how Fusion works under the hood, examines the benchmark methodology behind the marketing claims, presents hands-on test results across multiple task types, and provides a practical multi-model aggregation code example using the Claude Opus 4.8 API — helping developers make a clear-eyed judgment before integrating Fusion into production workflows.&lt;/p&gt;




&lt;h2&gt;
  
  
  1. Background: The Rise of Compound Model APIs
&lt;/h2&gt;

&lt;p&gt;The competitive landscape of large language models has shifted beyond raw model capability. Increasingly, API platforms are experimenting with &lt;strong&gt;compound inference systems&lt;/strong&gt; — architectures that route a single prompt through multiple models, synthesize their outputs, and return a unified answer. The motivation is straightforward: no single model dominates every task category, and ensemble methods have long demonstrated superiority over single-model approaches in classical machine learning.&lt;/p&gt;

&lt;p&gt;OpenRouter, best known as a &lt;strong&gt;model routing aggregation platform&lt;/strong&gt;, entered this space with its Fusion API. The headline claim is bold: Fusion delivers Fable 5-level intelligence at half the cost, evidenced by benchmarks showing fusion combinations of Opus 4.8, Gemini 3.1 Pro, and GPT-5.5 outscoring standalone models on deep research tasks.&lt;/p&gt;

&lt;p&gt;Understanding whether this claim holds up — and where it breaks down — is critical for any developer considering Fusion for production use.&lt;/p&gt;




&lt;h2&gt;
  
  
  2. Core Architecture: How Fusion Actually Works
&lt;/h2&gt;

&lt;p&gt;OpenRouter's official description of Fusion follows a three-stage pipeline:&lt;/p&gt;

&lt;h3&gt;
  
  
  2.1 Parallel Panel Dispatch
&lt;/h3&gt;

&lt;p&gt;When a prompt is submitted to the Fusion endpoint, it is simultaneously dispatched to a &lt;strong&gt;panel of heterogeneous models&lt;/strong&gt;, each with web search and web fetch capabilities enabled. This parallel execution is key to the latency tradeoff: with N models running concurrently, wall-clock time is bounded by the slowest panel member rather than the sum of all calls, but token cost still grows roughly in proportion to N.&lt;/p&gt;
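
&lt;p&gt;The latency and cost behavior of a parallel panel can be illustrated with &lt;code&gt;asyncio&lt;/code&gt;. The sketch below does not call OpenRouter: &lt;code&gt;fake_model_call&lt;/code&gt; is a stand-in that sleeps to simulate latency, and the model names, latencies, and fixed token count are made-up values.&lt;/p&gt;

```python
import asyncio
import time


async def fake_model_call(name: str, latency_s: float, prompt: str) -> dict:
    """Stand-in for a real API call; sleeps to simulate network latency."""
    await asyncio.sleep(latency_s)
    # The token count is a fixed placeholder, not a real usage figure.
    return {"model": name, "answer": f"{name} answer to: {prompt}", "tokens": 100}


async def dispatch_panel(prompt: str, panel: dict) -> list:
    """Send the same prompt to every panel member concurrently."""
    calls = [fake_model_call(name, latency, prompt) for name, latency in panel.items()]
    # gather() returns results in the same order as the calls list.
    return await asyncio.gather(*calls)


def run_panel(prompt: str, panel: dict):
    """Run the panel and report responses, elapsed seconds, and total tokens."""
    start = time.perf_counter()
    responses = asyncio.run(dispatch_panel(prompt, panel))
    elapsed = time.perf_counter() - start
    total_tokens = sum(r["tokens"] for r in responses)
    return responses, elapsed, total_tokens


if __name__ == "__main__":
    panel = {"model-a": 0.2, "model-b": 0.2, "model-c": 0.2}
    responses, elapsed, total_tokens = run_panel("What is RAG?", panel)
    print(f"{len(responses)} responses in {elapsed:.2f}s, {total_tokens} tokens")
```

&lt;p&gt;Elapsed time tracks the slowest simulated call rather than the sum of all three, while the token total is the sum across the panel. That is the tradeoff in miniature: latency stays close to a single call, cost does not.&lt;/p&gt;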

&lt;h3&gt;
  
  
  2.2 Judge Model Analysis
&lt;/h3&gt;

&lt;p&gt;A dedicated &lt;strong&gt;judge model&lt;/strong&gt; reads all panel responses and produces a structured meta-analysis covering:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Consensus points&lt;/strong&gt; — claims agreed upon across multiple models&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Contradictions&lt;/strong&gt; — conflicting assertions requiring resolution&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Partial coverage&lt;/strong&gt; — areas addressed by some but not all models&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Unique insights&lt;/strong&gt; — high-value information from a single model&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Blind spots&lt;/strong&gt; — topics absent from all panel responses&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This structured decomposition is conceptually sound. It mirrors academic peer-review workflows and is not entirely novel — similar judge-model patterns appear in Constitutional AI, LLM-as-evaluator research, and multi-agent debate frameworks.&lt;/p&gt;
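
&lt;p&gt;A heavily simplified sketch of that decomposition follows. It assumes each panel response has already been reduced to a set of short claim strings, which a real judge model would do with language understanding rather than exact string matching. Contradiction detection is left out because it needs semantic comparison. The &lt;code&gt;analyze_panel&lt;/code&gt; function and its output keys are hypothetical and do not reflect OpenRouter's implementation.&lt;/p&gt;

```python
from collections import Counter


def analyze_panel(claims_by_model: dict, expected_topics: set) -> dict:
    """Bucket claims by how many panel models made them."""
    counts = Counter(
        claim for claims in claims_by_model.values() for claim in set(claims)
    )
    n_models = len(claims_by_model)
    analysis = {"consensus": [], "partial_coverage": [], "unique_insights": []}
    for claim, n in sorted(counts.items()):
        if n == n_models:
            analysis["consensus"].append(claim)         # every model agrees
        elif n == 1:
            analysis["unique_insights"].append(claim)   # only one model said it
        else:
            analysis["partial_coverage"].append(claim)  # some, but not all
    # Blind spots: expected topics that no model mentioned at all.
    analysis["blind_spots"] = sorted(expected_topics - set(counts))
    return analysis


if __name__ == "__main__":
    claims = {
        "model-a": {"uses retrieval", "needs reranking"},
        "model-b": {"uses retrieval", "cache embeddings"},
        "model-c": {"uses retrieval", "needs reranking"},
    }
    print(analyze_panel(claims, {"uses retrieval", "evaluation plan"}))
```

&lt;p&gt;Here consensus is defined strictly as agreement across all panel members. A looser threshold, such as a majority, would be an equally reasonable choice for this sketch.&lt;/p&gt;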

&lt;h3&gt;
  
  
  2.3 Synthesis and Final Response
&lt;/h3&gt;

&lt;p&gt;The calling model then acts as the synthesizer: it receives the judge's structured analysis and produces the final answer, grounded in that synthesis rather than in any single raw response. The system exposes a standard OpenAI-compatible API, so integration requires no special SDK, which is a genuine usability advantage.&lt;/p&gt;
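
&lt;p&gt;In practice, "OpenAI-compatible" means the request is an ordinary chat completion payload. The sketch below only builds the request and sends nothing. It assumes OpenRouter's OpenAI-style base URL, and the model slug is a made-up placeholder, since the actual Fusion identifier is not given here. &lt;code&gt;build_chat_request&lt;/code&gt; is an illustrative helper, not part of any SDK.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json

# Assumption: OpenRouter's OpenAI-compatible base URL.
BASE_URL = "https://openrouter.ai/api/v1"
# Placeholder only. This is NOT a real model slug; check the official docs.
FUSION_MODEL = "hypothetical/fusion-model-id"


def build_chat_request(prompt, model=FUSION_MODEL):
    """Return (url, headers, body) for an OpenAI-style chat completion call."""
    url = BASE_URL + "/chat/completions"
    headers = {
        "Authorization": "Bearer YOUR_API_KEY",  # replace with a real key
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    })
    return url, headers, body


if __name__ == "__main__":
    url, headers, body = build_chat_request("Summarize recent work on sparse attention.")
    print(url)
    print(body)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because the payload has the same shape as a single-model request, switching an existing integration to a fused endpoint should come down to changing the model field, assuming the endpoint behaves as described.&lt;/p&gt;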

&lt;p&gt;&lt;strong&gt;Architecture summary:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User Prompt
    │
    ▼
┌─────────────────────────────┐
│   Parallel Panel Dispatch   │
│ Model A │ Model B │ Model C │  (each with web search + fetch)
└─────────────────────────────┘
    │           │           │
    ▼           ▼           ▼
         Judge Model
    (Consensus / Contradictions /
     Partial Coverage /
     Unique Insights / Blind Spots)
              │
              ▼
       Synthesis Model
              │
              ▼
       Final Response
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  3. Benchmark Methodology: Where the Marketing Falters
&lt;/h2&gt;

&lt;p&gt;The benchmark cited in OpenRouter's Fusion announcement is &lt;strong&gt;Draco Bench&lt;/strong&gt;, developed by Perplexity specifically for deep research tasks. Results on Draco Bench show fusion combinations scoring progressively higher as more models are added to the panel.&lt;/p&gt;

&lt;h3&gt;
  
  
  3.1 The Benchmark Selection Problem
&lt;/h3&gt;

&lt;p&gt;The core methodological issue is &lt;strong&gt;task-scope overgeneralization&lt;/strong&gt;: demonstrating superiority on a single deep-research benchmark and claiming general intelligence superiority is a significant logical leap. Draco Bench evaluates retrieval-augmented synthesis — exactly the scenario where ensemble methods with web access perform best. It does not measure:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Raw &lt;strong&gt;code generation&lt;/strong&gt; accuracy (e.g., HumanEval, SWE-bench)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mathematical reasoning&lt;/strong&gt; (MATH, AIME)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Multi-step logical inference&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Latency-sensitive agentic tool use&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Fable's reputation was built primarily on raw coding capability — a dimension entirely absent from the benchmark comparison. Claiming Fusion "beats Fable" without testing on coding benchmarks is analogous to claiming a marathon runner beats a sprinter based solely on endurance metrics.&lt;/p&gt;

&lt;h3&gt;
  
  
  3.2 Hands-On Test Results
&lt;/h3&gt;

&lt;p&gt;Practical evaluation across several task types reveals a more nuanced picture:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Task&lt;/th&gt;
&lt;th&gt;Result&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Elevator physics simulator&lt;/td&gt;
&lt;td&gt;Functional but buggy&lt;/td&gt;
&lt;td&gt;No clear advantage over standalone Opus&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Contact lens case 3D model&lt;/td&gt;
&lt;td&gt;Acceptable&lt;/td&gt;
&lt;td&gt;Proportions off; equivalent to Opus alone&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Three.js folding table sim&lt;/td&gt;
&lt;td&gt;Poor&lt;/td&gt;
&lt;td&gt;Legs overlap when folded; physically incorrect&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Panda SVG illustration&lt;/td&gt;
&lt;td&gt;Acceptable&lt;/td&gt;
&lt;td&gt;Visually similar to standalone Gemini output&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bow-and-arrow game&lt;/td&gt;
&lt;td&gt;Poor&lt;/td&gt;
&lt;td&gt;Target stacking logic broken&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Math reasoning question&lt;/td&gt;
&lt;td&gt;Failed&lt;/td&gt;
&lt;td&gt;Incorrect answer&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Local model trainer&lt;/td&gt;
&lt;td&gt;Could not run&lt;/td&gt;
&lt;td&gt;Agent compatibility gap&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The pattern is consistent: none of the seven tasks showed a clear advantage over a standalone model. For &lt;strong&gt;text synthesis and research aggregation&lt;/strong&gt;, the kind of workload Draco Bench measures, Fusion may offer marginal gains. For &lt;strong&gt;structured code generation, geometric reasoning, and mathematical computation&lt;/strong&gt;, performance is comparable to or worse than a well-chosen single model.&lt;/p&gt;
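
&lt;p&gt;For readers who want to extend the comparison with their own tasks, the table can be kept as data and tallied. The rows below are copied from the table above; the variable names and the &lt;code&gt;tally_outcomes&lt;/code&gt; helper are illustrative.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from collections import Counter

# (task, result) pairs transcribed from the hands-on results table
RESULTS = [
    ("Elevator physics simulator", "Functional but buggy"),
    ("Contact lens case 3D model", "Acceptable"),
    ("Three.js folding table sim", "Poor"),
    ("Panda SVG illustration", "Acceptable"),
    ("Bow-and-arrow game", "Poor"),
    ("Math reasoning question", "Failed"),
    ("Local model trainer", "Could not run"),
]


def tally_outcomes(results):
    """Count how many tasks ended in each outcome."""
    return Counter(outcome for _task, outcome in results)


if __name__ == "__main__":
    for outcome, count in tally_outcomes(RESULTS).items():
        print(outcome, count)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Two of the seven tasks were rated Acceptable and two Poor; the remaining three were functional but buggy, failed, or could not run. Seven tasks is a small sample, so the tally is a sanity check rather than a benchmark.&lt;/p&gt;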




&lt;h2&gt;
  
  
  4. Practical Implementation: Multi-Model Synthesis with Claude Opus 4.8
&lt;/h2&gt;

&lt;p&gt;For developers who want to implement a &lt;strong&gt;custom compound inference pipeline&lt;/strong&gt;, capturing the conceptual benefits of Fusion with full control over model selection, cost, and latency, the following pattern built around Claude Opus 4.8 (model ID &lt;code&gt;claude-opus-4-8&lt;/code&gt;) provides a starting point.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Model introduction:&lt;/strong&gt; Claude Opus 4.8 delivers strong performance on complex logical reasoning, long-context processing, and code generation with error correction — well-suited for the synthesis role in a multi-model pipeline.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;anthropic&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;concurrent.futures&lt;/span&gt;

&lt;span class="c1"&gt;# ─── Configuration ────────────────────────────────────────────────────────────
&lt;/span&gt;&lt;span class="n"&gt;BASE_URL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://xuedingmao.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;          &lt;span class="c1"&gt;# API gateway (aggregates 500+ models)
&lt;/span&gt;&lt;span class="n"&gt;API_KEY&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your_api_key_here&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;               &lt;span class="c1"&gt;# Replace with your actual key
&lt;/span&gt;&lt;span class="n"&gt;SYNTHESIS_MODEL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-opus-4-8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;          &lt;span class="c1"&gt;# Primary synthesis model
&lt;/span&gt;
&lt;span class="c1"&gt;# Panel models to query in parallel (customize as needed)
&lt;/span&gt;&lt;span class="n"&gt;PANEL_MODELS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-opus-4-8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-3-1-pro&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-5-5&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# ─── Initialize Anthropic client pointing to aggregation gateway ───────────────
&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;anthropic&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Anthropic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;API_KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;BASE_URL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;query_panel_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Query a single panel model and return its response.

    Args:
        model:  Model identifier string (e.g., &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-opus-4-8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;)
        prompt: The user prompt to send

    Returns:
        dict with keys &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; and &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;response&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; (or &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; on failure)
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;          &lt;span class="c1"&gt;# Limit panel responses to control cost
&lt;/span&gt;            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;build_judge_prompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;panel_responses&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Construct the structured analysis prompt for the judge model.

    Args:
        prompt:          The original user prompt
        panel_responses: List of panel model responses

    Returns:
        Formatted judge prompt string
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;responses_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--- Response from &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; ---&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;response&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ERROR&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; + r.get(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="n"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="n"&gt;Unknown&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;))&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;panel_responses&lt;/span&gt;
    &lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;You are a judge model. Analyze the following panel responses to this prompt:

ORIGINAL PROMPT: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

PANEL RESPONSES:
&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;responses_text&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

Produce a structured analysis with the following sections:
1. CONSENSUS POINTS: Claims agreed upon by multiple models
2. CONTRADICTIONS: Conflicting assertions requiring resolution
3. PARTIAL COVERAGE: Topics addressed by some but not all models
4. UNIQUE INSIGHTS: High-value information from a single model
5. BLIND SPOTS: Important topics absent from all responses

Be concise and factual.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;build_synthesis_prompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;original_prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;judge_analysis&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Construct the final synthesis prompt using the judge&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s structured analysis.

    Args:
        original_prompt: The user&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s original question
        judge_analysis:  The judge model&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s structured analysis output

    Returns:
        Formatted synthesis prompt string
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Based on the following structured analysis of multiple model responses,
write a comprehensive, accurate final answer to the original prompt.

ORIGINAL PROMPT: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;original_prompt&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

STRUCTURED ANALYSIS:
&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;judge_analysis&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

Synthesize a final answer that incorporates consensus points, resolves contradictions,
and highlights unique insights. Be direct and technically precise.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;compound_inference&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;verbose&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Main compound inference pipeline: dispatch → judge → synthesize.

    Args:
        prompt:  User prompt string
        verbose: If True, print intermediate panel responses

    Returns:
        Final synthesized response string
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="c1"&gt;# Step 1: Dispatch to panel models in parallel
&lt;/span&gt;    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[1/3] Dispatching to &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;PANEL_MODELS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; panel models in parallel...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;concurrent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;futures&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ThreadPoolExecutor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_workers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;PANEL_MODELS&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;executor&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;futures&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="n"&gt;executor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;submit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query_panel_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;PANEL_MODELS&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="c1"&gt;# Collect in submission order so the judge prompt is reproducible
&lt;/span&gt;        &lt;span class="n"&gt;panel_responses&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;future&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;result&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;future&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;futures&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;verbose&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;panel_responses&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;[Panel] &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;response&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 2: Judge model analysis
&lt;/span&gt;    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[2/3] Running judge model analysis...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;judge_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;build_judge_prompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;panel_responses&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;judge_message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;SYNTHESIS_MODEL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;         &lt;span class="c1"&gt;# Use Opus 4.8 as judge for quality
&lt;/span&gt;        &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;judge_prompt&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;judge_analysis&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;judge_message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;verbose&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;[Judge Analysis]:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;judge_analysis&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 3: Final synthesis
&lt;/span&gt;    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[3/3] Generating final synthesized response...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;synthesis_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;build_synthesis_prompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;judge_analysis&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;final_message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;SYNTHESIS_MODEL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2048&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;               &lt;span class="c1"&gt;# Allow longer final response
&lt;/span&gt;        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;synthesis_prompt&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;final_message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;


&lt;span class="c1"&gt;# ─── Entry Point ──────────────────────────────────────────────────────────────
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;test_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What is the attention mechanism in transformer models, and what are its computational complexity tradeoffs?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;compound_inference&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;test_prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;verbose&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;FINAL SYNTHESIZED RESPONSE:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  5. Development Tool Selection
&lt;/h2&gt;

&lt;p&gt;For developers building multi-model pipelines, &lt;strong&gt;Xuedingmao AI (xuedingmao.com)&lt;/strong&gt; provides an aggregation layer worth evaluating:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Model breadth:&lt;/strong&gt; 500+ mainstream models including GPT-5.5, Claude Opus 4.8, and Gemini 3.1 Pro accessible through a single endpoint&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Unified interface:&lt;/strong&gt; Full OpenAI-compatible API — no per-model SDK adaptation required, which significantly reduces integration complexity when building compound pipelines like the one above&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;First-access availability:&lt;/strong&gt; New model releases are typically available on the platform promptly, allowing teams to benchmark frontier models without waiting for direct API access&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Interface stability:&lt;/strong&gt; Consistent response latency and uptime characteristics suited to production workloads and automated testing pipelines&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The unified interface matters most when implementing the parallel dispatch layer — the same &lt;code&gt;client.messages.create()&lt;/code&gt; call works regardless of which panel model is targeted, eliminating per-model authentication and format handling overhead.&lt;/p&gt;




&lt;h2&gt;
  
  
  6. Key Considerations and Practical Caveats
&lt;/h2&gt;

&lt;h3&gt;
  
  
  6.1 Task Suitability
&lt;/h3&gt;

&lt;p&gt;Compound inference helps most with &lt;strong&gt;text synthesis, research aggregation, and knowledge consolidation&lt;/strong&gt;, where multiple perspectives reduce hallucination risk. It is less effective, and can actively degrade output quality, for tasks requiring &lt;strong&gt;deterministic computation, geometric reasoning, or structured code generation&lt;/strong&gt;, where model disagreement introduces noise rather than signal.&lt;/p&gt;

&lt;h3&gt;
  
  
  6.2 Latency and Cost Tradeoffs
&lt;/h3&gt;

&lt;p&gt;Each Fusion call incurs the cost of N panel model calls plus a judge call plus a synthesis call. For a panel of GPT-5.5, Gemini 3.1 Pro, and Opus 4.8, that is five model calls per request, so expect at least roughly 5× the token cost of a single call. Latency is bounded below by the slowest panel model response, and the judge and synthesis calls add to it because they run after the panel completes. These tradeoffs must be evaluated against actual task requirements before committing to compound inference in production.&lt;/p&gt;
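
&lt;p&gt;The call-count arithmetic above can be sketched in a few lines of Python. This is a rough estimator, not a description of how OpenRouter bills Fusion: the names &lt;code&gt;CallCost&lt;/code&gt;, &lt;code&gt;compound_cost&lt;/code&gt;, and &lt;code&gt;compound_latency&lt;/code&gt; are hypothetical, the token counts and prices are placeholders, and it assumes the judge and synthesis calls run one after the other once the panel has finished.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class CallCost:
    """Token usage and per-million-token prices for one model call (hypothetical values)."""
    input_tokens: int
    output_tokens: int
    input_price_per_mtok: float
    output_price_per_mtok: float

    def dollars(self):
        # Cost = tokens * price per million tokens / 1e6, summed over input and output
        return (self.input_tokens * self.input_price_per_mtok
                + self.output_tokens * self.output_price_per_mtok) / 1_000_000


def compound_cost(panel, judge, synthesis):
    """Total cost of one compound request: N panel calls + 1 judge call + 1 synthesis call."""
    return sum(call.dollars() for call in panel) + judge.dollars() + synthesis.dollars()


def compound_latency(panel_seconds, judge_seconds, synthesis_seconds):
    """Panel calls run in parallel, so they contribute max(); judge and synthesis follow."""
    return max(panel_seconds) + judge_seconds + synthesis_seconds


if __name__ == "__main__":
    # Placeholder numbers: each call reads 1,000 tokens and writes 500 at the same price.
    single = CallCost(1_000, 500, 3.0, 15.0)
    panel = [single, single, single]
    # Assumption: judge and synthesis also read the three panel answers (3 * 500 tokens).
    judge = CallCost(1_000 + 3 * 500, 500, 3.0, 15.0)
    synthesis = CallCost(1_000 + 3 * 500, 500, 3.0, 15.0)

    total = compound_cost(panel, judge, synthesis)
    print(f"single call: ${single.dollars():.4f}")
    print(f"compound:    ${total:.4f} ({total / single.dollars():.1f}x)")
    print(f"latency:     {compound_latency([4.0, 6.5, 5.2], 3.0, 4.0):.1f}s")
```

&lt;p&gt;With these placeholder inputs the compound request costs a little under 6× a single call, because the judge and synthesis steps also read the panel answers. Substitute real per-model prices and measured token counts before drawing conclusions.&lt;/p&gt;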

&lt;h3&gt;
  
  
  6.3 Agent Compatibility
&lt;/h3&gt;

&lt;p&gt;Current agentic frameworks (LangChain, LlamaIndex, AutoGen) do not natively support Fusion as a drop-in model. Custom wrappers are required, and tool-call round-trip latency compounds with each agentic step. For latency-sensitive agentic workflows, a single high-capability model remains the pragmatic choice.&lt;/p&gt;

&lt;h3&gt;
  
  
  6.4 Benchmark Interpretation
&lt;/h3&gt;

&lt;p&gt;Always verify benchmark task coverage before making model selection decisions. A model that tops a deep-research leaderboard may underperform on code generation, and vice versa. Diversified evaluation across task types representative of your actual workload is the only reliable methodology.&lt;/p&gt;




&lt;h2&gt;
  
  
  7. Summary
&lt;/h2&gt;

&lt;p&gt;OpenRouter Fusion introduces a conceptually sound compound inference architecture — parallel panel dispatch, structured judge analysis, and grounded synthesis. For deep research and knowledge aggregation tasks, the approach has merit. However, the marketing claim that Fusion "surpasses Fable" is unsupported: the benchmark evidence covers only one task domain, hands-on testing shows inconsistent results across coding and reasoning tasks, latency and cost are materially higher than single-model alternatives, and agent framework support is limited.&lt;/p&gt;

&lt;p&gt;The practical lesson for developers: compound model pipelines are a legitimate tool with specific use cases, not a universal capability upgrade. Implementing a custom pipeline — with full control over model selection and evaluation scope — often yields more predictable results than a black-box compound API. OpenRouter's core value proposition remains model routing and aggregation; Fusion is an interesting experiment that has not yet cleared the bar of its own claims.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;#AI #LargeModels #Python #MachineLearning #HandsOnTech #LLM #APIDevelopment #MultiModelFusion&lt;/em&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>MiniMax M3 + MiniMax Code: A Complete Hands-On Guide to AI Workflows Driven by Open-Source LLMs</title>
      <dc:creator>SchrodingCatAI</dc:creator>
      <pubDate>Sat, 13 Jun 2026 14:55:47 +0000</pubDate>
      <link>https://dev.to/schrodingcatai/minimax-m3-minimax-codekai-yuan-da-mo-xing-qu-dong-ai-gong-zuo-liu-de-wan-zheng-shi-zhan-zhi-nan-2ooh</link>
      <guid>https://dev.to/schrodingcatai/minimax-m3-minimax-codekai-yuan-da-mo-xing-qu-dong-ai-gong-zuo-liu-de-wan-zheng-shi-zhan-zhi-nan-2ooh</guid>
      <description>&lt;h2&gt;
  
  
  Abstract
&lt;/h2&gt;

&lt;p&gt;MiniMax M3 is a powerful open-source multimodal model supporting a 1M token context window, competing with top proprietary models at a fraction of the cost. This article breaks down M3's core capabilities, explains how pairing it with the MiniMax Code agentic workspace unlocks full workflow automation, and walks through practical demos — from generating polished front-end UIs to building scheduled multi-agent deep research pipelines.&lt;/p&gt;




&lt;h2&gt;
  
  
  1. Background: Why Open-Source Models Are Closing the Gap
&lt;/h2&gt;

&lt;p&gt;For years, developers building production AI workflows faced an uncomfortable tradeoff: use closed-source models with strong performance but high cost and vendor lock-in, or use open-source alternatives that lagged significantly on complex tasks. That gap has been narrowing fast, and MiniMax M3 represents one of the clearest examples of this shift.&lt;/p&gt;

&lt;p&gt;Closed-source frontier models like Claude Opus or GPT-4 dominate benchmarks, but they come with per-token costs that make agent-based workflows — where a single task can trigger hundreds of LLM calls — economically painful at scale. For developers building automated pipelines, multi-step code generation flows, or persistent background agents, cost efficiency is not a secondary concern; it directly determines what architectures are viable.&lt;/p&gt;

&lt;p&gt;MiniMax M3 enters this space with a combination of capabilities that makes it worth serious attention:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;1 million token context window&lt;/strong&gt; — enabling full-codebase reasoning, long document analysis, and multi-turn agent memory without chunking hacks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Native multimodality&lt;/strong&gt; — text, image, audio, and video processing in a single model, without routing between specialized models&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Open-source weights&lt;/strong&gt; — deployable locally or via API, with no usage restrictions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Competitive benchmark performance&lt;/strong&gt; — outperforming Claude Opus 4.7 on several evaluated dimensions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When this model is paired with MiniMax Code, an agentic IDE workspace built specifically around M3, the combination shifts from "capable model" to "deployable AI employee."&lt;/p&gt;




&lt;h2&gt;
  
  
  2. Core Architecture: What Makes M3 Different
&lt;/h2&gt;

&lt;h3&gt;
  
  
  2.1 Model Design Principles
&lt;/h3&gt;

&lt;p&gt;MiniMax M3 is built as a natively multimodal model rather than a text model with vision adapters bolted on. This architectural choice matters because it avoids the inference overhead and capability degradation that comes with post-hoc modality fusion. The model processes cross-modal context in a unified representation space, which improves coherence when tasks involve mixed inputs — for example, analyzing a UI screenshot and generating corresponding front-end code.&lt;/p&gt;

&lt;p&gt;The 1M token context window is not just a marketing number. At this scale, a model can hold an entire mid-size codebase in context simultaneously, enabling it to reason about inter-module dependencies, track state across long agent trajectories, and avoid the retrieval errors that plague RAG-based approaches for code understanding.&lt;/p&gt;

&lt;h3&gt;
  
  
  2.2 The MiniMax Code Workspace
&lt;/h3&gt;

&lt;p&gt;MiniMax Code is not a chat interface with a code highlighting plugin. It is an agentic workspace built on top of M3 that provides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Persistent agent memory&lt;/strong&gt; — the agent remembers user preferences, project context, and prior decisions across sessions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool use&lt;/strong&gt; — web browsing, file system access, computer control, and custom skill installation via slash commands&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-agent orchestration&lt;/strong&gt; — the ability to spawn sub-agent teams where different agents handle search, verification, coding, and reporting in parallel&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Background execution&lt;/strong&gt; — tasks continue running after the user closes the application, with mobile push notifications on completion&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scheduled automation&lt;/strong&gt; — recurring tasks can be configured with cron-style scheduling, enabling daily automated pipelines&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This stack turns M3 from a capable model into an autonomous workflow engine.&lt;/p&gt;




&lt;h2&gt;
  
  
  3. Practical Demos: What This Setup Actually Produces
&lt;/h2&gt;

&lt;h3&gt;
  
  
  3.1 Front-End UI Generation
&lt;/h3&gt;

&lt;p&gt;In a single-shot prompt, M3 via MiniMax Code generated a complete premium product landing page for a headphone brand. The output included:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Dynamic CSS animations and scroll transitions&lt;/li&gt;
&lt;li&gt;Responsive layout with clean grid structure&lt;/li&gt;
&lt;li&gt;Multiple typography styles with consistent visual hierarchy&lt;/li&gt;
&lt;li&gt;Fully functional interactive elements&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This level of output quality from a single prompt, with no iterative refinement, positions M3 as a competitive choice for rapid front-end prototyping. Comparable output from closed-source models costs significantly more per generation.&lt;/p&gt;

&lt;h3&gt;
  
  
  3.2 Scheduled Deep Research Agent
&lt;/h3&gt;

&lt;p&gt;A more advanced demonstration involves building a daily AI news digest pipeline using the deep research skill:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Install the deep research skill via the &lt;code&gt;/&lt;/code&gt; command in MiniMax Code&lt;/li&gt;
&lt;li&gt;Define a research task: find the top 5 AI news topics of the day, including new model releases, humanoid robotics, and leaked specifications&lt;/li&gt;
&lt;li&gt;Enable extended thinking mode for better source evaluation&lt;/li&gt;
&lt;li&gt;Schedule the task to run daily at 9:00 AM&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The agent autonomously:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Deploys a team of sub-agents for parallel web search&lt;/li&gt;
&lt;li&gt;Verifies information across multiple sources&lt;/li&gt;
&lt;li&gt;Compiles results into a structured Markdown report&lt;/li&gt;
&lt;li&gt;Delivers output to a right-side panel or file system&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The user does not need to keep their machine running. The workspace operates as a 24/7 background service.&lt;/p&gt;




&lt;h2&gt;
  
  
  4. API Integration: Calling Frontier Models Programmatically
&lt;/h2&gt;

&lt;p&gt;For developers who want to integrate similar reasoning capabilities into their own pipelines, the following example calls a high-performance model API through the &lt;a href="https://xuedingmao.com" rel="noopener noreferrer"&gt;Xuedingmao AI platform&lt;/a&gt;. The platform aggregates 500+ mainstream large models, including GPT-5.5, Claude Opus 4.8, and Gemini 3.1 Pro, with prompt access to newly released models, a unified OpenAI-compatible interface, and stable high-throughput endpoints suited for production agent workflows. Note that the example itself uses the Anthropic Python SDK and its Messages API format, pointed at the platform's base URL, rather than the OpenAI client.&lt;/p&gt;

&lt;p&gt;The default model used here is &lt;code&gt;claude-opus-4-8&lt;/code&gt;, which excels at complex logical reasoning, long-context processing, and code generation — well-suited for the agentic use cases described in this article.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;anthropic&lt;/span&gt;  &lt;span class="c1"&gt;# pip install anthropic
&lt;/span&gt;
&lt;span class="c1"&gt;# ============================================================
# Configuration — Xuedingmao AI unified API endpoint
# Aggregates 500+ models with OpenAI-compatible interface
# BASE_URL: https://xuedingmao.com
# ============================================================
&lt;/span&gt;
&lt;span class="n"&gt;API_KEY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your_api_key_here&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;       &lt;span class="c1"&gt;# Replace with your actual API key
&lt;/span&gt;&lt;span class="n"&gt;BASE_URL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://xuedingmao.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# Unified gateway for all supported models
&lt;/span&gt;&lt;span class="n"&gt;MODEL_ID&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-opus-4-8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;         &lt;span class="c1"&gt;# High-capability model for complex reasoning
&lt;/span&gt;
&lt;span class="c1"&gt;# Initialize the Anthropic-compatible client
# The unified interface means you can swap MODEL_ID without changing call logic
&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;anthropic&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Anthropic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;API_KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;BASE_URL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_deep_research_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2048&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Simulate a deep research agent task.

    Args:
        topic: Research subject — e.g. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;latest open-source LLM releases this week&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;
        max_tokens: Maximum tokens in the response (default 2048)

    Returns:
        Structured research report as a string
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="c1"&gt;# System prompt defines the agent's persona and output format
&lt;/span&gt;    &lt;span class="n"&gt;system_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;You are a senior AI research analyst. 
    When given a research topic, you must:
    1. Identify the 5 most significant recent developments
    2. Provide a brief summary for each item
    3. Rank them by technical significance
    4. Output results in clean Markdown format with source citations where available

    Be precise, factual, and concise. Avoid filler content.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="c1"&gt;# User message contains the specific research instruction
&lt;/span&gt;    &lt;span class="n"&gt;user_message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Research topic: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

    Please compile a structured daily digest covering the most important recent developments.
    Format the output as a numbered Markdown list with headers for each item.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="c1"&gt;# API call — using the /v1/messages endpoint
&lt;/span&gt;    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;MODEL_ID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;           &lt;span class="c1"&gt;# claude-opus-4-8: strong at long-context analysis
&lt;/span&gt;        &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;    &lt;span class="c1"&gt;# Adjust based on expected report length
&lt;/span&gt;        &lt;span class="n"&gt;system&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;     &lt;span class="c1"&gt;# Persistent agent behavior definition
&lt;/span&gt;        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_message&lt;/span&gt;  &lt;span class="c1"&gt;# Task-specific instruction
&lt;/span&gt;            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Extract text content from the response object
&lt;/span&gt;    &lt;span class="c1"&gt;# response.content is a list of content blocks; [0].text gets the primary text
&lt;/span&gt;    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;schedule_daily_digest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Entry point for a scheduled daily research task.
    In production, trigger this via cron, Airflow, or a task queue.

    Args:
        topic: The research domain to monitor daily
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[Agent] Starting deep research on: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;report&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;run_deep_research_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Output the compiled report
&lt;/span&gt;    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DAILY AI DIGEST — RESEARCH REPORT&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;report&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# In production: write to file, send via email, or push to a dashboard
&lt;/span&gt;    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;daily_digest.md&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;w&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;encoding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;report&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;[Agent] Report saved to daily_digest.md&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="c1"&gt;# Entry point — run directly or trigger from a scheduler
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;schedule_daily_digest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Latest open-source LLM releases, AI agent frameworks, and humanoid robotics powered by AI&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This code is complete and runnable. Replace &lt;code&gt;your_api_key_here&lt;/code&gt; with a valid key from &lt;a href="https://xuedingmao.com" rel="noopener noreferrer"&gt;xuedingmao.com&lt;/a&gt; and execute directly. To schedule it as a daily job on Linux:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Add to crontab — runs every day at 9:00 AM&lt;/span&gt;
crontab &lt;span class="nt"&gt;-e&lt;/span&gt;
&lt;span class="c"&gt;# Add this line:&lt;/span&gt;
0 9 &lt;span class="k"&gt;*&lt;/span&gt; &lt;span class="k"&gt;*&lt;/span&gt; &lt;span class="k"&gt;*&lt;/span&gt; /usr/bin/python3 /path/to/your_script.py &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; /var/log/ai_digest.log 2&amp;gt;&amp;amp;1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  5. Tool Selection: Development Platform Considerations
&lt;/h2&gt;

&lt;p&gt;When building agent workflows that make hundreds of LLM calls per task, the choice of API provider has direct implications for cost, latency, and maintainability.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Xuedingmao AI&lt;/strong&gt; (&lt;code&gt;xuedingmao.com&lt;/code&gt;) is worth evaluating for this use case for the following technical reasons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Aggregates 500+ mainstream models including GPT-5.5, Claude Opus 4.8, and Gemini 3.1 Pro under a single endpoint, eliminating the need to maintain separate client configurations per provider&lt;/li&gt;
&lt;li&gt;New model releases are available through the same interface without requiring SDK updates or endpoint changes&lt;/li&gt;
&lt;li&gt;The unified OpenAI-compatible interface means existing code targeting one model can be redirected to another by changing a single &lt;code&gt;model&lt;/code&gt; parameter — critical for comparative testing&lt;/li&gt;
&lt;li&gt;Endpoint stability and response throughput are optimized for high-frequency agent workloads, reducing timeout failures in long-running pipelines&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For teams running multi-agent workflows where a single user task spawns 10–50 sequential or parallel LLM calls, the cost difference between providers compounds significantly. A model like M3 that is both capable and economical per token makes sustained agent operation feasible without aggressive output truncation.&lt;/p&gt;




&lt;h2&gt;
  
  
  6. Key Considerations and Common Pitfalls
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Context window utilization:&lt;/strong&gt; A 1M token window enables long-context reasoning, but input costs scale linearly with token count. For search-and-summarize agents, implement a relevance filtering step before passing retrieved content to the model to avoid unnecessary token spend.&lt;/p&gt;
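
&lt;p&gt;A minimal sketch of such a relevance filter follows. It scores each retrieved document by plain term overlap with the query, which is a deliberately crude stand-in for embedding similarity or a reranker; the function names, the 0.3 threshold, and the character budget are illustrative assumptions to tune on your own data.&lt;/p&gt;

```python
import re


def tokenize(text):
    """Lowercase word set; a crude stand-in for a real tokenizer or embedding model."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def filter_relevant(query, documents, min_overlap=0.3, max_chars=8000):
    """Keep documents whose term overlap with the query passes a threshold,
    most relevant first, until a character budget is used up."""
    query_terms = tokenize(query)
    if not query_terms:
        return []

    scored = []
    for doc in documents:
        # Fraction of query terms that also appear in the document
        overlap = len(query_terms.intersection(tokenize(doc))) / len(query_terms)
        if overlap >= min_overlap:
            scored.append((overlap, doc))

    # Highest overlap first; list.sort is stable, so ties keep their input order
    scored.sort(key=lambda pair: pair[0], reverse=True)

    kept, used = [], 0
    for _, doc in scored:
        if used + len(doc) > max_chars:
            continue  # skip anything that would exceed the budget
        kept.append(doc)
        used += len(doc)
    return kept


if __name__ == "__main__":
    docs = [
        "MiniMax M3 ships a 1M token context window",
        "Recipe for sourdough bread",
        "Token context limits in open models",
    ]
    for doc in filter_relevant("M3 context window tokens", docs):
        print(doc)
```

&lt;p&gt;Only the documents that survive the filter are passed to the model, so input spend tracks the relevant material rather than everything the search step returned.&lt;/p&gt;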

&lt;p&gt;&lt;strong&gt;Prompt quality determines output quality:&lt;/strong&gt; The MiniMax Code demos above produced strong results with well-structured prompts. Vague instructions produce mediocre outputs regardless of model capability. Always define the output format, scope, and success criteria explicitly in the system prompt.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agent verification loops:&lt;/strong&gt; Multi-agent pipelines that skip a verification step are prone to hallucinated sources or fabricated statistics, especially in research tasks. Build a dedicated verification sub-agent that cross-checks claims against raw search results before compiling the final report.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scheduled task monitoring:&lt;/strong&gt; Background tasks running without user oversight need logging and alerting. If a scheduled agent silently fails, the user has no output and no indication of the failure. Always write task logs to persistent storage and configure notification hooks.&lt;/p&gt;
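
&lt;p&gt;One way to wire this up, sketched with only the standard library: wrap the scheduled task so that every attempt is logged and a notification callback fires once retries are exhausted. &lt;code&gt;run_monitored&lt;/code&gt; and the &lt;code&gt;notify&lt;/code&gt; callback are hypothetical names; in practice &lt;code&gt;notify&lt;/code&gt; would post to email, chat, or a pager rather than print.&lt;/p&gt;

```python
import logging
import time

logger = logging.getLogger("scheduled_agent")


def run_monitored(task_name, task, notify, retries=2, delay_seconds=0.0):
    """Run task(); log every attempt and call notify(message) if all attempts fail.

    Returns the task result, or None after the final failure.
    """
    attempts = retries + 1
    for attempt in range(1, attempts + 1):
        try:
            result = task()
            logger.info("%s succeeded on attempt %d", task_name, attempt)
            return result
        except Exception:
            # logger.exception records the full traceback alongside the message
            logger.exception("%s failed on attempt %d", task_name, attempt)
            if attempt != attempts:
                time.sleep(delay_seconds)
    notify(f"{task_name} failed after {attempts} attempts")
    return None


if __name__ == "__main__":
    # Persist logs to a file so failures survive the process exiting
    logging.basicConfig(
        filename="agent_tasks.log",
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(name)s %(message)s",
    )

    def flaky_digest():
        raise RuntimeError("upstream API timed out")  # simulated failure

    run_monitored("daily_digest", flaky_digest, notify=print)
```

&lt;p&gt;Returning &lt;code&gt;None&lt;/code&gt; instead of re-raising keeps a cron-driven script from crashing silently; the log file and the notification carry the failure details.&lt;/p&gt;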

&lt;p&gt;&lt;strong&gt;Local vs. cloud deployment:&lt;/strong&gt; M3's open-source weights can be self-hosted for workflows requiring data privacy. However, local inference requires substantial VRAM for full-precision operation. Quantized variants (GGUF/AWQ) reduce hardware requirements with acceptable quality tradeoffs for most production tasks.&lt;/p&gt;




&lt;h2&gt;
  
  
  7. Summary
&lt;/h2&gt;

&lt;p&gt;MiniMax M3 closes a meaningful portion of the performance gap between open-source and proprietary frontier models while offering a 1M token context window, native multimodality, and substantially lower inference costs. On its own, it is a capable model for code generation, UI development, and complex reasoning tasks.&lt;/p&gt;

&lt;p&gt;Paired with the MiniMax Code agentic workspace, it becomes a full workflow automation platform: capable of spawning multi-agent teams, running scheduled background tasks, processing files, and building persistent systems that operate independently of the user's active session. The practical result is an AI development environment that behaves less like a chat assistant and more like an autonomous technical collaborator — one that can be assigned real tasks and trusted to complete them with minimal supervision.&lt;/p&gt;

&lt;p&gt;For developers building production AI pipelines, the combination of capable open-source model weights, an agentic execution environment, and cost-efficient inference is a genuinely compelling stack worth evaluating.&lt;/p&gt;




&lt;p&gt;&lt;code&gt;#AI&lt;/code&gt; &lt;code&gt;#LargeModels&lt;/code&gt; &lt;code&gt;#Python&lt;/code&gt; &lt;code&gt;#MachineLearning&lt;/code&gt; &lt;code&gt;#HandsOnTech&lt;/code&gt; &lt;code&gt;#Agent&lt;/code&gt; &lt;code&gt;#WorkflowAutomation&lt;/code&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>【Deep Analysis】Claude Fable 5 vs. Mythos 5: What Anthropic Actually Shipped and What It Cost You</title>
      <dc:creator>SchrodingCatAI</dc:creator>
      <pubDate>Fri, 12 Jun 2026 14:14:22 +0000</pubDate>
      <link>https://dev.to/schrodingcatai/deep-analysis-claude-fable-5-vs-mythos-5-what-anthropic-actually-shipped-and-what-it-cost-you-3d59</link>
      <guid>https://dev.to/schrodingcatai/deep-analysis-claude-fable-5-vs-mythos-5-what-anthropic-actually-shipped-and-what-it-cost-you-3d59</guid>
      <description>&lt;h2&gt;
  
  
  1. Background: The Model Launch Fatigue Problem
&lt;/h2&gt;

&lt;p&gt;Every few weeks, another frontier lab ships a new model and declares it the most capable release in company history. Developers are left parsing benchmark charts, trying to determine whether anything substantively changed or whether they are simply looking at a rebranding exercise paired with a larger invoice.&lt;/p&gt;

&lt;p&gt;The Claude Fable 5 launch on June 9 is different — not because the benchmark numbers are dramatic, but because of what Anthropic quietly admitted in its own release notes: this model was previously considered too risky to ship to the general public. Understanding that admission is the entire story. Everything else — benchmarks, pricing, context windows — is secondary to grasping what Anthropic actually decided to do and why it matters for production AI development.&lt;/p&gt;




&lt;h2&gt;
  
  
  2. Core Architecture: One Model, Two Access Tiers
&lt;/h2&gt;

&lt;h3&gt;
  
  
  2.1 The Fable / Mythos Split
&lt;/h3&gt;

&lt;p&gt;Anthropic released two models simultaneously:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude Fable 5&lt;/strong&gt; — the broadly available production model, accessible via API, AWS Bedrock, Google Vertex AI, and Microsoft Azure AI Foundry.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Claude Mythos 5&lt;/strong&gt; — a restricted-access variant gated behind an approval program called &lt;em&gt;Project Glass Wing&lt;/em&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The critical technical fact: Fable 5 and Mythos 5 run on the &lt;strong&gt;same underlying model weights&lt;/strong&gt;. The only architectural difference is that Mythos 5 operates with certain safety constraints removed, inside a controlled trusted-access environment. Fable 5 is the version Anthropic determined is safe enough for general deployment.&lt;/p&gt;

&lt;p&gt;This framing matters for developers. When you call &lt;code&gt;claude-fable-5&lt;/code&gt;, you are not calling a dumbed-down consumer model — you are calling the publicly shippable cut of a more powerful core model, with safety guardrails applied at inference time.&lt;/p&gt;

&lt;h3&gt;
  
  
  2.2 Automatic Model Switching Behavior
&lt;/h3&gt;

&lt;p&gt;A detail worth flagging for anyone building on the API: automatic model switching is &lt;strong&gt;enabled by default&lt;/strong&gt;. When Fable 5 determines that a request needs a capability outside its permitted operating envelope, it can silently fall back to Opus. This is by design and can be configured under Settings → Capabilities when Fable is first selected. As a result, production logs may show model-level variance that is intended routing behavior rather than a bug, so any API monitoring or cost attribution system should account for it.&lt;/p&gt;

&lt;p&gt;Additionally, prompts that attempt to extract the model's private reasoning chain can trigger a &lt;code&gt;reasoning_extraction&lt;/code&gt; refusal, which itself increases fallback frequency. Design your system prompts accordingly.&lt;/p&gt;




&lt;h2&gt;
  
  
  3. Capability Claims: What the Benchmarks Actually Say
&lt;/h2&gt;

&lt;h3&gt;
  
  
  3.1 Anthropic's Official Position
&lt;/h3&gt;

&lt;p&gt;Anthropic's release documentation positions Fable 5 as state-of-the-art across nearly all tested benchmarks, with headline strengths in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Software engineering and long-horizon autonomous task completion&lt;/li&gt;
&lt;li&gt;First-shot correctness on well-specified complex problems&lt;/li&gt;
&lt;li&gt;Enterprise workflows: code review, debugging, ambiguity navigation&lt;/li&gt;
&lt;li&gt;Vision and multimodal scientific research tasks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are specific, coherent claims — not vague marketing assertions.&lt;/p&gt;

&lt;h3&gt;
  
  
  3.2 The Verification Gap
&lt;/h3&gt;

&lt;p&gt;The honest technical read requires one important caveat: &lt;strong&gt;every one of those benchmark numbers is Anthropic measuring Anthropic&lt;/strong&gt;. Independent third-party evaluations have not yet accumulated at the time of this article. "State of the art on nearly all tested benchmarks" is doing significant work in that sentence — particularly the word &lt;em&gt;tested&lt;/em&gt;. The ranking may well hold up under independent scrutiny, but as of now it reflects Anthropic's strongest internal showing, not a community-verified consensus ranking.&lt;/p&gt;

&lt;p&gt;Treat the capability claims as strong evidence, not settled fact, until external replication confirms them.&lt;/p&gt;




&lt;h2&gt;
  
  
  4. Practical Demo: Calling Fable 5 via the API
&lt;/h2&gt;

&lt;p&gt;The following example uses &lt;a href="https://xuedingmao.com" rel="noopener noreferrer"&gt;Xuedingmao AI&lt;/a&gt; as the API gateway. The platform aggregates 500+ frontier models including Claude 4.8, GPT-5.5, and Gemini 3.1 Pro, provides a unified OpenAI-compatible interface, and offers first-availability access to newly released model APIs — reducing multi-model integration overhead significantly for production teams.&lt;/p&gt;

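&lt;p&gt;Default model in this tutorial: &lt;code&gt;claude-opus-4-8&lt;/code&gt;. To target Fable 5, set &lt;code&gt;MODEL&lt;/code&gt; to &lt;code&gt;claude-fable-5&lt;/code&gt; if your gateway exposes it; the cost estimate at the end of the script uses the Fable 5 list rates ($10 input / $50 output per million tokens) from Section 5.&lt;br&gt;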
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;anthropic&lt;/span&gt;  &lt;span class="c1"&gt;# Anthropic official SDK
&lt;/span&gt;
&lt;span class="c1"&gt;# ── Configuration ──────────────────────────────────────────────────────────────
&lt;/span&gt;&lt;span class="n"&gt;BASE_URL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://xuedingmao.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;          &lt;span class="c1"&gt;# Unified gateway base URL
&lt;/span&gt;&lt;span class="n"&gt;API_KEY&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your_api_key_here&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;               &lt;span class="c1"&gt;# Replace with your actual API key
&lt;/span&gt;&lt;span class="n"&gt;MODEL&lt;/span&gt;    &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-opus-4-8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;                 &lt;span class="c1"&gt;# claude-opus-4-8: strong reasoning,
&lt;/span&gt;                                             &lt;span class="c1"&gt;# long-context handling, code generation
&lt;/span&gt;
&lt;span class="c1"&gt;# ── Initialize client with custom base URL ─────────────────────────────────────
&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;anthropic&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Anthropic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;API_KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;BASE_URL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;                       &lt;span class="c1"&gt;# Route through aggregation gateway
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# ── Build a long-horizon autonomous task prompt ────────────────────────────────
&lt;/span&gt;&lt;span class="n"&gt;system_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;You are a senior software engineer performing a code review.
Identify logic errors, security vulnerabilities, and performance bottlenecks.
Return structured findings: severity, location, explanation, recommended fix.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

&lt;span class="n"&gt;user_code&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
def get_user_data(user_id):
    query = &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT * FROM users WHERE id = &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; + user_id   # Potential SQL injection
    result = db.execute(query)
    return result[0]                                       # No null-check
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

&lt;span class="c1"&gt;# ── API call ───────────────────────────────────────────────────────────────────
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;MODEL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2048&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;                         &lt;span class="c1"&gt;# Sufficient for detailed code review output
&lt;/span&gt;    &lt;span class="n"&gt;system&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;                    &lt;span class="c1"&gt;# System-level instruction for role framing
&lt;/span&gt;    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Review the following Python function:&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;```
&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;endraw&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
python&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;user_code&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;raw&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
```&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# ── Output ─────────────────────────────────────────────────────────────────────
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;=== Code Review Result ===&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;              &lt;span class="c1"&gt;# Primary text response block
&lt;/span&gt;
&lt;span class="c1"&gt;# Token usage — important for cost tracking with the new tokenizer
&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;[Token Usage] Input: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;input_tokens&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; | Output: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;output_tokens&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[Estimated Cost] $&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;input_tokens&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;output_tokens&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1_000_000&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The token usage reporting is deliberate — given the new tokenizer behavior described in Section 5, tracking per-call token counts in production is no longer optional.&lt;/p&gt;




&lt;h2&gt;
  
  
  5. Critical Caveats: What Anthropic's Marketing Slides Omit
&lt;/h2&gt;

&lt;h3&gt;
  
  
  5.1 The Tokenizer Tax
&lt;/h3&gt;

&lt;p&gt;Anthropic's own release notes state that the same input text can produce &lt;strong&gt;approximately 30% more tokens&lt;/strong&gt; on Fable 5 than on models prior to Opus 4.7. This is not a rounding error — it is a structural cost multiplier.&lt;/p&gt;

&lt;p&gt;At $10 per million input tokens and $50 per million output tokens, Fable 5 is already roughly double the price of Opus 4.8 at the nominal per-token rate. Layer the 30% tokenizer inflation on top of that, and the effective cost premium over older models is materially wider than the headline rate comparison suggests. Any budget projection based solely on per-token pricing without accounting for the new tokenizer will underestimate actual spend.&lt;/p&gt;
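
&lt;p&gt;The compounding effect can be worked through with a short calculation. The sketch below is illustrative only: the Fable 5 rates and the 30% inflation figure come from the text above, while the half-price older model (&lt;code&gt;OLDER_MODEL&lt;/code&gt;) and the choice to apply the same inflation to output tokens are assumptions, not published figures.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Price in USD per million tokens."""
    input_per_m: float
    output_per_m: float


# Fable 5 list rates quoted in the text above.
FABLE_5 = Rates(input_per_m=10.0, output_per_m=50.0)
# ASSUMPTION: a hypothetical older model priced at half the Fable 5 rate.
OLDER_MODEL = Rates(input_per_m=5.0, output_per_m=25.0)
# Approximate token inflation from the new tokenizer, per the release notes.
TOKENIZER_INFLATION = 1.30


def cost_usd(rates: Rates, input_tokens: float, output_tokens: float) -> float:
    """Cost of one workload at the given per-million-token rates."""
    total = input_tokens * rates.input_per_m + output_tokens * rates.output_per_m
    return total / 1_000_000


def effective_premium(input_tokens: int, output_tokens: int) -> float:
    """Ratio of Fable 5 cost to older-model cost for the same text."""
    old = cost_usd(OLDER_MODEL, input_tokens, output_tokens)
    # ASSUMPTION: output tokens inflate by the same factor as input tokens.
    new = cost_usd(
        FABLE_5,
        input_tokens * TOKENIZER_INFLATION,
        output_tokens * TOKENIZER_INFLATION,
    )
    return new / old


if __name__ == "__main__":
    premium = effective_premium(input_tokens=1_000_000, output_tokens=200_000)
    print(f"Effective cost premium: {premium:.2f}x")
```

&lt;p&gt;Under these assumptions the two factors multiply (2 × 1.3), so a workload that looks twice as expensive on the rate card lands closer to 2.6 times the older bill.&lt;/p&gt;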

&lt;h3&gt;
  
  
  5.2 The Context Window Asterisk
&lt;/h3&gt;

&lt;p&gt;Anthropic's developer documentation states that Fable 5 supports a &lt;strong&gt;1 million token context window&lt;/strong&gt; by default via the API and in Claude Code. This claim is accurate — within those specific surfaces.&lt;/p&gt;

&lt;p&gt;Consumer help pages describe different, surface-specific limits: certain Opus and Sonnet configurations in the paid consumer app are capped at 500K or 200K tokens depending on the usage context. The 1M figure is not universal across all product surfaces.&lt;/p&gt;

&lt;p&gt;For API developers and Claude Code users: the 1M context is real and usable. For anyone building features that depend on long-context behavior in the consumer chat interface: verify the actual limit for your specific surface before promising it downstream. This is exactly the category of specification that gets repeated incorrectly across the internet within days of a launch.&lt;/p&gt;
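
&lt;p&gt;One way to keep the surface-specific limits from leaking into product promises is to encode them in a single lookup and check prompts against it. This is a minimal sketch: the token figures are the ones quoted above, but the surface keys are hypothetical labels, and reserving room for output inside the window is a conservative assumption rather than documented behavior.&lt;/p&gt;

```python
# Context limits in tokens. The 1M figure (API, Claude Code) and the
# 500K / 200K consumer caps come from the text; the keys are hypothetical.
CONTEXT_LIMITS = {
    "api": 1_000_000,
    "claude_code": 1_000_000,
    "consumer_app_large": 500_000,
    "consumer_app_small": 200_000,
}


def fits_context(surface: str, prompt_tokens: int, reserved_output_tokens: int = 0) -> bool:
    """Return True if the prompt plus reserved output fits the surface's window."""
    if surface not in CONTEXT_LIMITS:
        raise ValueError(f"Unknown surface: {surface!r}")
    return CONTEXT_LIMITS[surface] >= prompt_tokens + reserved_output_tokens


if __name__ == "__main__":
    # The same 600K-token prompt fits the API but not the consumer app.
    print(fits_context("api", 600_000, reserved_output_tokens=8_192))
    print(fits_context("consumer_app_large", 600_000))
```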




&lt;h2&gt;
  
  
  6. Tool and Platform Selection Notes
&lt;/h2&gt;

&lt;p&gt;For developers integrating Fable 5 or evaluating it against alternatives, &lt;a href="https://xuedingmao.com" rel="noopener noreferrer"&gt;Xuedingmao AI&lt;/a&gt; provides a practical aggregation layer worth considering:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Aggregates 500+ models including Claude Opus 4.8, GPT-5.5, Gemini 3.1 Pro, and newly released frontier models available at first launch&lt;/li&gt;
&lt;li&gt;Exposes an Anthropic-compatible &lt;code&gt;/v1/messages&lt;/code&gt; endpoint alongside its unified OpenAI-compatible interface, eliminating per-provider SDK integration overhead&lt;/li&gt;
&lt;li&gt;Delivers stable, low-latency responses suitable for both production throughput and iterative development testing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For teams running model routing logic (e.g., sending complex tasks to Fable 5, routine tasks to Sonnet 4.6), a unified interface simplifies the switching layer considerably.&lt;/p&gt;




&lt;h2&gt;
  
  
  7. Common Pitfalls and Deployment Recommendations
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Do not default everything to Fable 5.&lt;/strong&gt; The cost structure — doubled per-token rate plus 30% tokenizer inflation plus increased output volume from extended reasoning — compounds quickly at scale. Route to Fable 5 only when the task complexity justifies it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When Fable 5 earns its price:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Long, multi-step autonomous workflows with minimal human checkpoints&lt;/li&gt;
&lt;li&gt;High-stakes code review, architecture analysis, or security audits where a missed issue costs hours of remediation&lt;/li&gt;
&lt;li&gt;Complex multimodal inputs requiring simultaneous vision and reasoning&lt;/li&gt;
&lt;li&gt;Research synthesis across large, ambiguous document corpora&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;When Opus 4.8 or Sonnet 4.6 are the right call:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Structured extraction, classification, and summarization tasks&lt;/li&gt;
&lt;li&gt;High-volume, low-complexity API workloads where throughput and cost matter&lt;/li&gt;
&lt;li&gt;Rapid prototyping and iterative development where output quality differences are marginal&lt;/li&gt;
&lt;/ul&gt;
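
&lt;p&gt;The two lists above can be turned into a small routing function. The sketch below is one possible shape, not a recommended policy: the task labels are hypothetical, &lt;code&gt;claude-fable-5&lt;/code&gt; and &lt;code&gt;claude-opus-4-8&lt;/code&gt; are the identifiers used earlier in this article, and &lt;code&gt;claude-sonnet-4-6&lt;/code&gt; is an assumed identifier for Sonnet 4.6.&lt;/p&gt;

```python
from dataclasses import dataclass

FABLE = "claude-fable-5"
OPUS = "claude-opus-4-8"
SONNET = "claude-sonnet-4-6"  # ASSUMPTION: identifier not confirmed by the article

# Hypothetical task labels mirroring the two lists above.
HARD_KINDS = {
    "autonomous_workflow",
    "security_audit",
    "architecture_review",
    "multimodal_reasoning",
    "research_synthesis",
}
ROUTINE_KINDS = {"extraction", "classification", "summarization"}


@dataclass(frozen=True)
class Task:
    kind: str
    high_volume: bool = False


def choose_model(task: Task) -> str:
    """Pick the cheapest tier that the task description justifies."""
    if task.kind in HARD_KINDS:
        return FABLE
    if task.kind in ROUTINE_KINDS or task.high_volume:
        return SONNET
    # Everything else, including prototyping, stays on the mid tier.
    return OPUS


if __name__ == "__main__":
    for kind in ("security_audit", "classification", "prototype"):
        print(kind, "->", choose_model(Task(kind)))
```

&lt;p&gt;Keeping the rule in one function makes the cost policy reviewable and easy to change once independent evaluations shift the picture.&lt;/p&gt;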

&lt;p&gt;&lt;strong&gt;Monitor model switching in production.&lt;/strong&gt; If you have SLA requirements or cost attribution systems tied to a specific model, the default automatic switching behavior must be explicitly managed. Log &lt;code&gt;model&lt;/code&gt; from response metadata, not just your request parameter.&lt;/p&gt;
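
&lt;p&gt;A minimal sketch of that check follows. It compares the model you requested with the &lt;code&gt;model&lt;/code&gt; value reported on the response (for example &lt;code&gt;response.model&lt;/code&gt; on an Anthropic SDK message object). The prefix-matching rule and the logger name are assumptions to adapt to your own gateway's naming.&lt;/p&gt;

```python
import logging
from typing import Optional

logger = logging.getLogger("model_audit")  # hypothetical logger name


def audit_served_model(requested: str, served: Optional[str]) -> bool:
    """Return True when the served model differs from the requested one.

    `served` is meant to be the `model` field reported on the API response.
    """
    if served is None:
        logger.warning("response carried no model field (requested=%s)", requested)
        return True
    # ASSUMPTION: an ID that extends the requested one with a suffix
    # (for example a date stamp) still counts as the same model.
    switched = not served.startswith(requested)
    if switched:
        logger.warning("model switch: requested=%s served=%s", requested, served)
    return switched


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    print(audit_served_model("claude-fable-5", "claude-fable-5"))
    print(audit_served_model("claude-fable-5", "claude-opus-4-8"))
```

&lt;p&gt;Feeding the returned flag into cost attribution lets fallback calls be billed and counted separately from calls that stayed on the requested model.&lt;/p&gt;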

&lt;p&gt;&lt;strong&gt;Do not expose reasoning extraction prompts in demos.&lt;/strong&gt; Prompts designed to surface internal chain-of-thought can trigger refusal responses, which increases fallback frequency and skews cost metrics in live demonstrations.&lt;/p&gt;




&lt;h2&gt;
  
  
  8. Summary
&lt;/h2&gt;

&lt;p&gt;Claude Fable 5 represents something more significant than a routine capability increment. Anthropic shipped a public version of a model it previously categorized as too risky to release, applying safety constraints at the system level rather than through model capability reduction. That is a meaningful architectural and policy decision, independent of any benchmark number.&lt;/p&gt;

&lt;p&gt;The practical takeaway for developers is a two-part framework: first, validate capability claims against independent evaluations as they emerge rather than treating Anthropic's internal benchmarks as the final word; second, build cost models that account for the new tokenizer's 30% inflation factor and the surface-specific context window limits before committing to Fable 5 as a default. Used selectively on hard, high-value problems, Fable 5 is a serious frontier tool. Used indiscriminately as a drop-in replacement for cheaper models, it will surface painfully on the billing dashboard without proportional gains in output quality.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;#AI #LargeLanguageModels #Python #MachineLearning #TechnicalPractice&lt;/em&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>【Deep Analysis】Anthropic Claude Fable 5 &amp; Mythos 5: Architecture, Benchmarks, and the Agentic Deployment Strategy You Need to Know</title>
      <dc:creator>SchrodingCatAI</dc:creator>
      <pubDate>Thu, 11 Jun 2026 14:28:09 +0000</pubDate>
      <link>https://dev.to/schrodingcatai/shen-du-jie-xi-anthropic-claude-fable-5-mythos-5-architecture-benchmarks-and-the-agentic-deployment-okl</link>
      <guid>https://dev.to/schrodingcatai/shen-du-jie-xi-anthropic-claude-fable-5-mythos-5-architecture-benchmarks-and-the-agentic-deployment-okl</guid>
      <description>&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt; Anthropic simultaneously released two models — Claude Fable 5 and Claude Mythos 5 — sharing the same underlying architecture yet deployed under fundamentally different access tiers. This article dissects their core technical differences, analyzes independent benchmark results, explains the philosophy behind adaptive thinking and opaque chain-of-thought, and provides actionable workflow engineering guidance for developers building on top of frontier agentic models.&lt;/p&gt;




&lt;h2&gt;
  
  
  1. Background: Why Two Models With Nearly the Same Name?
&lt;/h2&gt;

&lt;p&gt;Every major model launch in 2024–2025 arrives wrapped in the same three words: "state of the art." Parsing signal from marketing noise has become a skill in itself. Anthropic's June 9th release is a genuinely unusual case — they dropped not one but two models with near-identical naming: &lt;strong&gt;Claude Fable 5&lt;/strong&gt; and &lt;strong&gt;Claude Mythos 5&lt;/strong&gt;. The marketing barely explains the difference.&lt;/p&gt;

&lt;p&gt;The industry pain point here is real. Developers need to know:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Is this a capability leap or a branding refresh?&lt;/li&gt;
&lt;li&gt;Why does access differ so significantly between the two variants?&lt;/li&gt;
&lt;li&gt;What does this tell us about where frontier AI deployment is heading?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The answer reveals something more structurally interesting than a typical model release — Anthropic is treating its deployment strategy as a product in its own right.&lt;/p&gt;




&lt;h2&gt;
  
  
  2. Core Architecture: Same Model, Two Deployment Modes
&lt;/h2&gt;

&lt;h3&gt;
  
  
  2.1 Shared Foundation
&lt;/h3&gt;

&lt;p&gt;According to Anthropic's own documentation, Fable 5 and Mythos 5 are &lt;strong&gt;the same underlying model&lt;/strong&gt;. Not a distilled version. Not a smaller variant. Identical weights, two deployment configurations.&lt;/p&gt;

&lt;p&gt;Both models share the following specifications:&lt;/p&gt;

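&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Parameter&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Context Window&lt;/td&gt;
&lt;td&gt;1,000,000 tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Max Output&lt;/td&gt;
&lt;td&gt;128,000 tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Adaptive Thinking&lt;/td&gt;
&lt;td&gt;Always-on, non-toggleable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Chain-of-Thought Access&lt;/td&gt;
&lt;td&gt;Summarized only; raw CoT not exposed&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;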
&lt;h3&gt;
  
  
  2.2 The Deployment Tier Difference
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Claude Fable 5&lt;/strong&gt; ships with built-in safety classifiers and fallback behavior. It is publicly available and broadly accessible. The constraints are baked in at the platform level, not retrofitted.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Claude Mythos 5&lt;/strong&gt; is the less-constrained deployment variant, gated behind a program called &lt;strong&gt;Project Glass Wing&lt;/strong&gt;, currently scoped to vetted cybersecurity researchers and select biology research partners.&lt;/p&gt;

&lt;p&gt;This is not a routine chatbot refresh. Anthropic is commercializing a capability tier it previously assessed as too risky to distribute openly — and doing so through a controlled, monitored access layer rather than a public API endpoint.&lt;/p&gt;
&lt;h3&gt;
  
  
  2.3 The Chain-of-Thought Design Choice
&lt;/h3&gt;

&lt;p&gt;One architectural decision that most coverage glosses over: these models &lt;strong&gt;never return raw chain-of-thought&lt;/strong&gt;. If you want reasoning visibility, Anthropic's recommended approach is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Summarized thinking traces&lt;/li&gt;
&lt;li&gt;Tool call traces with explicit verification steps&lt;/li&gt;
&lt;li&gt;Verifier sub-agent patterns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is a deliberate philosophy: &lt;strong&gt;extremely capable, highly agentic, but instrumented and fenced at the platform level&lt;/strong&gt;. That design choice carries direct implications for how you build on top of it.&lt;/p&gt;
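
&lt;p&gt;Because raw reasoning is not available, verification has to run against observable traces. The sketch below shows the simplest form of a verifier step: a claimed step counts as done only if a successful tool result backs it. The trace shape (dicts with &lt;code&gt;step&lt;/code&gt; and &lt;code&gt;ok&lt;/code&gt; keys) is a hypothetical format chosen for illustration, not something Anthropic specifies.&lt;/p&gt;

```python
def verify_claims(claimed_steps: list[str], tool_results: list[dict]) -> dict[str, bool]:
    """Mark a claimed step verified only if a successful tool result backs it.

    ASSUMPTION: each tool result is a dict with hypothetical keys
    "step" (str) and "ok" (bool); real traces will need their own mapping.
    """
    confirmed = {
        result["step"]
        for result in tool_results
        if result.get("ok") is True and "step" in result
    }
    return {step: step in confirmed for step in claimed_steps}


if __name__ == "__main__":
    results = [
        {"step": "run_tests", "ok": True},
        {"step": "deploy", "ok": False},
    ]
    # "write_docs" has no tool result at all, so it stays unverified.
    print(verify_claims(["run_tests", "deploy", "write_docs"], results))
```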


&lt;h2&gt;
  
  
  3. Benchmark Analysis: What Independent Testing Actually Shows
&lt;/h2&gt;
&lt;h3&gt;
  
  
  3.1 CursorBench 3.1 — Real-World Coding Tasks
&lt;/h3&gt;

&lt;p&gt;The strongest independent data point comes from Cursor's &lt;strong&gt;CursorBench 3.1&lt;/strong&gt;, a benchmark built from real, messy, multi-file coding sessions rather than academic trivia.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;CursorBench 3.1 Score&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Claude Fable 5 (max)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;72.9%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude Opus 4.8 (max)&lt;/td&gt;
&lt;td&gt;63.8%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude Opus 4.7 (max)&lt;/td&gt;
&lt;td&gt;64.8%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPT-5.5 Extra&lt;/td&gt;
&lt;td&gt;&amp;lt; Fable 5&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This is a meaningful gap. The benchmark rewards sustained multi-file reasoning, ambiguity handling, and single-pass correctness — exactly the capabilities Anthropic claims to have improved.&lt;/p&gt;
&lt;h3&gt;
  
  
  3.2 Where It Falls Short
&lt;/h3&gt;

&lt;p&gt;Fable 5 does not reach human parity on code. Rabbits' review found it noisier and less precise than Opus 4.8 for targeted code review tasks specifically. The cybersecurity results from AI Eyes are striking, but their own team acknowledges those results do not establish real-world dominance.&lt;/p&gt;

&lt;p&gt;The honest read: &lt;strong&gt;best-in-class for long-horizon agentic coding, not a clean sweep across all coding subtasks&lt;/strong&gt;.&lt;/p&gt;


&lt;h2&gt;
  
  
  4. Practical Implementation: Workflow Engineering for Fable 5
&lt;/h2&gt;
&lt;h3&gt;
  
  
  4.1 How to Structure Long-Horizon Agentic Tasks
&lt;/h3&gt;

&lt;p&gt;Getting value from Fable 5 requires thinking like a &lt;strong&gt;workflow engineer&lt;/strong&gt;, not a prompt tinkerer. The model rewards structured scaffolding. Here is a pattern for running a multi-step agentic task using the &lt;code&gt;claude-opus-4-8&lt;/code&gt; model via the Xuedingmao AI unified API endpoint — the same interface pattern applies to Fable 5 as capabilities roll out:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;anthropic&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;

&lt;span class="c1"&gt;#=============================================
# Configuration
# Model: claude-opus-4-8
# Platform: Xuedingmao AI (xuedingmao.com)
# BASE_URL: https://xuedingmao.com
# Endpoint: /v1/messages
# =============================================
&lt;/span&gt;
&lt;span class="c1"&gt;# Initialize the Anthropic client, pointing to the unified aggregation endpoint.
# Xuedingmao AI aggregates 500+ frontier models (GPT-5.5, Claude 4.8, Gemini 3.1 Pro, etc.)
# under a single OpenAI-compatible interface — no need to adapt to each model's native API.
&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;anthropic&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Anthropic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;           &lt;span class="c1"&gt;# Replace with your Xuedingmao AI API key
&lt;/span&gt;    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://xuedingmao.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="c1"&gt;# Unified gateway; stable, low-latency, production-ready
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_agentic_workflow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task_description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool_results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Runs a structured agentic workflow with:
    1. Explicit sub-task decomposition
    2. Grounded progress verification against tool results
    3. Scaffold memory injected as system context
    4. Summarized thinking for reasoning visibility (Fable 5 pattern)

    Args:
        task_description: High-level task string, should be specific and bounded
        tool_results: List of prior tool call results from this session for grounding

    Returns:
        dict containing the model&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s structured response and verification status
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="c1"&gt;# Build scaffold memory context from prior tool results.
&lt;/span&gt;    &lt;span class="c1"&gt;# This pattern prevents the model from hallucinating progress claims —
&lt;/span&gt;    &lt;span class="c1"&gt;# it must verify each step against actual tool outputs from the session.
&lt;/span&gt;    &lt;span class="n"&gt;scaffold_context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tool_results&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;indent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;tool_results&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;No prior tool results.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="c1"&gt;# System prompt engineering for long-horizon agentic runs.
&lt;/span&gt;    &lt;span class="c1"&gt;# Key constraints:
&lt;/span&gt;    &lt;span class="c1"&gt;# - Break task into explicit, numbered sub-steps before executing
&lt;/span&gt;    &lt;span class="c1"&gt;# - Verify each progress claim against the tool_results provided
&lt;/span&gt;    &lt;span class="c1"&gt;# - Surface partial results without prematurely terminating the run
&lt;/span&gt;    &lt;span class="c1"&gt;# - If ambiguous, request clarification rather than assuming
&lt;/span&gt;    &lt;span class="n"&gt;system_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;You are an expert software engineering agent.

TASK CONTEXT:
&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;task_description&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

PRIOR SESSION TOOL RESULTS (ground all progress claims against these):
&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;scaffold_context&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

OPERATING RULES:
1. Decompose the task into explicit numbered sub-steps before starting execution.
2. After each sub-step, verify completion against the tool results above.
3. Do NOT claim a step is complete unless the tool result confirms it.
4. Surface intermediate results in structured JSON rather than prose.
5. If you encounter ambiguity, stop and ask a clarifying question.
6. Prefer single-pass correctness over speed — do not cut corners.
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="c1"&gt;# API call — using /v1/messages endpoint (Anthropic-compatible format)
&lt;/span&gt;    &lt;span class="c1"&gt;# max_tokens set high to accommodate128K output budget on Fable 5 class models
&lt;/span&gt;    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-opus-4-8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;      &lt;span class="c1"&gt;# Swap to claude-fable-5 when available on the platform
&lt;/span&gt;        &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;8192&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;              &lt;span class="c1"&gt;# Adjust based on expected output complexity
&lt;/span&gt;        &lt;span class="n"&gt;system&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Begin execution. Task: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;task_description&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Extract and structure the response for downstream verification
&lt;/span&gt;    &lt;span class="n"&gt;raw_output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;input_tokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;output_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;output_tokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stop_reason&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stop_reason&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;    &lt;span class="c1"&gt;# Check for 'end_turn' vs 'max_tokens'
&lt;/span&gt;        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;output&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;raw_output&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;


&lt;span class="c1"&gt;# =============================================
# Example usage: multi-file refactoring task
# =============================================
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;

    &lt;span class="n"&gt;task&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Refactor the authentication module across three files: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;auth.py, middleware.py, and routes.py. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Replace all MD5 password hashing with bcrypt. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Ensure backward compatibility for existing sessions. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Return a diff summary for each file.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Simulate prior tool results from the session (e.g., from a file-reading tool call)
&lt;/span&gt;    &lt;span class="n"&gt;prior_results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;read_file&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;file&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;auth.py&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;success&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lines&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;142&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;read_file&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;file&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;middleware.py&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;success&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lines&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;87&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;read_file&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;file&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;routes.py&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;success&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lines&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;210&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;run_agentic_workflow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prior_results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Model: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Tokens used: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;input_tokens&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; in / &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;output_tokens&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; out&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Stop reason: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;stop_reason&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;== Agent Output ===&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;output&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
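
&lt;p&gt;The &lt;code&gt;stop_reason&lt;/code&gt; field returned above is worth acting on, since a run that stops on &lt;code&gt;max_tokens&lt;/code&gt; was cut off rather than finished. The following is a minimal sketch, assuming a result dict shaped like the one &lt;code&gt;run_agentic_workflow&lt;/code&gt; returns; the helper name &lt;code&gt;classify_run&lt;/code&gt; and its three labels are hypothetical, not part of any SDK.&lt;/p&gt;

```python
def classify_run(result):
    """Map a result dict's stop_reason to a next action (labels are illustrative)."""
    stop_reason = result.get("stop_reason")
    if stop_reason == "end_turn":
        # The model finished its turn on its own.
        return "complete"
    if stop_reason == "max_tokens":
        # Output hit the max_tokens cap, so the text is likely truncated.
        return "truncated"
    # Any other value is left for a human or a follow-up step to inspect.
    return "needs_review"


sample = {"stop_reason": "max_tokens", "output": "1. Read auth.py ..."}
print(classify_run(sample))  # prints: truncated
```

&lt;p&gt;A truncated run can then be retried with a larger &lt;code&gt;max_tokens&lt;/code&gt; value or split into smaller sub-tasks.&lt;/p&gt;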



&lt;h3&gt;
  
  
  4.2 Four Workflow Engineering Principles for Long Runs
&lt;/h3&gt;

&lt;p&gt;When building on Fable 5-class agentic models, apply these four structural principles:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Force explicit sub-task decomposition&lt;/strong&gt; before execution begins. Models that plan first complete more reliably in a single pass.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Constrain autonomy with explicit boundaries&lt;/strong&gt;: define when the model should stop and escalate rather than letting it infer scope.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Force grounded progress reporting&lt;/strong&gt; — require the model to verify each progress claim against actual tool results from the current session.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Provide real scaffolding memory&lt;/strong&gt; via a verifier sub-agent that can surface partial results without terminating the run.&lt;/li&gt;
&lt;/ol&gt;
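
&lt;p&gt;As a rough illustration of the third principle, grounded progress reporting can be enforced outside the model by checking its claims against the recorded tool results. This is a minimal sketch under stated assumptions: the tool results use the same &lt;code&gt;file&lt;/code&gt; and &lt;code&gt;status&lt;/code&gt; keys as the simulated results in section 4.1, and the function name &lt;code&gt;verify_claims&lt;/code&gt; is hypothetical.&lt;/p&gt;

```python
def verify_claims(claimed_files, tool_results):
    """Split files the model claims are done into confirmed and unconfirmed.

    A claim counts as confirmed only if some tool result for that file
    reports status == "success".
    """
    succeeded = {r["file"] for r in tool_results if r.get("status") == "success"}
    confirmed = [f for f in claimed_files if f in succeeded]
    unconfirmed = [f for f in claimed_files if f not in succeeded]
    return {"confirmed": confirmed, "unconfirmed": unconfirmed}


tool_results = [
    {"tool": "read_file", "file": "auth.py", "status": "success"},
    {"tool": "read_file", "file": "middleware.py", "status": "error"},
]
report = verify_claims(["auth.py", "middleware.py", "routes.py"], tool_results)
print(report)
# prints: {'confirmed': ['auth.py'], 'unconfirmed': ['middleware.py', 'routes.py']}
```

&lt;p&gt;Anything in the unconfirmed list is a candidate for escalation rather than silent acceptance.&lt;/p&gt;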




&lt;h2&gt;
  
  
  5. Tool and Platform Selection
&lt;/h2&gt;

&lt;p&gt;For developers integrating Fable 5 or other frontier models into production workflows, &lt;strong&gt;Xuedingmao AI (xuedingmao.com)&lt;/strong&gt; is worth evaluating as a unified API gateway.&lt;/p&gt;

&lt;p&gt;From a technical standpoint, the platform aggregates 500+ mainstream large models, including GPT-5.5, Claude 4.8, and Gemini 3.1 Pro, under a single OpenAI-compatible interface. New models are made available at launch, giving developers early access to frontier API capabilities. The Anthropic-compatible &lt;code&gt;/v1/messages&lt;/code&gt; endpoint used in the example above removes the need to maintain a separate integration adapter for each provider's native API, which reduces multi-model integration complexity in production codebases. Interface stability and response latency are well suited to high-throughput and iterative testing scenarios.&lt;/p&gt;




&lt;h2&gt;
  
  
  6. Pitfalls and Operational Considerations
&lt;/h2&gt;

&lt;h3&gt;
  
  
  6.1 When Fable 5 is NOT the Right Default
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;High-volume, low-latency use cases&lt;/strong&gt;: Fable 5 is slower and more expensive per token than Sonnet- or Haiku-tier models. If your bottleneck is speed or cost rather than reasoning depth, a Sonnet-tier model remains the smarter default.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Precision code review&lt;/strong&gt;: Independent testing found Fable 5 noisier than Opus 4.8 for targeted code review. Use Fable 5 for agentic execution, not fine-grained review.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Expecting Mythos-level behavior from the public model&lt;/strong&gt;: The public Fable 5 is engineered specifically to behave differently from the Mythos access tier in sensitive domains. That is not a bug; it is the product working as designed.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  6.2 The Safety Architecture You Cannot Override
&lt;/h3&gt;

&lt;p&gt;The classifiers, fallbacks, and trusted access gating in Fable 5 are not toggleable. If your use case involves offensive security research or sensitive biological data, the appropriate access path is through Project Glass Wing, not prompt engineering around Fable 5's constraints.&lt;/p&gt;




&lt;h2&gt;
  
  
  7. Summary
&lt;/h2&gt;

&lt;p&gt;Claude Fable 5 and Mythos 5 are not the arrival of AGI. They are a clear signal that frontier labs are now shipping &lt;strong&gt;work models&lt;/strong&gt; rather than answer models. The real story is not that the model got stronger — it is that &lt;strong&gt;the deployment strategy has become the product itself&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Fable 5 is elite for agentic coding, ambiguous long-horizon knowledge work, multimodal professional tasks, and high-autonomy research workflows. Its 1M token context, 128K output budget, and improved sub-agent coordination represent a genuine capability step. The tradeoffs — slower, more expensive, operationally demanding — are equally real.&lt;/p&gt;

&lt;p&gt;Getting value from this generation of models requires workflow engineering discipline: explicit decomposition, grounded verification, scaffold memory, and structured escalation patterns. Developers who invest in that infrastructure will extract substantially more value than those treating it as a smarter autocomplete.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;#AI #LLM #Python #MachineLearning #HandsOnTech #ClaudeAgenticAI&lt;/em&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>【Deep Dive】Frontier Code: The Benchmark That Asks "Would a Maintainer Merge This?"</title>
      <dc:creator>SchrodingCatAI</dc:creator>
      <pubDate>Tue, 09 Jun 2026 14:40:25 +0000</pubDate>
      <link>https://dev.to/schrodingcatai/deep-dive-frontier-code-the-benchmark-that-asks-would-a-maintainer-merge-this-4m0l</link>
      <guid>https://dev.to/schrodingcatai/deep-dive-frontier-code-the-benchmark-that-asks-would-a-maintainer-merge-this-4m0l</guid>
      <description>&lt;h2&gt;
  
  
  Abstract
&lt;/h2&gt;

&lt;p&gt;Cognition's Frontier Code benchmark reframes how we evaluate AI coding capability. Instead of asking "does the code pass tests?", it asks a harder question: would an experienced maintainer actually approve this pull request? This article breaks down the benchmark's design, scoring methodology, key results, and what it means for the next generation of coding agents.&lt;/p&gt;




&lt;h2&gt;
  
  
  Background: Why Passing Tests Isn't Enough
&lt;/h2&gt;

&lt;p&gt;Most coding benchmarks operate on a binary signal: does the generated code pass the test suite? This is a useful proxy, but it conflates functional correctness with production quality — and those are not the same thing.&lt;/p&gt;

&lt;p&gt;A patch can pass every available test and still be rejected in a real code review. Common reasons include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Overly broad scope&lt;/strong&gt; — touching files unrelated to the issue&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Weak or superficial tests&lt;/strong&gt; — covering the happy path but missing edge cases&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Style violations&lt;/strong&gt; — ignoring local conventions, naming patterns, or idioms&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Poor abstraction&lt;/strong&gt; — solving the immediate problem in a way that makes future changes harder or introduces hidden coupling&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are exactly the criteria experienced maintainers apply when reviewing pull requests. Cognition's Frontier Code benchmark is a direct attempt to operationalize this standard: measuring &lt;em&gt;mergeability&lt;/em&gt;, not just functional correctness.&lt;/p&gt;




&lt;h2&gt;
  
  
  Core Design: Three Nested Subsets and Two Metrics
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Dataset Structure
&lt;/h3&gt;

&lt;p&gt;Frontier Code organizes its tasks into three nested subsets:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Subset&lt;/th&gt;
&lt;th&gt;Size&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Extended&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;154 tasks&lt;/td&gt;
&lt;td&gt;Full benchmark, includes easier tasks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Main&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;100 tasks&lt;/td&gt;
&lt;td&gt;The 100 hardest tasks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Diamond&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;50 tasks&lt;/td&gt;
&lt;td&gt;The 50 hardest tasks — strictest subset&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;When you see results reported on Diamond, you're looking at the most demanding evaluation tier.&lt;/p&gt;

&lt;h3&gt;
  
  
  Scoring Methodology
&lt;/h3&gt;

&lt;p&gt;The benchmark reports two primary metrics:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pass Rate&lt;/strong&gt; — Binary. A solution passes only if it clears every &lt;em&gt;blocker criterion&lt;/em&gt;. Blockers are conditions a maintainer would treat as hard stops in a real review. If any single blocker fails, the entire attempt fails.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Score&lt;/strong&gt; — A weighted aggregate across all rubric items. Critically, if the solution fails any blocker criterion, the score is automatically set to zero. This means score is not a consolation prize for partial effort — it only becomes meaningful after mandatory mergeability checks are cleared.&lt;/p&gt;
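
&lt;p&gt;The blocker-gated scoring rule can be sketched in a few lines. This is an illustration only: the benchmark's actual rubric weights and data format are not described here, so the input shapes, the weight normalization, and the function name &lt;code&gt;grade_attempt&lt;/code&gt; are all assumptions.&lt;/p&gt;

```python
def grade_attempt(blockers, rubric):
    """Grade one attempt under a blocker-gated scheme (hypothetical shapes).

    blockers: list of bools, one per blocker criterion.
    rubric:   list of (weight, passed) pairs covering all rubric items.
    """
    if not all(blockers):
        # Any failed blocker fails the attempt and zeroes the score.
        return {"passed": False, "score": 0.0}
    total = sum(weight for weight, _ in rubric)
    earned = sum(weight for weight, ok in rubric if ok)
    # Assumed normalization: fraction of total rubric weight earned.
    score = earned / total if total else 0.0
    return {"passed": True, "score": score}


rubric = [(2, True), (1, False), (1, True)]
print(grade_attempt([True, True], rubric))   # prints: {'passed': True, 'score': 0.75}
print(grade_attempt([True, False], rubric))  # prints: {'passed': False, 'score': 0.0}
```

&lt;p&gt;The second call shows the gating effect: the rubric items are identical, but one failed blocker discards all of the earned weight.&lt;/p&gt;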

&lt;p&gt;Each model is run five times at every available reasoning effort level. Results are averaged per effort level, and the headline chart reports the best-performing effort setting for each model. This means the chart is showing per-model optimal performance, not a fixed configuration.&lt;/p&gt;
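
&lt;p&gt;The reporting procedure (five runs per effort level, averaged, then the best level taken per model) can be sketched as follows. The effort labels and scores below are made-up placeholders rather than benchmark data, and &lt;code&gt;best_effort_setting&lt;/code&gt; is a hypothetical helper name.&lt;/p&gt;

```python
from statistics import mean


def best_effort_setting(runs_by_effort):
    """Average the runs at each effort level and return the best level.

    runs_by_effort: dict mapping an effort label to a list of per-run scores.
    Returns a (label, average) tuple for the highest-averaging level.
    """
    averages = {effort: mean(scores) for effort, scores in runs_by_effort.items()}
    best = max(averages, key=averages.get)
    return best, averages[best]


# Placeholder numbers for one model: five runs at each of two effort levels.
runs = {
    "low": [0.10, 0.12, 0.08, 0.10, 0.10],
    "high": [0.14, 0.12, 0.16, 0.13, 0.15],
}
label, average = best_effort_setting(runs)
print(label, round(average, 3))
```

&lt;p&gt;Because each model's headline number comes from its own best setting, two models in the same chart may have been run at different effort levels.&lt;/p&gt;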




&lt;h2&gt;
  
  
  Key Results
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Diamond Subset (Hardest 50 Tasks)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Score&lt;/th&gt;
&lt;th&gt;Pass Rate&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Claude Opus 4.8&lt;/td&gt;
&lt;td&gt;13.4%&lt;/td&gt;
&lt;td&gt;14.5%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPT-5.5&lt;/td&gt;
&lt;td&gt;6.3%&lt;/td&gt;
&lt;td&gt;7.2%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude Opus 4.7&lt;/td&gt;
&lt;td&gt;5.2%&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemini 3.1 Pro&lt;/td&gt;
&lt;td&gt;4.7%&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPT-5.4 Mini&lt;/td&gt;
&lt;td&gt;4.6%&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kimi K2.6&lt;/td&gt;
&lt;td&gt;3.8%&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The leading result — 14.5% pass rate — is the whole point. The Diamond subset is far from saturated. Even the best available model solves only a small fraction of these tasks by the mergeability standard.&lt;/p&gt;

&lt;h3&gt;
  
  
  Main Subset (100 Tasks)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Score&lt;/th&gt;
&lt;th&gt;Pass Rate&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Claude Opus 4.8&lt;/td&gt;
&lt;td&gt;34.3%&lt;/td&gt;
&lt;td&gt;37.3%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPT-5.5&lt;/td&gt;
&lt;td&gt;25.5%&lt;/td&gt;
&lt;td&gt;28.2%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude Opus 4.7&lt;/td&gt;
&lt;td&gt;43.2%&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kimi K2.6&lt;/td&gt;
&lt;td&gt;37.0%&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPT-5.4 Mini&lt;/td&gt;
&lt;td&gt;36.0%&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemini 3.1 Pro&lt;/td&gt;
&lt;td&gt;34.2%&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Numbers are higher on the larger Main subset, and rankings shift somewhat, but Claude Opus 4.8 maintains the lead at the top. The compression of scores across models on Main indicates the task difficulty gradient is doing real work.&lt;/p&gt;




&lt;h2&gt;
  
  
  A Concrete Example: The Subtle Failure Case
&lt;/h2&gt;

&lt;p&gt;The benchmark's purpose becomes clearest through a concrete task. Consider a C++ repository called &lt;code&gt;json_schema&lt;/code&gt;. The task:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create a new &lt;code&gt;log_warning&lt;/code&gt; helper function that always prints to &lt;code&gt;stderr&lt;/code&gt;, works even without debug flags enabled, and automatically prepends a warning prefix.&lt;/li&gt;
&lt;li&gt;Replace every existing warning message in the codebase with calls to this new helper.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This sounds like a straightforward refactor. But here's where Claude Opus 4.8 fails:&lt;/p&gt;

&lt;p&gt;It correctly updates the first line of multi-line warning blocks to use &lt;code&gt;log_warning&lt;/code&gt;, but leaves the continuation lines writing directly to &lt;code&gt;stderr&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Today, the output is identical.&lt;/strong&gt; The behavior appears correct. Tests pass.&lt;/p&gt;

&lt;p&gt;But the abstraction is broken. The call site is now implicitly assuming that &lt;code&gt;log_warning&lt;/code&gt; and direct &lt;code&gt;stderr&lt;/code&gt; writes are permanently equivalent. If &lt;code&gt;log_warning&lt;/code&gt; is later updated to route output elsewhere, add metadata, or change formatting — those continuation lines become wrong, and the bug is subtle and easy to miss.&lt;/p&gt;

&lt;p&gt;The benchmark correctly marks this as a quality failure, even though the current behavior is functionally correct. This is precisely the kind of issue that surfaces in real code review and gets flagged by an experienced maintainer.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Example: what the model produced (subtly broken)
&lt;/span&gt;&lt;span class="n"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;some_function&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nf"&gt;log_warning&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Multi-line warning starts here&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;cerr&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  continuation line 1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;endl&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;  &lt;span class="o"&gt;//&lt;/span&gt; &lt;span class="n"&gt;BAD&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;bypasses&lt;/span&gt; &lt;span class="n"&gt;abstraction&lt;/span&gt;
    &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;cerr&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  continuation line 2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;endl&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;  &lt;span class="o"&gt;//&lt;/span&gt; &lt;span class="n"&gt;BAD&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;bypasses&lt;/span&gt; &lt;span class="n"&gt;abstraction&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Example: what a correct refactor looks like
&lt;/span&gt;&lt;span class="n"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;some_function&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nf"&gt;log_warning&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Multi-line warning starts here&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  continuation line 1&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  continuation line 2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The distinction is not about today's output. It's about whether the code respects the abstraction boundary being established.&lt;/p&gt;




&lt;h2&gt;
  
  
  Rubric Pipeline: Why Evaluation Is Expensive
&lt;/h2&gt;

&lt;p&gt;Frontier Code's evaluation pipeline involves five stages:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Task Creation&lt;/strong&gt; — Contributors write tasks based on real open-source repositories, defining blocker criteria and rubric items.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Initial Review&lt;/strong&gt; — A pod lead reviews the task for clarity and fairness.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Adversarial Testing&lt;/strong&gt; — Authors attempt to find rubric edge cases and ambiguities.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lead Review&lt;/strong&gt; — An experienced engineering lead iterates with the contributor.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Research Review&lt;/strong&gt; — A Cognition researcher does a final audit, and researchers solve the tasks themselves to verify that instructions are clear and grading is fair.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Tasks can be sent back for revision at any point in this loop. This level of rigor is why the benchmark is difficult to replicate externally — and also why the evaluation is expensive to build and maintain.&lt;/p&gt;




&lt;h2&gt;
  
  
  Practical Demo: Evaluating Code Quality with Claude Opus 4.8
&lt;/h2&gt;

&lt;p&gt;Claude Opus 4.8 is the top-performing model on Frontier Code. It's Anthropic's most capable coding model at time of writing — strong at multi-step reasoning, context-aware refactoring, and following nuanced style constraints across large codebases.&lt;/p&gt;

&lt;p&gt;The following example demonstrates how to use the model for a production-quality code review task, using the OpenAI-compatible API provided by &lt;a href="https://xuedingmao.com" rel="noopener noreferrer"&gt;Xueding Mao AI (xuedingmao.com)&lt;/a&gt; — an aggregation platform I use in day-to-day development work that provides unified access to 500+ frontier models including Claude Opus 4.8, GPT-5.5, and Gemini 3.1 Pro, with new models available immediately on release.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
Code Quality Review with Claude Opus 4.8
Uses OpenAI-compatible API via xuedingmao.com

Requirements:
    pip install openai
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="c1"&gt;# Initialize client using xuedingmao.com's OpenAI-compatible endpoint
&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;          &lt;span class="c1"&gt;# Get your key at xuedingmao.com
&lt;/span&gt;    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://xuedingmao.com/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# --- Prompt Design ---
# The system prompt establishes the maintainer perspective.
# This mirrors the evaluation standard Frontier Code uses.
&lt;/span&gt;&lt;span class="n"&gt;SYSTEM_PROMPT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;You are a senior software engineer conducting a production code review.
Evaluate the provided patch not just for functional correctness, but for mergeability.

Assess the following dimensions:
1. Scope correctness — Does the change touch only what&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s necessary?
2. Abstraction quality — Are boundaries respected and future-proof?
3. Test adequacy — Are the tests meaningful, not just coverage padding?
4. Style and idiom conformance — Does the code match local conventions?
5. Maintainability — Will this change make the codebase easier or harder to work with going forward?

For each dimension, provide a verdict (PASS / WARN / FAIL) and a brief explanation.
If any dimension is FAIL, the overall verdict is REJECT.
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;review_patch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;original_code&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;patch&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;task_description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Submit a code patch for maintainer-style review.

    Args:
        original_code: The relevant section of the original codebase.
        patch: The proposed change to be reviewed.
        task_description: The original task or issue the patch addresses.

    Returns:
        Structured review output from Claude Opus 4.8.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;user_message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;## Task Description
&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;task_description&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

## Original Code (C++)
&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;original_code&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

## Proposed Patch (C++)
&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;patch&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

Please provide a structured code review evaluating mergeability.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    response = client.chat.completions.create(
        model="claude-opus-4-8",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_message},
        ],
        temperature=0.2,     # Low temperature for consistent, analytical output
        max_tokens=2048,
    )

    return response.choices[0].message.content or ""  # content can be None; keep the str return type


# --- Example: The json_schema warning refactor task ---

original = """
void validate_type(const std::string&amp;amp; input) {
    if (input.empty()) {
        std::cerr &amp;lt;&amp;lt; "WARNING: " &amp;lt;&amp;lt; "Input is empty." &amp;lt;&amp;lt; std::endl;
        std::cerr &amp;lt;&amp;lt; "  Defaulting to null type." &amp;lt;&amp;lt; std::endl;
    }
}
"""

# This is the subtly broken patch — first line uses log_warning,
# continuation writes directly to stderr.
broken_patch = """
void validate_type(const std::string&amp;amp; input) {
    if (input.empty()) {
        log_warning("Input is empty.");
        std::cerr &amp;lt;&amp;lt; "  Defaulting to null type." &amp;lt;&amp;lt; std::endl;  // abstraction leak
    }
}
"""

task = """
Create a log_warning() helper that always writes to stderr with a WARNING prefix.
Replace all existing warning messages in the codebase to use this helper.
"""

if __name__ == "__main__":
    review = review_patch(original, broken_patch, task)
    print("=== Code Review Result ===\n")
    print(review)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;strong&gt;Expected output structure from the model (illustrative; exact wording will vary between runs):&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;=== Code Review Result ===

## Scope Correctness — PASS
The change is limited to the relevant function and introduces the new helper as specified.

## Abstraction Quality — FAIL
The patch uses log_warning() for the first line but writes subsequent lines directly 
to std::cerr. This breaks the abstraction boundary. If log_warning() is later updated 
to redirect output or add structured metadata, the continuation lines will diverge 
silently. All lines of a logical warning block must flow through the same abstraction.

## Test Adequacy — WARN
No tests were provided for the new helper function. The refactor should be accompanied 
by at least a basic test verifying that log_warning() writes to stderr with the correct prefix.

## Style Conformance — PASS
Naming and formatting match local conventions.

## Maintainability — FAIL
The mixed abstraction creates a hidden assumption that will cause maintenance debt.

## Overall Verdict: REJECT
Critical abstraction violation in continuation line handling. Recommend consolidating 
all warning lines through log_warning() before merging.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is exactly the kind of reasoning Frontier Code is trying to measure — and it demonstrates why test-passing alone is an insufficient benchmark target.&lt;/p&gt;
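&lt;p&gt;The system prompt's rule that any FAIL means REJECT can also be enforced in code instead of trusting the model's own summary line. The sketch below assumes the review uses headings shaped like the sample output (&lt;code&gt;## Dimension, dash, VERDICT&lt;/code&gt;). The helper names and the &lt;code&gt;ACCEPT&lt;/code&gt; label are hypothetical, since the prompt only defines REJECT.&lt;/p&gt;

```python
import re

# Matches headings like "## Abstraction Quality - FAIL".
# \u2014 is the long dash used in the sample output; a plain hyphen is accepted too.
VERDICT_LINE = re.compile(r"^##\s*(.+?)\s*[\u2014\-]\s*(PASS|WARN|FAIL)\s*$")


def parse_verdicts(review: str) -> dict[str, str]:
    """Collect per-dimension verdicts from a markdown-style review."""
    verdicts = {}
    for line in review.splitlines():
        match = VERDICT_LINE.match(line.strip())
        if match:
            verdicts[match.group(1)] = match.group(2)
    return verdicts


def overall_verdict(verdicts: dict[str, str]) -> str:
    """Apply the prompt's rule: any FAIL means REJECT (ACCEPT is an assumed label)."""
    if not verdicts:
        raise ValueError("no dimension verdicts found in review text")
    return "REJECT" if "FAIL" in verdicts.values() else "ACCEPT"
```

&lt;p&gt;Calling &lt;code&gt;overall_verdict(parse_verdicts(review))&lt;/code&gt; on the returned text gives a verdict that does not depend on the model applying its own rule correctly. Raising on an empty result makes a format change in the model output visible rather than silently passing.&lt;/p&gt;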




&lt;h2&gt;
  
  
  Limitations to Keep in Mind
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Tasks are not public.&lt;/strong&gt; Cognition has kept the task set private to avoid benchmark contamination. This is reasonable, but it means external researchers cannot fully audit every rubric item. Treat Frontier Code as a useful signal, not a definitive universal ranking.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scores reflect model + tooling + scaffolding.&lt;/strong&gt; The benchmark uses agent harnesses, so results capture the full stack, not the model in isolation. A different harness configuration may produce different numbers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prompt-based grading has drift risk.&lt;/strong&gt; Subjective rubric evaluation can measure things that unit tests cannot, but it requires strong quality control to stay consistent. Cognition's five-stage pipeline is designed to address this, but it's worth keeping in mind when comparing results across time.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Bigger Picture
&lt;/h2&gt;

&lt;p&gt;The takeaway from Frontier Code is not "use model X." That framing is too simplistic. The more important signal is structural: &lt;strong&gt;code quality is becoming the next bottleneck for coding agents.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Passing tests was a reasonable first benchmark target. But as models get better at generating functional code, the constraint shifts. Production codebases require changes that are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Scoped&lt;/strong&gt; — minimal blast radius, touch only what's necessary&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Maintainable&lt;/strong&gt; — respect existing abstractions, don't create hidden coupling&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Idiomatic&lt;/strong&gt; — follow local conventions, not just syntactic correctness&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Adequately tested&lt;/strong&gt; — meaningful coverage, not coverage theater&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Acceptable to maintainers&lt;/strong&gt; — the humans who own the codebase have to live with this change&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Based on current results, no model is close to satisfying all of these criteria reliably. The Diamond subset — 50 carefully constructed, real-repository tasks — has a best pass rate of 14.5%. That's not a benchmark being saturated. That's a benchmark doing its job.&lt;/p&gt;




&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;Frontier Code is a serious attempt to close the gap between "AI that generates code" and "AI that generates code a maintainer would actually merge." The scoring design, rubric pipeline, and concrete failure examples all point in the same direction: functional correctness is necessary but not sufficient. The field needs benchmarks that measure what production software development actually demands.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Tags: #AI #LLM #CodeReview #SoftwareEngineering #Benchmark #Python #CodingAgents #ClaudeOpus #FrontierCode&lt;/em&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>programming</category>
      <category>softwareengineering</category>
    </item>
    <item>
      <title>【Technical Deep Dive】DeepSeek Desktop Agent: A Free, Open-Source Alternative to Codex and Claude Code</title>
      <dc:creator>SchrodingCatAI</dc:creator>
      <pubDate>Mon, 08 Jun 2026 15:02:54 +0000</pubDate>
      <link>https://dev.to/schrodingcatai/ji-zhu-gan-huo-deepseek-desktop-agent-a-free-open-source-alternative-to-codex-and-claude-code-pb6</link>
      <guid>https://dev.to/schrodingcatai/ji-zhu-gan-huo-deepseek-desktop-agent-a-free-open-source-alternative-to-codex-and-claude-code-pb6</guid>
      <description>&lt;h2&gt;
  
  
  Abstract
&lt;/h2&gt;

&lt;p&gt;The AI agent landscape is evolving rapidly, with major providers shipping proprietary coding platforms at premium prices. This article walks through &lt;strong&gt;DeepSeek GUI&lt;/strong&gt; — a community-built, open-source desktop agent that brings Codex-like capabilities to your local machine, powered by DeepSeek's ultra-cheap API. We cover setup, architecture, key features like persistent agents and MCP plugin support, and provide a production-ready Python integration example.&lt;/p&gt;




&lt;h2&gt;
  
  
  Background: The Rise of AI Coding Agents
&lt;/h2&gt;

&lt;p&gt;Every major AI provider now ships its own agent platform:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;OpenAI Codex&lt;/strong&gt; — evolving into a full AI coding agent platform&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Anthropic Claude Code&lt;/strong&gt; — widely regarded as one of the strongest coding harnesses available&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Google's Gemini CLI&lt;/strong&gt; — repositioned as a solo developer workspace&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These tools share a common pattern: they don't replace code review or human judgment — they act as an &lt;strong&gt;additional layer of defense&lt;/strong&gt;, catching issues that might slip through traditional review cycles. The challenge is cost and lock-in. Most are tied to expensive proprietary APIs with opaque pricing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;DeepSeek GUI&lt;/strong&gt; changes that equation entirely.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Important disclaimer:&lt;/strong&gt; DeepSeek GUI is an independent open-source project built by a community developer. It is not an official DeepSeek product. Evaluate it accordingly for enterprise use.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Core Architecture and Design Philosophy
&lt;/h2&gt;

&lt;p&gt;DeepSeek GUI is an &lt;strong&gt;Electron-based desktop application&lt;/strong&gt; with a parallel web interface. Its architecture mirrors tools like Codex in terms of UX, but runs on DeepSeek's API — which, as we'll demonstrate, costs fractions of a cent per complex task.&lt;/p&gt;

&lt;p&gt;Key architectural decisions:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Implementation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Desktop shell&lt;/td&gt;
&lt;td&gt;Electron (cross-platform)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Web fallback&lt;/td&gt;
&lt;td&gt;Local browser via &lt;code&gt;localhost&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Agent runtime&lt;/td&gt;
&lt;td&gt;Node.js 20+&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Model backend&lt;/td&gt;
&lt;td&gt;DeepSeek API (OpenAI-compatible)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Plugin system&lt;/td&gt;
&lt;td&gt;MCP (Model Context Protocol)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The &lt;strong&gt;MCP (Model Context Protocol)&lt;/strong&gt; integration is particularly significant — it's the same protocol used by Claude's tooling ecosystem, which means external tools, custom skills, and structured data sources can be wired in using a standardized interface.&lt;/p&gt;




&lt;h2&gt;
  
  
  Prerequisites and Installation
&lt;/h2&gt;

&lt;h3&gt;
  
  
  System Requirements
&lt;/h3&gt;

&lt;p&gt;Before starting, ensure you have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Node.js 20 or higher&lt;/strong&gt; (the runtime requirement is strict — older versions will fail silently)&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;paid DeepSeek API key&lt;/strong&gt; (free tier does not expose the full model API)&lt;/li&gt;
&lt;li&gt;Internet access for initial dependency resolution&lt;/li&gt;
&lt;/ul&gt;
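&lt;p&gt;Since an outdated runtime is described as failing silently, it can help to check the Node.js version explicitly before installing. The following is a minimal sketch, not part of the project. The function names are hypothetical, and it assumes &lt;code&gt;node --version&lt;/code&gt; prints a string such as &lt;code&gt;v20.11.1&lt;/code&gt;.&lt;/p&gt;

```python
import re
import shutil
import subprocess

MIN_NODE_MAJOR = 20  # taken from the requirement listed above


def parse_node_major(version_output: str) -> int:
    """Extract the major version from output such as 'v20.11.1'."""
    match = re.match(r"v?(\d+)\.", version_output.strip())
    if match is None:
        raise ValueError(f"unrecognized Node.js version string: {version_output!r}")
    return int(match.group(1))


def node_is_supported() -> bool:
    """Return True if a `node` binary on PATH meets the minimum major version."""
    node_path = shutil.which("node")
    if node_path is None:
        return False  # Node.js is not installed or not on PATH
    result = subprocess.run(
        [node_path, "--version"], capture_output=True, text=True, check=True
    )
    return parse_node_major(result.stdout) >= MIN_NODE_MAJOR
```

&lt;p&gt;Running &lt;code&gt;node_is_supported()&lt;/code&gt; before &lt;code&gt;npm install&lt;/code&gt; turns a silent failure into an explicit yes or no.&lt;/p&gt;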

&lt;h3&gt;
  
  
  Installation from Source
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Clone the repository&lt;/span&gt;
git clone https://github.com/deepc-gui/deepc-gui.git

&lt;span class="c"&gt;# Navigate into the project directory&lt;/span&gt;
&lt;span class="nb"&gt;cd &lt;/span&gt;deepc-gui

&lt;span class="c"&gt;# Install all dependencies (requires internet on first run)&lt;/span&gt;
npm &lt;span class="nb"&gt;install&lt;/span&gt;

&lt;span class="c"&gt;# Start the development server&lt;/span&gt;
npm run dev
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once running, you'll see a &lt;code&gt;localhost&lt;/code&gt; URL in the terminal output. You can either:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Open that URL in your browser for the &lt;strong&gt;web interface&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Use the &lt;strong&gt;Electron desktop app&lt;/strong&gt; directly (recommended for full feature access)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;On first launch, the settings panel will prompt you to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Set your &lt;strong&gt;UI theme and language&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Input your &lt;strong&gt;DeepSeek API key&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Optionally &lt;strong&gt;connect a mobile device&lt;/strong&gt; for remote access&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Key Features Deep Dive
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Persistent Loop Agents (&lt;code&gt;/goal&lt;/code&gt;)
&lt;/h3&gt;

&lt;p&gt;The most powerful feature is &lt;code&gt;/goal&lt;/code&gt; — a persistent, long-horizon agent that keeps executing until a task is fully resolved. Unlike one-shot completions, this mode maintains state across tool calls and file edits, making it suitable for multi-step engineering tasks.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/goal Build a responsive landing page with animated hero section, 
feature grid, and contact form. Use Tailwind CSS.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The agent will plan, generate, self-review, and iterate until the loop terminates with a completed artifact.&lt;/p&gt;
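&lt;p&gt;The control flow of such a loop can be sketched in a few lines. This is an illustration of the general generate, self-review, iterate pattern under stated assumptions, not DeepSeek GUI's actual implementation. &lt;code&gt;LoopState&lt;/code&gt;, &lt;code&gt;run_goal_loop&lt;/code&gt;, and the two callbacks are hypothetical names, and the iteration cap is an added safeguard.&lt;/p&gt;

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class LoopState:
    """State carried across iterations of the loop."""
    goal: str
    artifact: str = ""
    history: list[str] = field(default_factory=list)


def run_goal_loop(
    goal: str,
    generate: Callable[[LoopState], str],
    review: Callable[[LoopState], bool],
    max_iterations: int = 5,
) -> LoopState:
    """Generate, self-review, and iterate until approved or the budget is spent."""
    state = LoopState(goal=goal)
    for iteration in range(1, max_iterations + 1):
        state.artifact = generate(state)  # produce or revise the artifact
        approved = review(state)          # self-review step
        outcome = "approved" if approved else "revise"
        state.history.append(f"iteration {iteration}: {outcome}")
        if approved:
            break
    return state
```

&lt;p&gt;In a real agent, &lt;code&gt;generate&lt;/code&gt; and &lt;code&gt;review&lt;/code&gt; would each wrap a model call plus tool use. Keeping them as plain callables makes the loop testable with stubs, and the &lt;code&gt;max_iterations&lt;/code&gt; cap bounds API spend if the review never passes.&lt;/p&gt;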

&lt;h3&gt;
  
  
  Task Management and Observability
&lt;/h3&gt;

&lt;p&gt;The top-right panel surfaces four critical views during agent execution:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Side Conversation&lt;/strong&gt; — a temporary chat thread to ask clarifying questions without interrupting the main task&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Thread To-Do List&lt;/strong&gt; — live task checklist for long-horizon operations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Change Log&lt;/strong&gt; — real-time diff viewer showing every file edit as it happens&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Artifacts&lt;/strong&gt; — live preview panel rendering generated frontend output&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This observability stack is what separates DeepSeek GUI from raw API calls — you can watch the model reason, modify, and complete work in real time.&lt;/p&gt;

&lt;h3&gt;
  
  
  Reasoning Effort Control
&lt;/h3&gt;

&lt;p&gt;DeepSeek's R1-series models support configurable reasoning depth. The UI exposes this as a slider:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;default&lt;/code&gt; — fast, low-cost responses&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;high&lt;/code&gt; — balanced reasoning for moderate complexity&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ultra&lt;/code&gt; — maximum chain-of-thought depth for complex tasks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Setting reasoning to &lt;code&gt;ultra&lt;/code&gt; for frontend generation tasks produced noticeably better output in testing: more cohesive typography, proper component structure, and cleaner CSS.&lt;/p&gt;

&lt;h3&gt;
  
  
  MCP Plugin Integration
&lt;/h3&gt;

&lt;p&gt;The settings panel allows you to attach external MCP-compatible tools to the agent, effectively extending what it can do:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Web search&lt;/li&gt;
&lt;li&gt;Database connectors&lt;/li&gt;
&lt;li&gt;Custom code execution environments&lt;/li&gt;
&lt;li&gt;External API integrations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This mirrors the capability model of enterprise agent platforms, but running locally on your own hardware.&lt;/p&gt;
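&lt;p&gt;Under the hood, MCP clients talk to tool servers using JSON-RPC 2.0 messages, and a tool invocation is a &lt;code&gt;tools/call&lt;/code&gt; request. The sketch below only builds such a message to show its shape. The helper name, the tool name, and its arguments are hypothetical, and how DeepSeek GUI wires this up internally is not covered here.&lt;/p&gt;

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests carry a unique id per call


def build_tool_call(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(request)
```

&lt;p&gt;For example, &lt;code&gt;build_tool_call("web_search", {"query": "tailwind hero section"})&lt;/code&gt; yields the payload a client would send to a (hypothetical) web search server over its transport.&lt;/p&gt;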




&lt;h2&gt;
  
  
  Practical Demo: Generating a Frontend in Under a Cent
&lt;/h2&gt;

&lt;p&gt;To benchmark the model, a prompt was used to generate a full editorial stats landing page — complete with dynamic typography, animated sections, and a structured layout.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost breakdown:&lt;/strong&gt; The complete task consumed less than $0.01 in API credits.&lt;/p&gt;

&lt;p&gt;That cost profile changes the economics of AI-assisted development entirely. Tasks that would cost $0.50–$2.00 with GPT-4o or Claude 3.5 Sonnet run for fractions of a cent here, with competitive output quality when reasoning is set to &lt;code&gt;ultra&lt;/code&gt;.&lt;/p&gt;
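&lt;p&gt;The arithmetic behind a per-task figure like this is simple: tokens divided by one million, times the price per million tokens, summed over input and output. The sketch below shows the calculation. The function name is hypothetical, and any prices or token counts passed to it are placeholders to be replaced with the provider's current published rates and the usage reported by the API.&lt;/p&gt;

```python
from decimal import Decimal


def estimate_cost_usd(
    prompt_tokens: int,
    completion_tokens: int,
    input_price_per_million: Decimal,
    output_price_per_million: Decimal,
) -> Decimal:
    """Cost = tokens / 1,000,000 * price per million, summed over input and output."""
    million = Decimal(1_000_000)
    input_cost = Decimal(prompt_tokens) / million * input_price_per_million
    output_cost = Decimal(completion_tokens) / million * output_price_per_million
    return input_cost + output_cost
```

&lt;p&gt;With placeholder values of 2,000 prompt tokens, 8,000 completion tokens, and prices of 0.50 and 1.00 USD per million tokens, the result is 0.009 USD. &lt;code&gt;Decimal&lt;/code&gt; is used so that very small amounts are not distorted by floating-point rounding.&lt;/p&gt;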




&lt;h2&gt;
  
  
  Python Integration Example
&lt;/h2&gt;

&lt;p&gt;For developers who want to integrate DeepSeek or other hosted models into their own pipelines, here is a reusable starting point. The code uses &lt;strong&gt;&lt;a href="https://xuedingmao.com" rel="noopener noreferrer"&gt;xuedingmao.com&lt;/a&gt;&lt;/strong&gt; as the API gateway, a developer platform that aggregates 500+ models (including GPT-5.5, Gemini 3.1 Pro, and Claude models) behind a unified OpenAI-compatible interface.&lt;/p&gt;

&lt;p&gt;The example below uses &lt;strong&gt;&lt;code&gt;claude-opus-4-8&lt;/code&gt;&lt;/strong&gt; — one of the most capable models currently available on the platform, offering exceptional reasoning depth, long-context understanding (200K tokens), and strong code generation performance. It's a solid default for agentic and complex multi-step tasks.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
DeepSeek / Multi-Model Agent Integration Example
Platform: xuedingmao.com (OpenAI-compatible API gateway)
Default model: claude-opus-4-8 (200K context, strong reasoning)

Usage:
    pip install openai
    Set XUEDINGMAO_API_KEY as environment variable or replace inline.
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;

&lt;span class="c1"&gt;# ─────────────────────────────────────────────
# Configuration
# ─────────────────────────────────────────────
&lt;/span&gt;&lt;span class="n"&gt;API_BASE_URL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://xuedingmao.com/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;API_KEY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;XUEDINGMAO_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your-api-key-here&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# claude-opus-4-8: Anthropic's flagship model with 200K context window.
# Excels at multi-step reasoning, code generation, and structured output tasks.
# Ideal for agentic workflows that require deep contextual understanding.
&lt;/span&gt;&lt;span class="n"&gt;DEFAULT_MODEL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-opus-4-8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;API_KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;API_BASE_URL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="c1"&gt;# ─────────────────────────────────────────────
# Core agent function
# ─────────────────────────────────────────────
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_coding_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;task_description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;DEFAULT_MODEL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;4096&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Lower temperature for deterministic code output
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Execute a coding task via the AI agent API.

    Args:
        task_description: Natural language description of the task.
        system_prompt: Optional system-level instructions for the model.
        model: Model identifier. Defaults to claude-opus-4-8.
        max_tokens: Maximum response token budget.
        temperature: Sampling temperature. Lower = more deterministic.

    Returns:
        dict with keys: &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;usage&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;finish_reason&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;system_prompt&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;system_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a senior software engineer. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Write clean, well-commented, production-ready code. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Always include error handling and type annotations where applicable.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;task_description&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;usage&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;prompt_tokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;completion_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completion_tokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;total_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;total_tokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;finish_reason&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;finish_reason&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;

    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;RuntimeError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;API call failed: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;


&lt;span class="c1"&gt;# ─────────────────────────────────────────────
# Multi-turn conversation (agentic loop scaffold)
# ─────────────────────────────────────────────
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_multi_turn_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;initial_task&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;follow_ups&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;DEFAULT_MODEL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Simulate a persistent agent loop with follow-up instructions.
    Maintains conversation history across turns.

    Args:
        initial_task: The primary task prompt.
        follow_ups: List of follow-up instructions to apply iteratively.
        model: Model to use for all turns.

    Returns:
        List of turn results, each containing content and usage stats.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="n"&gt;conversation_history&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a coding agent. Complete tasks incrementally. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;On each follow-up, refine or extend your previous output.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;initial_task&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;turn_index&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;initial_task&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;follow_ups&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;turn_index&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# Append follow-up as a new user message
&lt;/span&gt;            &lt;span class="n"&gt;conversation_history&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;conversation_history&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4096&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;assistant_message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;

        &lt;span class="c1"&gt;# Append model response to maintain history
&lt;/span&gt;        &lt;span class="n"&gt;conversation_history&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;assistant_message&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;turn&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;turn_index&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;assistant_message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tokens_used&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;total_tokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[Turn &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;turn_index&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;] Tokens used: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;total_tokens&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;


&lt;span class="c1"&gt;# ─────────────────────────────────────────────
# Example usage
# ─────────────────────────────────────────────
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Single-turn: generate a landing page component
&lt;/span&gt;    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;=== Single-Turn Code Generation ===&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;run_coding_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;task_description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Create a responsive Hero section component in React + Tailwind CSS. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Include an animated headline, subtext, CTA button, and a background gradient. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Use TypeScript with proper prop types.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Model: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Tokens used: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;usage&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;total_tokens&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Finish reason: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;finish_reason&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;--- Generated Output ---&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="c1"&gt;# Multi-turn: iterative refinement loop
&lt;/span&gt;    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;=== Multi-Turn Agentic Loop ===&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;turns&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;run_multi_turn_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;initial_task&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Write a Python FastAPI endpoint for user registration with email validation.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;follow_ups&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Add password hashing using bcrypt and return a JWT on success.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Add rate limiting (5 requests per minute per IP) using slowapi.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;turn&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;turns&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;[Turn &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;turn&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;turn&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;] &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;turn&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;prompt&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Tokens: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;turn&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;tokens_used&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The platform at &lt;a href="https://xuedingmao.com" rel="noopener noreferrer"&gt;xuedingmao.com&lt;/a&gt; provides access to newly released models as they ship, which matters when you're benchmarking or need to evaluate a new release quickly without migrating infrastructure. Because the interface is unified, you can swap &lt;code&gt;DEFAULT_MODEL&lt;/code&gt; to &lt;code&gt;deepseek-r1&lt;/code&gt;, &lt;code&gt;gpt-5.5&lt;/code&gt;, or &lt;code&gt;gemini-3.1-pro&lt;/code&gt; without touching any other code, which is useful for running the kind of comparative benchmarks shown in the video.&lt;/p&gt;
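
&lt;p&gt;When comparing models this way, it helps to turn the per-turn token counts into a rough cost figure. The sketch below is one way to do that, assuming the list-of-dicts shape returned by &lt;code&gt;run_multi_turn_agent&lt;/code&gt; above. The helper names (&lt;code&gt;estimate_turn_costs&lt;/code&gt;, &lt;code&gt;total_cost&lt;/code&gt;), the sample token counts, and the price per million tokens are all hypothetical placeholders, not real provider rates.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class TurnCost:
    turn: int
    tokens_used: int
    estimated_usd: float


def estimate_turn_costs(turns: list[dict], usd_per_million_tokens: float) -> list[TurnCost]:
    """Convert per-turn token counts into rough dollar estimates.

    `turns` is expected to look like the output of run_multi_turn_agent:
    dicts with "turn" and "tokens_used" keys. The price is a placeholder
    supplied by the caller, not a real provider rate.
    """
    costs = []
    for turn in turns:
        usd = turn["tokens_used"] / 1_000_000 * usd_per_million_tokens
        costs.append(TurnCost(turn["turn"], turn["tokens_used"], usd))
    return costs


def total_cost(costs: list[TurnCost]) -> float:
    """Sum the estimated cost across all turns."""
    return sum(c.estimated_usd for c in costs)


if __name__ == "__main__":
    # Hypothetical token counts: later turns grow because the full history is resent.
    sample_turns = [
        {"turn": 1, "tokens_used": 1_200},
        {"turn": 2, "tokens_used": 2_900},
        {"turn": 3, "tokens_used": 4_700},
    ]
    costs = estimate_turn_costs(sample_turns, usd_per_million_tokens=1.0)
    for c in costs:
        print(f"turn {c.turn}: {c.tokens_used} tokens, about ${c.estimated_usd:.6f}")
    print(f"total: about ${total_cost(costs):.6f}")
```

&lt;p&gt;A single blended price is a simplification: most providers bill prompt and completion tokens at different rates, so a more careful estimate would use the separate &lt;code&gt;prompt_tokens&lt;/code&gt; and &lt;code&gt;completion_tokens&lt;/code&gt; fields instead of &lt;code&gt;total_tokens&lt;/code&gt;.&lt;/p&gt;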




&lt;h2&gt;
  
  
  Caveats and Data Policy Considerations
&lt;/h2&gt;

&lt;p&gt;One point worth stating clearly for any production or commercial use:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;DeepSeek's API data policy includes training on API usage data.&lt;/strong&gt; This is not unique to DeepSeek — several major providers do the same — but it's worth auditing before sending proprietary code, internal business logic, or PII through the API. For sensitive workloads, use a model provider whose data policy explicitly excludes training on API inputs.&lt;/p&gt;

&lt;p&gt;For personal projects, open-source work, or non-sensitive prototyping, the cost/capability tradeoff is genuinely compelling.&lt;/p&gt;




&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;DeepSeek GUI fills a real gap: a &lt;strong&gt;free, open-source, locally running agent platform&lt;/strong&gt; that delivers Codex-level UX without proprietary lock-in. Its persistent agent loops, live diff viewer, MCP extensibility, and sub-cent task costs make it worth evaluating for any developer who's felt priced out of the premium agent platforms.&lt;/p&gt;

&lt;p&gt;The core insight from testing is straightforward: &lt;strong&gt;set reasoning to &lt;code&gt;ultra&lt;/code&gt; for complex generation tasks&lt;/strong&gt;. The quality gap between default and ultra reasoning is noticeable on anything more complex than simple CRUD.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;#AI #LLM #Python #OpenSource #DevTools #AgentFramework #DeepSeek #TechnicalWalkthrough&lt;/em&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>From Code Completion to Autonomous Reasoning: What the Oceanus Leak Tells Us About the Future of AI Software Engineering</title>
      <dc:creator>SchrodingCatAI</dc:creator>
      <pubDate>Sun, 07 Jun 2026 15:20:46 +0000</pubDate>
      <link>https://dev.to/schrodingcatai/from-code-completion-to-autonomous-reasoning-what-the-oceanus-leak-tells-us-about-the-future-of-ai-57ng</link>
      <guid>https://dev.to/schrodingcatai/from-code-completion-to-autonomous-reasoning-what-the-oceanus-leak-tells-us-about-the-future-of-ai-57ng</guid>
      <description>&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;Drawing on the Oceanus model leak incident, this article dissects how frontier large language models are evolving in code reasoning, vulnerability discovery, tree-search inference, MoE architecture, and automated engineering loops, and closes with a runnable Python script for AI-assisted code security review.&lt;/p&gt;




&lt;h2&gt;
  
  
  1. Background: What a Leak Reveals About Frontier Model Capabilities
&lt;/h2&gt;

&lt;p&gt;Recent leaks surrounding Anthropic's internal model Oceanus have sparked debate in the AI community about whether frontier models have crossed the threshold for autonomous security research. Based on leaked transcripts, Oceanus is described as a checkpoint in the Claude Mythos model family, positioned above the standard Opus series, and appears to have been in pre-release red-teaming.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Important caveat&lt;/strong&gt;: Model names, parameter counts, pricing, vulnerability counts, and release dates mentioned in the source material have not been officially confirmed. This article treats the leak as a technical case study rather than verified news, using it to analyze the directions in which frontier models are heading.&lt;/p&gt;

&lt;h3&gt;
  
  
  Core Shifts
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;LLMs are evolving from "code completion tools" to "code reasoning systems"&lt;/li&gt;
&lt;li&gt;Security capabilities are shifting from passive audit assistance toward proactive defect discovery&lt;/li&gt;
&lt;li&gt;Model reasoning is incorporating search, backtracking, and self-evaluation mechanisms&lt;/li&gt;
&lt;li&gt;Engineering execution is moving from single-turn Q&amp;amp;A to sandboxed, end-to-end closed loops&lt;/li&gt;
&lt;li&gt;Model access control, security governance, and red-teaming are becoming more critical&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;These shifts reveal that the ceiling of AI coding capability no longer depends solely on "generating correct code snippets"—it depends on whether models can continuously understand projects, plan paths, execute tests, analyze errors, and iterate toward fixes.&lt;/p&gt;




&lt;h2&gt;
  
  
  2. Core Principles: Why Frontier Models Significantly Improve Code Reasoning
&lt;/h2&gt;

&lt;h3&gt;
  
  
  2.1 The Fundamental Shift in Code Reasoning
&lt;/h3&gt;

&lt;p&gt;Traditional code models rely primarily on context-aware completion, generating similar implementations based on existing code patterns. Stronger models now demonstrate cross-file understanding, call-chain analysis, exception-path reasoning, and test-feedback utilization.&lt;/p&gt;

&lt;p&gt;In real software engineering, a complex problem typically involves:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Problem Decomposition&lt;/strong&gt; — The model first identifies the goal, such as fixing a bug, refactoring a module, adding tests, or locating performance bottlenecks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dependency Understanding&lt;/strong&gt; — The model analyzes function call relationships, state changes, boundary conditions, and third-party library behaviors.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution Generation&lt;/strong&gt; — The model does not just generate a single answer but compares multiple fix paths, selecting lower-risk options.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Verification Loop&lt;/strong&gt; — By running tests, reading error outputs, and rewriting code, the final output gradually converges.&lt;/p&gt;

&lt;p&gt;The capability described—"pulling code, installing dependencies, running tests, reading errors, and rewriting its own output"—is the hallmark of &lt;strong&gt;Agentic Coding&lt;/strong&gt;.&lt;/p&gt;
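
&lt;p&gt;The run, read, rewrite cycle can be sketched in a few lines. This is a minimal illustration, not how any particular agent is implemented: &lt;code&gt;run_candidate&lt;/code&gt;, &lt;code&gt;verification_loop&lt;/code&gt;, and &lt;code&gt;toy_fixer&lt;/code&gt; are hypothetical names, and &lt;code&gt;toy_fixer&lt;/code&gt; stands in for the model call that would normally propose the next revision. A real agent would also run the candidate inside an isolated sandbox rather than a plain subprocess.&lt;/p&gt;

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Callable


def run_candidate(source: str) -> tuple[bool, str]:
    """Run a candidate script in a subprocess; return (passed, stderr)."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "candidate.py"
        script.write_text(source, encoding="utf-8")
        proc = subprocess.run(
            [sys.executable, str(script)],
            capture_output=True,
            text=True,
            timeout=30,
        )
    return proc.returncode == 0, proc.stderr


def verification_loop(
    source: str,
    propose_fix: Callable[[str, str], str],
    max_rounds: int = 3,
) -> tuple[str, bool]:
    """Run, read the error, rewrite, and repeat until the script passes or the budget runs out."""
    for _ in range(max_rounds):
        passed, stderr = run_candidate(source)
        if passed:
            return source, True
        source = propose_fix(source, stderr)
    # Check the last rewrite once more before giving up.
    passed, _ = run_candidate(source)
    return source, passed


def toy_fixer(source: str, stderr: str) -> str:
    """Stand-in for an LLM call: patches one known bug when the error matches."""
    if "AssertionError" in stderr:
        return source.replace("a - b", "a + b")
    return source


# A deliberately broken script whose own assertion acts as the test.
BUGGY = "def add(a, b):\n    return a - b\n\nassert add(2, 3) == 5\n"

if __name__ == "__main__":
    fixed, ok = verification_loop(BUGGY, toy_fixer)
    print("passed:", ok)
    print(fixed)
```

&lt;p&gt;The &lt;code&gt;max_rounds&lt;/code&gt; budget is the important design choice here: without it, a fixer that never converges would loop, and spend tokens, indefinitely.&lt;/p&gt;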

&lt;h3&gt;
  
  
  2.2 Tree-Search Reasoning: From Single Generation to Multi-Path Exploration
&lt;/h3&gt;

&lt;p&gt;The leaked transcripts mention that Oceanus may employ an AlphaGo-style tree-search mechanism. While the claim is unverified, the approach itself is technically plausible.&lt;/p&gt;

&lt;p&gt;Standard LLM inference is linear:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Prompt -&amp;gt; Token 1 -&amp;gt; Token 2 -&amp;gt; Token 3 -&amp;gt; Final Answer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Search-augmented reasoning is more like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Problem Input -&amp;gt; Generate Multiple Candidates -&amp;gt; Score Each -&amp;gt; Prune Low-Quality Paths
-&amp;gt; Explore High-Value Paths -&amp;gt; Backtrack When Needed -&amp;gt; Output Final Result
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This mechanism is especially effective in code tasks. When facing a complex bug, a model might simultaneously explore:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Locating from the exception stack trace&lt;/li&gt;
&lt;li&gt;Locating from recent commit diffs&lt;/li&gt;
&lt;li&gt;Locating from test assertion failures&lt;/li&gt;
&lt;li&gt;Locating from data structure invariants&lt;/li&gt;
&lt;li&gt;Locating from concurrency timing issues&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If one path fails to explain the phenomenon, the model can backtrack and choose an alternative reasoning path. The trade-off is significantly higher inference cost—every visible output token may represent a large number of hidden search tokens.&lt;/p&gt;
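
&lt;p&gt;The explore, prune, and backtrack pattern can be illustrated with an ordinary best-first search over a toy hypothesis tree. This is a sketch of the general technique only. It is not AlphaGo-style Monte Carlo tree search and makes no claim about what Oceanus actually does. The tree, the hand-assigned scores, and the function name &lt;code&gt;best_first_search&lt;/code&gt; are all invented for illustration; in a real system the scores would come from a learned evaluator or from test feedback.&lt;/p&gt;

```python
import heapq
from typing import Callable, Iterable, Optional


def best_first_search(
    root: str,
    expand: Callable[[str], Iterable[str]],
    score: Callable[[str], float],
    is_solution: Callable[[str], bool],
    prune_below: float = 0.2,
    max_expansions: int = 50,
) -> Optional[str]:
    """Explore the highest-scoring hypothesis first; fall back to others on dead ends."""
    counter = 0  # unique tie-breaker: equal scores are popped in insertion order
    frontier = [(-score(root), counter, root)]  # negate scores: heapq is a min-heap
    for _ in range(max_expansions):
        if not frontier:
            return None
        _, _, node = heapq.heappop(frontier)
        if is_solution(node):
            return node
        for child in expand(node):
            child_score = score(child)
            if child_score >= prune_below:  # prune low-quality paths
                counter += 1
                heapq.heappush(frontier, (-child_score, counter, child))
    return None


# Hypothetical debugging hypotheses and hand-assigned plausibility scores.
TREE = {
    "bug report": ["stack trace", "recent commits", "concurrency timing"],
    "stack trace": ["null check in parser"],
    "recent commits": ["changed retry default"],
}
SCORES = {
    "bug report": 1.0,
    "stack trace": 0.9,
    "recent commits": 0.6,
    "concurrency timing": 0.1,
    "null check in parser": 0.7,
    "changed retry default": 0.8,
}
ROOT_CAUSE = "changed retry default"

if __name__ == "__main__":
    found = best_first_search(
        "bug report",
        expand=lambda node: TREE.get(node, []),
        score=lambda node: SCORES[node],
        is_solution=lambda node: node == ROOT_CAUSE,
    )
    print("root cause:", found)
```

&lt;p&gt;In this toy run the search first follows the stack-trace branch, reaches a dead end at the parser null check, and then returns to the lower-scored commit-diff branch still waiting in the frontier. The concurrency hypothesis is pruned immediately because its score falls under the threshold.&lt;/p&gt;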

&lt;h3&gt;
  
  
  2.3 MoE Architecture: Activating Only a Subset of Experts
&lt;/h3&gt;

&lt;p&gt;The transcripts also reference a possible Mixture of Experts architecture. The core idea of MoE: the total parameter count is large, but each inference activates only a subset of expert networks, achieving a balance between capability and cost.&lt;/p&gt;

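&lt;p&gt;In standard MoE designs, routing happens per token inside each MoE layer: a learned router scores the experts and forwards each token to the top-k of them. Different kinds of content, such as everyday conversation, repository analysis, large refactors, or vulnerability investigation, therefore tend to exercise different experts, while the compute spent per token stays roughly constant.&lt;/p&gt;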

&lt;p&gt;A typical MoE inference flow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Input Token -&amp;gt; Router Classifies Task Type -&amp;gt; Selects Top-K Experts
-&amp;gt; Expert Networks Process in Parallel -&amp;gt; Outputs Merged
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This explains why frontier models can handle complex engineering tasks while maintaining high throughput. However, MoE systems place extreme demands on routing strategy, expert load balancing, cache hit rates, and distributed inference frameworks.&lt;/p&gt;
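
&lt;p&gt;The routing step is easier to see in code. The sketch below shows generic top-k gating with scalar toy experts in pure Python. The function names (&lt;code&gt;route_top_k&lt;/code&gt;, &lt;code&gt;moe_forward&lt;/code&gt;), the expert functions, and the router logits are hypothetical; in a real model the experts are feed-forward networks over vectors and the router logits are computed from each token's hidden state. Nothing here reflects the unconfirmed Oceanus architecture.&lt;/p&gt;

```python
import math
from typing import Callable, Sequence

Expert = Callable[[float], float]


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits: Sequence[float], k: int) -> list[tuple[int, float]]:
    """Pick the k highest-scoring experts and renormalize their gate weights."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    gate_sum = sum(probs[i] for i in top)
    return [(i, probs[i] / gate_sum) for i in top]


def moe_forward(
    x: float,
    router_logits: Sequence[float],
    experts: Sequence[Expert],
    k: int = 2,
) -> float:
    """Run only the selected experts and merge their outputs by gate weight."""
    return sum(weight * experts[i](x) for i, weight in route_top_k(router_logits, k))


if __name__ == "__main__":
    # Toy scalar "experts"; real experts are feed-forward networks over vectors.
    experts = [lambda x: x + 1.0, lambda x: x * 2.0, lambda x: -x, lambda x: x * x]
    # Hypothetical router output for one token.
    logits = [2.0, 2.0, -1.0, 0.0]
    print(route_top_k(logits, k=2))
    print(moe_forward(3.0, logits, experts, k=2))
```

&lt;p&gt;Only the k selected experts are evaluated, which is where the compute saving comes from. Production systems typically add a load-balancing term during training so the router does not collapse onto a handful of experts.&lt;/p&gt;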




&lt;h2&gt;
  
  
  3. Hands-On Demo: Building an AI Code Security Review Script
&lt;/h2&gt;

&lt;p&gt;The following Python implementation is a runnable code security review script. It uses an OpenAI-compatible interface to call a large language model for security risk analysis, vulnerability classification, and remediation guidance.&lt;/p&gt;

&lt;p&gt;This example uses XueDingMao AI (&lt;code&gt;https://xuedingmao.com&lt;/code&gt;), a unified model gateway I commonly use in AI development. It operates in OpenAI-compatible mode—configure the Base URL, API Key, and model name once to access 500+ mainstream LLMs including GPT-5.4, Claude 4.6, and Gemini 3.1 Pro.&lt;/p&gt;

&lt;p&gt;All examples default to &lt;code&gt;claude-opus-4-6&lt;/code&gt;, which excels at complex code understanding, long-context reasoning, architecture analysis, and multi-step task planning—making it well-suited for code review, security analysis, refactoring suggestions, and technical documentation.&lt;/p&gt;

&lt;h3&gt;
  
  
  3.1 Install Dependencies
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;openai python-dotenv
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3.2 Configure Environment Variables
&lt;/h3&gt;

&lt;p&gt;Create a &lt;code&gt;.env&lt;/code&gt; file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;XUEDINGMAO_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;your_api_key_here
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3.3 Complete Python Example
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pathlib&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Path&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dotenv&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_dotenv&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;BASE_URL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://xuedingmao.com/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;MODEL_NAME&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-opus-4-6&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;load_source_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Load the source code file for review.
    Applies a basic size limit to avoid submitting oversized contexts.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="n"&gt;path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exists&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;FileNotFoundError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;File not found: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;file_path&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;is_file&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Not a file: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;file_path&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;max_size&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt;  &lt;span class="c1"&gt;# 200 KB limit
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stat&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;st_size&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;max_size&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;File too large. Please split it first. Current limit: 200KB&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;encoding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;build_security_review_prompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Construct the prompt for code security review.
    Emphasis on verifiable, executable, low-false-positive findings.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;You are a senior application security engineer. Please conduct a
security review of the following code.

Review objectives:
1. Identify potential security risks such as injection, privilege escalation,
   path traversal, deserialization, command execution, and sensitive data leakage.
2. Classify severity: Critical / High / Medium / Low.
3. Provide trigger conditions, impact scope, and remediation guidance.
4. If no significant issues are found, state clearly and suggest hardening measures.
5. Do not fabricate non-existent context. Analyze only based on the code provided.

Filename: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

Code:
&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

Please use the following structure in your output:

## Review Summary
## Risk Inventory
## Remediation Guidance
## Hardening Recommendations
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;create_client&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Create an OpenAI-compatible client.
    XueDingMao AI provides a unified Base URL for switching between models.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="nf"&gt;load_dotenv&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;XUEDINGMAO_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;EnvironmentError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Please set XUEDINGMAO_API_KEY in .env&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;BASE_URL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;review_code&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Invoke the LLM to perform code security review.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="n"&gt;code&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;load_source_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;build_security_review_prompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_path&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_client&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;MODEL_NAME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a rigorous, restrained, and professional &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                           &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;code security reviewer. All output must be evidence-based.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;No valid content returned from model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;argv&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Usage: python ai_security_review.py &amp;lt;file_path&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;file_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;argv&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;review_code&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;exc&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Execution failed: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;exc&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3.4 Running the Script
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python ai_security_review.py ./example.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In real teams, this script can be integrated into GitHub Actions, GitLab CI, or internal code platforms to auto-generate security review reports during the Pull Request stage.&lt;/p&gt;




&lt;h2&gt;
  
  
  4. Technical Resources and Tool Selection
&lt;/h2&gt;

&lt;p&gt;In multi-model development scenarios, the model integration approach directly impacts engineering efficiency. Maintaining separate SDKs, authentication methods, and model parameters for each vendor adds significant overhead. A unified OpenAI-compatible interface is the more engineering-friendly approach.&lt;/p&gt;

&lt;p&gt;For AI application prototyping, model evaluation, and coding agent experiments, I use XueDingMao AI (&lt;code&gt;xuedingmao.com&lt;/code&gt;) as a unified integration layer. Its main technical benefits are:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Unified Multi-Model Access&lt;/strong&gt;: It aggregates 500+ mainstream LLMs, so application-side code only needs to maintain one calling pattern.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fast New Model Onboarding&lt;/strong&gt;: Teams that need frontier model capabilities can compare models on code reasoning, long-text analysis, and multimodal understanding as soon as they become available.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reduced Integration Complexity&lt;/strong&gt;: Because every model is reached through the same Base URL, API Key, and model name parameter, model routing, A/B testing, fallback strategies, and cost controls can all be built on a single client.&lt;/p&gt;
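
&lt;p&gt;To make the fallback idea concrete, here is a minimal sketch of how a fallback strategy could sit on top of a single calling pattern. The function &lt;code&gt;complete_with_fallback&lt;/code&gt;, the stand-in &lt;code&gt;fake_call&lt;/code&gt;, and the model names are hypothetical and exist only for illustration. In a real project, &lt;code&gt;call_model&lt;/code&gt; would wrap the OpenAI-compatible client created earlier.&lt;/p&gt;

```python
from typing import Callable, Sequence, Tuple


def complete_with_fallback(
    prompt: str,
    models: Sequence[str],
    call_model: Callable[[str, str], str],
) -> Tuple[str, str]:
    """Try each model in order; return (model_name, reply) from the first success."""
    errors = []
    for model in models:
        try:
            return model, call_model(model, prompt)
        except Exception as exc:  # broad on purpose: any failure moves to the next model
            errors.append(f"{model}: {exc}")
    raise RuntimeError("All models failed: " + "; ".join(errors))


def fake_call(model: str, prompt: str) -> str:
    """Offline stand-in for a real API call (hypothetical behavior)."""
    if model == "primary-model":
        raise TimeoutError("simulated timeout")
    return f"[{model}] reply to: {prompt}"


used, reply = complete_with_fallback(
    "Review this diff", ["primary-model", "backup-model"], fake_call
)
print(used, reply)
```

&lt;p&gt;Passing the call as a function keeps the routing logic independent of any one vendor SDK, which also makes it easy to test without network access.&lt;/p&gt;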




&lt;h2&gt;
  
  
  5. Caveats: Governance Is Non-Negotiable When Using Frontier Models in Security
&lt;/h2&gt;

&lt;h3&gt;
  
  
  5.1 Do Not Generate Offensive Exploit Code
&lt;/h3&gt;

&lt;p&gt;Code security reviews should focus on risk identification, impact analysis, and remediation guidance. Keep any content involving exploit chains, weaponization scripts, or bypass mechanisms narrowly scoped, so the model does not become an attack automation tool.&lt;/p&gt;

&lt;h3&gt;
  
  
  5.2 Human Review Is Mandatory
&lt;/h3&gt;

&lt;p&gt;Even models with strong code reasoning capabilities can produce false positives, false negatives, or context misunderstandings. Security conclusions in production must be reviewed by engineers, especially for high-severity vulnerabilities, permission models, and business risk logic.&lt;/p&gt;

&lt;h3&gt;
  
  
  5.3 Establish Access Control and Audit Trails
&lt;/h3&gt;

&lt;p&gt;The most significant takeaway from the Oceanus incident is not a single model's capability—it's the governance question: &lt;strong&gt;who can access powerful models?&lt;/strong&gt; When integrating high-capability models, enterprises should implement API key tiers, call logging, sensitive task approval, and anomaly alerting.&lt;/p&gt;
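
&lt;p&gt;As a rough illustration of key tiers, call logging, and sensitive task approval, the sketch below gates each request by tier and records every decision. The tier names, task names, and the &lt;code&gt;authorize&lt;/code&gt; helper are assumptions made for this example, not part of any real platform. Anomaly alerting is left out.&lt;/p&gt;

```python
import json
from datetime import datetime, timezone

# Hypothetical tiers: which task types each API key tier may run.
TIER_TASKS = {
    "standard": {"summarize", "style_review"},
    "security": {"summarize", "style_review", "security_review"},
}
# Task types that also need a named human approver (assumption for this sketch).
NEEDS_APPROVAL = {"security_review"}

AUDIT_LOG = []  # in production this would be an append-only store


def authorize(key_tier, task, approver=None):
    """Return True if the call may proceed; record the decision either way."""
    allowed = task in TIER_TASKS.get(key_tier, set())
    if allowed and task in NEEDS_APPROVAL and approver is None:
        allowed = False  # sensitive task without sign-off is rejected
    AUDIT_LOG.append(json.dumps({
        "time": datetime.now(timezone.utc).isoformat(),
        "tier": key_tier,
        "task": task,
        "approver": approver,
        "allowed": allowed,
    }))
    return allowed


print(authorize("standard", "security_review"))           # tier too low
print(authorize("security", "security_review"))           # missing approver
print(authorize("security", "security_review", "alice"))  # permitted
print(len(AUDIT_LOG), "audit entries recorded")
```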

&lt;h3&gt;
  
  
  5.4 Sanitize Sensitive Information in Contexts
&lt;/h3&gt;

&lt;p&gt;Code reviews frequently involve keys, internal addresses, business rules, and customer data. Sanitize this material before submitting it to the model, define clear data boundaries, and keep sensitive assets out of pipelines you do not control.&lt;/p&gt;
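
&lt;p&gt;A minimal sketch of such a sanitization pass is shown below. It assumes simple regular-expression rules are enough for a first filter. The rule list and the &lt;code&gt;sanitize&lt;/code&gt; helper are illustrative only, and the patterns cover just a few common cases (quoted secrets, two private IPv4 ranges, email addresses). A real deployment would need rules tuned to its own data.&lt;/p&gt;

```python
import re

# Illustrative rules only: (compiled pattern, replacement).
REDACTION_RULES = [
    # Quoted values assigned to names containing api_key, secret, token, or password.
    (
        re.compile(
            r"((?:api[_-]?key|secret|token|password)\w*\s*[:=]\s*)(['\"])[^'\"]+\2",
            re.IGNORECASE,
        ),
        r"\1\2[REDACTED]\2",
    ),
    # Private IPv4 addresses in 10.0.0.0/8 and 192.168.0.0/16 (172.16.0.0/12 omitted).
    (re.compile(r"\b(?:10\.\d{1,3}|192\.168)\.\d{1,3}\.\d{1,3}\b"), "[INTERNAL_IP]"),
    # Email addresses.
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), "[EMAIL]"),
]


def sanitize(text: str) -> str:
    """Apply every redaction rule in order and return the cleaned text."""
    for pattern, replacement in REDACTION_RULES:
        text = pattern.sub(replacement, text)
    return text


sample = "\n".join([
    'API_KEY = "sk-12345"',
    'DB_HOST = "10.0.3.7"',
    'OWNER = "ops@example.com"',
])
print(sanitize(sample))
```

&lt;p&gt;Running this pass before the prompt is built means the raw secret never leaves the local process, regardless of which model endpoint is used.&lt;/p&gt;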




&lt;h2&gt;
  
  
  6. Conclusion
&lt;/h2&gt;

&lt;p&gt;Whatever the final details of the Oceanus leak, the direction is clear: frontier LLMs are evolving from "question-answering tools" to "systems capable of executing complex engineering tasks." Tree search, self-evaluation, MoE, sandboxed execution, and closed-loop repair will continue raising the ceiling of AI capability in software engineering and security analysis.&lt;/p&gt;

&lt;p&gt;But stronger capability demands stronger governance. The true differentiator ahead will not just be training more powerful models—it will be running them within auditable, controllable, and verifiable boundaries in real engineering scenarios.&lt;/p&gt;

&lt;p&gt;#AI #LLM #Python #MachineLearning #TechTutorial #Security #CodeReview #MoE #AgenticAI #Claude #GPT #Gemini&lt;/p&gt;




</description>
      <category>ai</category>
      <category>llm</category>
      <category>security</category>
      <category>softwareengineering</category>
    </item>
    <item>
      <title>AI Memory Systems: Complete Guide</title>
      <dc:creator>SchrodingCatAI</dc:creator>
      <pubDate>Sun, 07 Jun 2026 13:21:34 +0000</pubDate>
      <link>https://dev.to/schrodingcatai/aimemorysystemscompleteguide-30l2</link>
      <guid>https://dev.to/schrodingcatai/aimemorysystemscompleteguide-30l2</guid>
      <description>&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;AI memory systems are reshaping LLM applications, turning one-off Q&amp;amp;A sessions into intelligent assistants that continuously understand user context. This article examines the memory mechanisms behind ChatGPT, Claude, Gemini, and Copilot, breaks down explicit memory, implicit inference, memory summarization, and privacy risks, and ends with a lightweight Python implementation.&lt;/p&gt;




&lt;h2&gt;
  
  
  Background: Why LLMs Are Starting to "Remember You"
&lt;/h2&gt;

&lt;p&gt;Traditional LLM applications are stateless: a user submits a request, the model generates a response based on the current prompt and context window, and the session ends there. While this works for general Q&amp;amp;A, it falls short in long-term tasks, personal assistance, and enterprise knowledge collaboration.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You want the AI to remember your code style preferences over time.&lt;/li&gt;
&lt;li&gt;You need the AI to understand your project's background, tech stack, and delivery timeline.&lt;/li&gt;
&lt;li&gt;You want the AI to track evolving requirements across multiple conversations.&lt;/li&gt;
&lt;li&gt;Enterprise users need the AI to grasp organizational documents, meeting notes, and team member roles.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is driving the shift from &lt;strong&gt;Stateless Tool&lt;/strong&gt; to &lt;strong&gt;Stateful Assistant&lt;/strong&gt;. Products like ChatGPT, Claude, Gemini, and Microsoft Copilot are all converging on the same goal: building controllable, updatable, and auditable long-term memory systems.&lt;/p&gt;

&lt;p&gt;It's important to clarify that "memory" does not mean real-time modification of model parameters. Most AI memory systems dynamically inject user profiles, historical facts, preferences, and task states into the context window before inference—or use retrieval-augmented generation (RAG) to recall relevant memories.&lt;/p&gt;




&lt;h2&gt;
  
  
  Core Principles: The Four-Layer Architecture of AI Memory Systems
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Layer 1: Explicit Memory — Facts the User Declares
&lt;/h3&gt;

&lt;p&gt;Explicit memory is the most straightforward type. The user explicitly tells the AI:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Please remember that I use Python and FastAPI for backend development.&lt;br&gt;
Please remember that I prefer Markdown tables for summarizing information.&lt;br&gt;
Please remember that my project deadline is June 10th.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This information typically enters long-term storage, is tagged as a stable fact, and participates in prompt construction across future sessions.&lt;/p&gt;

&lt;p&gt;Engineers typically structure explicit memories with these fields:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;user_id&lt;/code&gt;: User identifier&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;memory_type&lt;/code&gt;: Memory category (preference, project, identity, constraint)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;content&lt;/code&gt;: Memory content&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;created_at / updated_at&lt;/code&gt;: Timestamps&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;confidence&lt;/code&gt;: Reliability score&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;status&lt;/code&gt;: Active, hidden, deleted, etc.&lt;/li&gt;
&lt;/ul&gt;
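
&lt;p&gt;The fields above map naturally onto a small record type. The sketch below is one possible shape, not a fixed schema: the class name &lt;code&gt;MemoryRecord&lt;/code&gt;, the default values, and the validation rules are assumptions added for illustration.&lt;/p&gt;

```python
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone

ALLOWED_TYPES = {"preference", "project", "identity", "constraint"}


def _now() -> str:
    """Current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()


@dataclass
class MemoryRecord:
    user_id: str
    memory_type: str          # one of ALLOWED_TYPES
    content: str
    confidence: float = 1.0   # assumption: explicit statements start fully trusted
    status: str = "active"    # active, hidden, deleted
    created_at: str = field(default_factory=_now)
    updated_at: str = field(default_factory=_now)

    def __post_init__(self):
        if self.memory_type not in ALLOWED_TYPES:
            raise ValueError(f"Unknown memory_type: {self.memory_type}")
        # Clamping changes the value only when it lies outside [0, 1].
        if min(max(self.confidence, 0.0), 1.0) != self.confidence:
            raise ValueError("confidence must be between 0 and 1")


record = MemoryRecord(
    user_id="u-001",
    memory_type="preference",
    content="Uses Python and FastAPI for backend development",
)
print(asdict(record))
```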

&lt;h3&gt;
  
  
  Layer 2: Implicit Memory — Inferred from Conversation History
&lt;/h3&gt;

&lt;p&gt;ChatGPT's new "Dream Architecture" or "Implicit Memory Layer" goes beyond what users explicitly request. The system automatically extracts context from chat history, uploaded files, and connected apps.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A user repeatedly asks about camera equipment → the system infers an interest in photography.&lt;/li&gt;
&lt;li&gt;A user consistently requests "concise, formal, bullet-point output" → the system infers a communication preference.&lt;/li&gt;
&lt;li&gt;A user discusses a specific SaaS project across sessions → the system infers their current work context.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Implicit memory significantly improves user experience, but introduces risk: the model might incorrectly infer identity, interests, or intent—and amplify these errors over time.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 3: Memory Summarization — Compression and Governance
&lt;/h3&gt;

&lt;p&gt;Memory summarization is critical in modern AI systems. Historical conversations can be extremely long and cannot all fit into a model's context window. The system must compress extensive interactions into structured summaries.&lt;/p&gt;

&lt;p&gt;A well-formed memory summary might look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"preferences"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"language"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"English"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"output_style"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"technical, structured, concise"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"code_language"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Python"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"projects"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"AI Agent Engineering Platform"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"stack"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"FastAPI"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"PostgreSQL"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Redis"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"LLM API"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"active"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"constraints"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"Avoid overly colloquial language"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"Code examples must be runnable"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Memory summarization delivers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reduced context token costs&lt;/li&gt;
&lt;li&gt;Improved conversation continuity over time&lt;/li&gt;
&lt;li&gt;Support for user auditing and modification&lt;/li&gt;
&lt;li&gt;Prevention of stale information stacking on top of new data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The "marathon training" and "ankle injury" example from the video is fundamentally a memory conflict resolution problem: the system cannot mechanically store both facts—it must understand state changes and update the user profile accordingly.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 4: Memory Recall — Using the Right Information at the Right Time
&lt;/h3&gt;

&lt;p&gt;Not every memory should enter every request. An effective memory system must determine:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Does this question require user preferences?&lt;/li&gt;
&lt;li&gt;Is the current task related to a known project?&lt;/li&gt;
&lt;li&gt;Has this memory expired?&lt;/li&gt;
&lt;li&gt;Does it contain privacy-sensitive information?&lt;/li&gt;
&lt;li&gt;Does it conflict with new information?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Common engineering approaches include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Keyword and embedding-based similarity retrieval&lt;/li&gt;
&lt;li&gt;Time-decay weighted relevance scoring&lt;/li&gt;
&lt;li&gt;Memory type-based rule filtering&lt;/li&gt;
&lt;li&gt;LLM-powered secondary reranking of candidate memories&lt;/li&gt;
&lt;li&gt;Desensitization or complete exclusion of sensitive data&lt;/li&gt;
&lt;/ul&gt;
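
&lt;p&gt;Two of these approaches, similarity retrieval and time-decay weighting, can be combined in a few lines. The sketch below uses word-set overlap as a stand-in for embedding similarity and an exponential half-life for decay. The function names, the 30-day half-life, and the sample memories are assumptions chosen for illustration.&lt;/p&gt;

```python
from datetime import datetime, timedelta


def keyword_similarity(query: str, memory: str) -> float:
    """Jaccard overlap of lowercase word sets; a stand-in for embedding similarity."""
    q = set(query.lower().split())
    m = set(memory.lower().split())
    if not q or not m:
        return 0.0
    return len(q.intersection(m)) / len(q.union(m))


def decayed_score(similarity: float, age_days: float, half_life_days: float = 30.0) -> float:
    """Halve the weight of a memory every half_life_days."""
    return similarity * 0.5 ** (age_days / half_life_days)


def recall(query, memories, now, top_k=2):
    """Return up to top_k memory texts ranked by decayed relevance."""
    scored = []
    for text, created_at in memories:
        age_days = (now - created_at).total_seconds() / 86400
        score = decayed_score(keyword_similarity(query, text), age_days)
        if score > 0:  # drop memories with no overlap at all
            scored.append((score, text))
    scored.sort(reverse=True)
    return [text for _, text in scored[:top_k]]


now = datetime(2026, 6, 7)
memories = [
    ("prefers python fastapi backend", now - timedelta(days=2)),
    ("prefers python flask backend", now - timedelta(days=300)),
    ("likes landscape photography", now - timedelta(days=1)),
]
print(recall("python backend framework", memories, now))
```

&lt;p&gt;Both Python memories overlap with the query equally, but the recent one ranks first because the older one has decayed through ten half-lives. The photography memory is excluded because it shares no words with the query.&lt;/p&gt;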




&lt;h2&gt;
  
  
  Tool Selection: Multi-Model Integration and Memory Experimentation
&lt;/h2&gt;

&lt;p&gt;Relying on a single model is often too inflexible for real-world AI memory development, because models differ in long-context capability, reasoning, tool calling, multilingual understanding, and code generation. In my daily AI development environment, I use XueDingMao AI (xuedingmao.com) as a unified model gateway.&lt;/p&gt;

&lt;p&gt;Its key technical advantages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Aggregates 500+ mainstream LLMs, including GPT-5.4, Claude 4.6, Gemini 3.1 Pro, and more.&lt;/li&gt;
&lt;li&gt;New models are added as they are released, so developers can test frontier API capabilities right away.&lt;/li&gt;
&lt;li&gt;Uses OpenAI-compatible mode with a unified Base URL, API Key, and model name.&lt;/li&gt;
&lt;li&gt;Reduces complexity across multi-model switching, multi-vendor authentication, and interface adaptation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All code examples in this article default to &lt;code&gt;claude-opus-4-6&lt;/code&gt;. This model excels at complex reasoning, long-text understanding, code generation, and technical writing—making it ideal as a summarization engine, conflict analyzer, and context reranker in memory systems.&lt;/p&gt;




&lt;h2&gt;
  
  
  Hands-On Demo: Building a Lightweight AI Memory Layer in Python
&lt;/h2&gt;

&lt;p&gt;Below is a simplified memory system that supports four operations:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Saving explicit user memories.&lt;/li&gt;
&lt;li&gt;Extracting implicit memories from conversations.&lt;/li&gt;
&lt;li&gt;Generating structured memory summaries.&lt;/li&gt;
&lt;li&gt;Injecting relevant memories into the next request.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Install Dependencies
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;openai python-dotenv
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Environment Variables
&lt;/h3&gt;

&lt;p&gt;Create a &lt;code&gt;.env&lt;/code&gt; file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;XUEDINGMAO_API_KEY=your_api_key_here
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Complete Python Example
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sqlite3&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dotenv&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_dotenv&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="nf"&gt;load_dotenv&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;


&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;MemoryStore&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;A lightweight local memory store. Replace with PostgreSQL,
    MongoDB, or a vector database in production.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;db_path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ai_memory.db&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sqlite3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;db_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;row_factory&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sqlite3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Row&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_init_table&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_init_table&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
            CREATE TABLE IF NOT EXISTS memories (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                user_id TEXT NOT NULL,
                memory_type TEXT NOT NULL,
                content TEXT NOT NULL,
                confidence REAL DEFAULT 0.8,
                status TEXT DEFAULT &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;active&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;,
                created_at TEXT NOT NULL,
                updated_at TEXT NOT NULL
            )
        &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;commit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;add_memory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;memory_type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;confidence&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.8&lt;/span&gt;
    &lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;now&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;utcnow&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;isoformat&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
            INSERT INTO memories
            (user_id, memory_type, content, confidence, status, created_at, updated_at)
            VALUES (?, ?, ?, ?, &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;active&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;, ?, ?)
        &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;memory_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;confidence&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;commit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;list_active_memories&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;]]:&lt;/span&gt;
        &lt;span class="n"&gt;rows&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
            SELECT id, memory_type, content, confidence, created_at, updated_at
            FROM memories
            WHERE user_id = ? AND status = &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;active&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;
            ORDER BY updated_at DESC
        &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,)).&lt;/span&gt;&lt;span class="nf"&gt;fetchall&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;delete_memory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;memory_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
            UPDATE memories
            SET status = &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;deleted&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;, updated_at = ?
            WHERE id = ?
        &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;utcnow&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;isoformat&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;memory_id&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;commit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;


&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;LLMClient&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Uses XueDingMao AI&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s OpenAI-compatible interface.
    Base URL: https://xuedingmao.com/v1
    Default model: claude-opus-4-6&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;api_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;XUEDINGMAO_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;RuntimeError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Please set XUEDINGMAO_API_KEY in .env&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://xuedingmao.com/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-opus-4-6&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]],&lt;/span&gt; &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;temperature&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;


&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;MemoryAgent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;An AI Agent with simplified memory capabilities.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;memory_store&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;MemoryStore&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;LLMClient&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;memory_store&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;memory_store&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;extract_implicit_memories&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;conversation&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Extract implicit memories with potential long-term value from user input.
        Note: Production systems should include sensitive data detection
        and user confirmation mechanisms.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;You are an AI memory extractor. From the following user conversation,
extract memories that have long-term value.

Requirements:
1. Only extract stable, reusable information.
2. Do NOT extract sensitive information like ID numbers, bank cards, or health data.
3. Output only a JSON array, with no surrounding text or code fences.
4. Each element must include: memory_type, content, confidence.
5. If nothing is worth saving, output an empty array [].

Available memory_type values:
- preference: user preference
- project: project background
- skill: skills or tech stack
- constraint: long-term constraint
- interest: area of interest

User conversation:
&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;conversation&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You excel at extracting structured long-term memories from conversations.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;])&lt;/span&gt;

        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;memories&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;JSONDecodeError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Model output is not valid JSON, skipping memory write:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;memories&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Model output is not a JSON array, skipping memory write:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt;

        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;memories&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# Skip malformed entries and entries with no content
&lt;/span&gt;            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                &lt;span class="k"&gt;continue&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;memory_store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_memory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;memory_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;memory_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;preference&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="n"&gt;confidence&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;confidence&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.7&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;build_memory_summary&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Compress the user&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s long-term memories into a summary
        for injection into system prompts.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="n"&gt;memories&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;memory_store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;list_active_memories&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;memories&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;No long-term memories available.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

        &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Please organize the following user memories into a concise,
structured context summary.

Requirements:
1. Keep information that helps answer future questions.
2. Merge duplicate content.
3. Flag conflicts that need user confirmation.
4. Output in English.

User memories:
&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;memories&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ensure_ascii&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;indent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a rigorous AI memory summarization engine.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;answer_with_memory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_question&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Inject the memory summary into the system prompt before answering, to personalize the response.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="n"&gt;memory_summary&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;build_memory_summary&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;You are a professional AI technical assistant.
Use the following long-term context summary to inform your answers,
but avoid over-exposing personal information.

User long-term memory summary:
&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;memory_summary&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

Usage guidelines:
- Only use memories relevant to the current question.
- Do not proactively mention irrelevant personal details.
- Flag potentially outdated or conflicting memories for user confirmation.
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_question&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;csdn_user_001&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;store&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MemoryStore&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LLMClient&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MemoryAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Simulate a user conversation for implicit memory extraction
&lt;/span&gt;    &lt;span class="n"&gt;conversation&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    I&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ve been working on an AI Agent platform lately, using Python, FastAPI, and PostgreSQL for the backend.
    I prefer answers that are professional rather than colloquial, and ideally include runnable code.
    I may integrate multiple LLM APIs in the future, so interface compatibility is important.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;extract_implicit_memories&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;conversation&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;question&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Please design a multi-model access layer architecture for an AI Agent.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;answer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;answer_with_memory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AI Response:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Caveats: More Memory Is Not Always Better
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Privacy Boundaries Must Be Explicit
&lt;/h3&gt;

&lt;p&gt;As highlighted in the original video, health information, financial details, and identity data mentioned in ordinary conversations can all end up written into memory. Developers building AI applications should implement sensitive-information detection, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Masking phone numbers, email addresses, and ID numbers.&lt;/li&gt;
&lt;li&gt;Not storing medical, financial, or legal content by default.&lt;/li&gt;
&lt;li&gt;Requiring user confirmation for high-risk memories.&lt;/li&gt;
&lt;li&gt;Letting users view, edit, hide, and delete their stored memories.&lt;/li&gt;
&lt;/ul&gt;
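
&lt;p&gt;The first three items above can be sketched roughly as follows. This is a minimal illustration rather than a production filter: the regular expressions, the placeholder strings, the category names, and the &lt;code&gt;prepare_memory&lt;/code&gt; helper are all assumptions made for this example, and real phone or ID formats vary by locale.&lt;/p&gt;

```python
import re

# Hypothetical patterns for illustration; real formats vary by locale.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s-]{8,14}\d")

# Hypothetical category names that are never stored by default.
NO_STORE_CATEGORIES = {"medical", "financial", "legal"}


def redact(text: str) -> str:
    """Replace emails and phone-like digit runs with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


def prepare_memory(content: str, category: str, user_confirmed: bool = False):
    """Return a storable record, or None when the memory must not be stored."""
    # High-risk categories are dropped unless the user explicitly confirmed.
    if category in NO_STORE_CATEGORIES and not user_confirmed:
        return None
    return {"content": redact(content), "category": category}


if __name__ == "__main__":
    print(redact("Reach me at jane@example.com or +1 415-555-0100."))
    print(prepare_memory("Diagnosed with asthma", "medical"))
```

&lt;p&gt;Regex masking only catches well-formed identifiers, so it is best treated as one layer alongside category rules and user confirmation, not as a complete safeguard.&lt;/p&gt;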

&lt;h3&gt;
  
  
  2. Preventing Incorrect Inferences from Persisting
&lt;/h3&gt;

&lt;p&gt;The biggest risk of implicit memory is incorrect inference. For example, if a user is just helping a friend look something up, the system might incorrectly conclude this is a long-term personal interest. Mitigation strategies include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Assigning &lt;code&gt;confidence&lt;/code&gt; scores to all memories.&lt;/li&gt;
&lt;li&gt;Excluding low-confidence memories from direct prompt injection.&lt;/li&gt;
&lt;li&gt;Adding expiration dates to memories.&lt;/li&gt;
&lt;li&gt;Providing a memory audit interface.&lt;/li&gt;
&lt;li&gt;Triggering user confirmation for conflicting information.&lt;/li&gt;
&lt;/ul&gt;
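
&lt;p&gt;As a rough sketch of the first three strategies, a memory record can carry a &lt;code&gt;confidence&lt;/code&gt; score and an expiration time, and only records that pass both checks are injected into the prompt. The &lt;code&gt;Memory&lt;/code&gt; dataclass, the &lt;code&gt;injectable&lt;/code&gt; helper, and the 0.7 threshold are assumptions for illustration; they are not part of the code earlier in this article.&lt;/p&gt;

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Memory:
    content: str
    confidence: float      # 0.0 to 1.0, assigned at extraction time
    expires_at: datetime   # after this moment the memory is ignored


def injectable(memories, now, min_confidence=0.7):
    """Keep memories that are confident enough and not yet expired."""
    return [
        m for m in memories
        if m.confidence >= min_confidence and m.expires_at > now
    ]


if __name__ == "__main__":
    now = datetime(2026, 1, 1, tzinfo=timezone.utc)
    memories = [
        Memory("Builds backends with FastAPI", 0.9, now + timedelta(days=90)),
        Memory("Interested in sailing", 0.3, now + timedelta(days=90)),  # low confidence
        Memory("Evaluating a new database", 0.9, now - timedelta(days=1)),  # expired
    ]
    print([m.content for m in injectable(memories, now)])
```

&lt;p&gt;Low-confidence records are not deleted here, only withheld from the prompt, so they remain available for an audit interface or a later confirmation step.&lt;/p&gt;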

&lt;h3&gt;
  
  
  3. Preventing Hallucinations from Becoming Structurally Fixed
&lt;/h3&gt;

&lt;p&gt;A regular hallucination only affects one answer. But if a hallucination gets written to long-term memory, it becomes a structural error. Developers should avoid letting the model write to the database without constraints. A safer approach:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The LLM generates candidate memories.&lt;/li&gt;
&lt;li&gt;A rule-based system filters out sensitive content.&lt;/li&gt;
&lt;li&gt;The user confirms, or the system performs secondary validation.&lt;/li&gt;
&lt;li&gt;Only then is the memory written to storage.&lt;/li&gt;
&lt;/ol&gt;
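
&lt;p&gt;The four steps above can be sketched as a small gated pipeline. Everything here is hypothetical and simplified: &lt;code&gt;propose_candidates&lt;/code&gt; stands in for the LLM call by splitting the conversation into lines, the keyword list in &lt;code&gt;SENSITIVE_RE&lt;/code&gt; is an arbitrary example rule, and &lt;code&gt;confirm&lt;/code&gt; is any callable that represents user confirmation or secondary validation.&lt;/p&gt;

```python
import re

# Hypothetical rule set: block candidates that mention sensitive topics.
SENSITIVE_RE = re.compile(r"\b(diagnos\w*|salary|passport|lawsuit)\b", re.IGNORECASE)


def propose_candidates(conversation: str) -> list[str]:
    """Step 1 stand-in: a real system would ask the LLM for candidates."""
    return [line.strip() for line in conversation.splitlines() if line.strip()]


def passes_rules(candidate: str) -> bool:
    """Step 2: rule-based filter for sensitive content."""
    return SENSITIVE_RE.search(candidate) is None


def write_memories(conversation, confirm, store):
    """Steps 3 and 4: only candidates that pass the rules and are confirmed reach storage."""
    for candidate in propose_candidates(conversation):
        if passes_rules(candidate) and confirm(candidate):
            store.append(candidate)
    return store


if __name__ == "__main__":
    text = "I use FastAPI daily.\nMy salary is confidential.\nI like sailing."
    # The lambda simulates a user who rejects the sailing memory.
    print(write_memories(text, lambda c: "sailing" not in c, []))
```

&lt;p&gt;The point of the structure is that the model never holds a direct write path to storage: every record passes a deterministic filter and an explicit approval before it persists.&lt;/p&gt;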

&lt;h3&gt;
  
  
  4. Personalization Should Not Become Intrusion
&lt;/h3&gt;

&lt;p&gt;Remembering user preferences has value, but proactively mentioning personal details in every response makes users uncomfortable. A mature memory system should follow a "relevant when needed" principle: it recalls a memory only when the current question calls for it, because mechanically injecting every memory into every context defeats the purpose.&lt;/p&gt;
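
&lt;p&gt;One simple way to approximate "relevant when needed" is to score each memory against the current question and inject only the top matches. The sketch below uses plain keyword overlap as a stand-in; a real system would more likely use embedding similarity. The &lt;code&gt;tokens&lt;/code&gt; and &lt;code&gt;select_relevant&lt;/code&gt; helpers and the &lt;code&gt;top_k&lt;/code&gt; default are assumptions for illustration.&lt;/p&gt;

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase word set, ignoring words shorter than three characters."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if len(w) >= 3}


def select_relevant(memories: list[str], question: str, top_k: int = 2) -> list[str]:
    """Return up to top_k memories that share at least one word with the question."""
    q = tokens(question)
    scored = [(len(tokens(m).intersection(q)), m) for m in memories]
    relevant = [(score, m) for score, m in scored if score >= 1]
    # Stable sort: memories with equal overlap keep their original order.
    relevant.sort(key=lambda pair: pair[0], reverse=True)
    return [m for _, m in relevant[:top_k]]


if __name__ == "__main__":
    memories = [
        "Backend stack: Python, FastAPI, PostgreSQL",
        "Prefers professional answers with runnable code",
        "Owns a cat named Miso",
    ]
    print(select_relevant(memories, "How should I index a PostgreSQL table from Python?"))
```

&lt;p&gt;Memories with no overlap are never injected, so unrelated personal details stay out of the prompt even though they remain in storage.&lt;/p&gt;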




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;AI memory systems are becoming core infrastructure for LLM applications. ChatGPT's unified memory pool, Claude's specialized context handling, Gemini's ecosystem integration, and Copilot's enterprise compliance features are all pushing AI from "answering questions" toward "understanding long-term context."&lt;/p&gt;

&lt;p&gt;For developers, the real challenge is not copying a product feature but understanding the engineering fundamentals of memory systems: explicit storage, implicit extraction, summarization and compression, conflict resolution, privacy governance, and context recall. Only when AI memory is controllable, auditable, and deletable does it become a capability that improves efficiency rather than a new source of risk.&lt;/p&gt;

&lt;p&gt;#AI #LLM #Python #MachineLearning #TechTutorial #MemorySystems #RAG #ChatGPT #Claude #Gemini #Copilot&lt;/p&gt;




</description>
      <category>ai</category>
      <category>llm</category>
      <category>machinelearning</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
