<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Lavelle Hatcher Jr</title>
    <description>The latest articles on DEV Community by Lavelle Hatcher Jr (@lavellehatcherjr).</description>
    <link>https://dev.to/lavellehatcherjr</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3860079%2F668ec800-7cf1-492e-baf0-0ce71e77a1ef.png</url>
      <title>DEV Community: Lavelle Hatcher Jr</title>
      <link>https://dev.to/lavellehatcherjr</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/lavellehatcherjr"/>
    <language>en</language>
    <item>
      <title>How I Stopped Getting "Stream Idle Timeout" Errors in Claude Code</title>
      <dc:creator>Lavelle Hatcher Jr</dc:creator>
      <pubDate>Sat, 25 Apr 2026 11:59:59 +0000</pubDate>
      <link>https://dev.to/lavellehatcherjr/how-i-stopped-getting-stream-idle-timeout-errors-in-claude-code-hf9</link>
      <guid>https://dev.to/lavellehatcherjr/how-i-stopped-getting-stream-idle-timeout-errors-in-claude-code-hf9</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzdk5mdun5cga2fvym0wj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzdk5mdun5cga2fvym0wj.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The fix is five lines in your CLAUDE.md, not a settings change
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;Note: This is a personal workaround based on my own experience. Your mileage may vary.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;If you use Claude Code for anything longer than a short conversation, you have probably seen this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;API Error: Stream idle timeout - partial response received
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It cuts off mid-response. Your work disappears. The retry often fails the same way. There is no recovery button. As of April 2026, it is one of the most reported bugs on the Claude Code GitHub repo, with multiple open issues going back months.&lt;/p&gt;

&lt;p&gt;The issue has become more common since the launch of &lt;strong&gt;Claude Opus 4.7&lt;/strong&gt;. Several GitHub issues filed since mid-April specifically name Opus 4.7 and the 1M-context variant as triggers, and recent Claude Code changelogs show stream-handling improvements shipping. The bug also shows up in regular Claude chat sessions during long outputs, but Claude Code is where it hits hardest because of its heavy tool-call chains.&lt;/p&gt;

&lt;p&gt;I hit it repeatedly while using Claude Code for multi-file projects. After losing work three or four times in a row, I started experimenting with prompt-level instructions that prevent the timeout from firing in the first place. The trick is not to fix the timeout. The trick is to never trigger it.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why it happens
&lt;/h2&gt;

&lt;p&gt;The timeout fires when Claude Code's streaming connection goes idle for too long during a single response. Long outputs are the trigger. If Claude tries to write a 300-line file in one tool call, or runs a grep that dumps hundreds of lines, or chains multiple heavy tool calls without pausing, the stream stalls and the connection drops.&lt;/p&gt;

&lt;p&gt;The bug is worse in longer sessions. After 20 or more tool calls in a single conversation, the probability of hitting it goes up noticeably.&lt;/p&gt;




&lt;h2&gt;
  
  
  The fix: add these instructions to your CLAUDE.md
&lt;/h2&gt;

&lt;p&gt;Create or open a &lt;code&gt;CLAUDE.md&lt;/code&gt; file in your project root. Add this block:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Stream Timeout Prevention&lt;/span&gt;
&lt;span class="p"&gt;
1.&lt;/span&gt; Do each numbered task ONE AT A TIME. Complete one task fully,
   confirm it worked, then move to the next.
&lt;span class="p"&gt;2.&lt;/span&gt; Never write a file longer than ~150 lines in a single tool call.
   If a file will be longer, write it in multiple append/edit passes.
&lt;span class="p"&gt;3.&lt;/span&gt; Start a fresh session if the conversation gets long (20+ tool calls).
   The error gets worse as the session grows.
&lt;span class="p"&gt;4.&lt;/span&gt; Keep individual grep/search outputs short. Use flags like
   &lt;span class="sb"&gt;`--include`&lt;/span&gt; and &lt;span class="sb"&gt;`-l`&lt;/span&gt; (list files only) to limit output size.
&lt;span class="p"&gt;5.&lt;/span&gt; If you do hit the timeout, retry the same step in a shorter form.
   Don't repeat the entire task from scratch.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is it. Claude Code reads &lt;code&gt;CLAUDE.md&lt;/code&gt; at the start of every session and follows the instructions as constraints. These five rules keep each streaming chunk small enough that the idle timeout never fires.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why this works
&lt;/h2&gt;

&lt;p&gt;Each rule targets a specific trigger:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rule 1&lt;/strong&gt; prevents Claude from batching multiple tasks into one giant response. Instead of "create three files, run tests, and fix the errors" in a single output, it does one step, confirms, then moves on. Smaller outputs, no stall.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rule 2&lt;/strong&gt; is the most important one. A 300-line file write is the single most common trigger for the timeout. Splitting it into two 150-line passes keeps each chunk under the threshold.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rule 3&lt;/strong&gt; addresses session degradation. I have not seen Anthropic document this publicly, but in my experience the timeout becomes almost guaranteed after about 20 tool calls in a single session. Starting fresh resets whatever internal state is accumulating.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rule 4&lt;/strong&gt; catches the other common trigger: unbounded search output. A recursive grep that returns 500 lines of matches will stall the stream just as badly as a long file write.&lt;/p&gt;
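&lt;p&gt;As a concrete illustration of Rule 4, here is the difference between an unbounded search and a bounded one. These are standard grep flags; the pattern and paths are just examples:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Unbounded: prints every matching line in the repo, can easily stall the stream
grep -rn "db_pool" .

# Bounded: file names only, restricted to Python files
grep -rl --include="*.py" "db_pool" .

# Bounded further: cap the output at 20 lines
grep -rn "db_pool" src/ | head -n 20
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;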

&lt;p&gt;&lt;strong&gt;Rule 5&lt;/strong&gt; saves you from the retry death spiral. When you hit the timeout and retry the exact same prompt, you get the exact same stall. Retrying with a shorter version of the same step usually works on the first try.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I tried that did not work
&lt;/h2&gt;

&lt;p&gt;Before landing on the CLAUDE.md approach, I tried several other things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Increasing &lt;code&gt;CLAUDE_STREAM_IDLE_TIMEOUT_MS&lt;/code&gt;&lt;/strong&gt;: This environment variable applies to the terminal CLI, and raising it does not always resolve the issue.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Switching browsers&lt;/strong&gt;: Same behavior in Chrome, Firefox, and Safari.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Switching models&lt;/strong&gt;: Happens on both Opus and Sonnet.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Shorter prompts&lt;/strong&gt;: The prompt length is not the issue. The output length is.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The CLAUDE.md approach works because it constrains the output at the source. Claude follows the instructions before it starts generating, so the stream never gets long enough to stall.&lt;/p&gt;




&lt;h2&gt;
  
  
  Worth noting
&lt;/h2&gt;

&lt;p&gt;Recent Claude Code changelogs show stream-handling improvements shipping regularly. This CLAUDE.md workaround is a stopgap, not a permanent solution. Once a platform-level fix ships, you can remove the block.&lt;/p&gt;

&lt;p&gt;If you are using Claude Code for real work today, adding these five lines saves a lot of frustration while improvements are in progress.&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/anthropics/claude-code/issues/49619" rel="noopener noreferrer"&gt;GitHub Issue #49619: Stream idle timeout during long tool-use turns on Opus 4.7&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/anthropics/claude-code/issues/47841" rel="noopener noreferrer"&gt;GitHub Issue #47841: Stream idle timeout on Claude Code Web&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/anthropics/claude-code/issues/47252" rel="noopener noreferrer"&gt;GitHub Issue #47252: Ultraplan repeated stream idle timeout errors&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/anthropics/claude-code/issues/46987" rel="noopener noreferrer"&gt;GitHub Issue #46987: Stream idle timeout - multiple times today&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>claudecode</category>
      <category>ai</category>
      <category>productivity</category>
      <category>devtools</category>
    </item>
    <item>
      <title>Serving Qwen3.6-35B-A3B With vLLM and Building a Coding Agent With Tool Calling</title>
      <dc:creator>Lavelle Hatcher Jr</dc:creator>
      <pubDate>Sun, 19 Apr 2026 05:42:43 +0000</pubDate>
      <link>https://dev.to/lavellehatcherjr/serving-qwen36-35b-a3b-with-vllm-and-building-a-coding-agent-with-tool-calling-2kob</link>
      <guid>https://dev.to/lavellehatcherjr/serving-qwen36-35b-a3b-with-vllm-and-building-a-coding-agent-with-tool-calling-2kob</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuyhcdu0cs99qsbc81uuc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuyhcdu0cs99qsbc81uuc.png" alt=" " width="800" height="419"&gt;&lt;/a&gt;&lt;br&gt;
Alibaba's Qwen team released &lt;strong&gt;Qwen3.6-35B-A3B&lt;/strong&gt; on April 16, 2026 under Apache 2.0. It is a sparse mixture-of-experts model with 35 billion total parameters but only about 3 billion active per token. It scores 73.4% on SWE-bench Verified and 37.0 on MCPMark, which makes it one of the strongest open-weight models for agentic coding right now.&lt;/p&gt;

&lt;p&gt;This post walks through serving it locally with vLLM, calling it from Python with the OpenAI SDK, and wiring up tool calling so the model can act as a coding agent.&lt;/p&gt;

&lt;p&gt;Note: This is a personal summary based on publicly available information, not the official view of any company.&lt;/p&gt;
&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;vLLM 0.19.0 or later (required for Qwen3.6 architecture support)&lt;/li&gt;
&lt;li&gt;NVIDIA GPU (RTX 4090 24GB works for single-GPU, multi-GPU for larger context)&lt;/li&gt;
&lt;li&gt;Python 3.12&lt;/li&gt;
&lt;li&gt;The model downloads automatically from Hugging Face on first launch&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Install vLLM
&lt;/h2&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python &lt;span class="nt"&gt;-m&lt;/span&gt; venv qwen36-env
&lt;span class="nb"&gt;source &lt;/span&gt;qwen36-env/bin/activate
pip &lt;span class="nb"&gt;install &lt;/span&gt;vllm&amp;gt;&lt;span class="o"&gt;=&lt;/span&gt;0.19.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Older vLLM versions do not support the Qwen3.6 MoE architecture. If you hit errors about &lt;code&gt;Qwen3MoeSparseMoeBlock&lt;/code&gt;, your vLLM is too old.&lt;/p&gt;
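&lt;p&gt;A quick way to check what you have installed (both commands are generic, nothing Qwen-specific):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip show vllm | grep Version
# or
python -c "import vllm; print(vllm.__version__)"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;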
&lt;h2&gt;
  
  
  Start the vLLM Server
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Basic (inference only)
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;vllm serve Qwen/Qwen3.6-35B-A3B &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--port&lt;/span&gt; 8000 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--tensor-parallel-size&lt;/span&gt; 1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--max-model-len&lt;/span&gt; 32768 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--reasoning-parser&lt;/span&gt; qwen3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The &lt;code&gt;--reasoning-parser qwen3&lt;/code&gt; flag enables thinking mode, where the model generates internal reasoning steps before its final answer. This improves accuracy on coding tasks.&lt;/p&gt;

&lt;p&gt;On a single RTX 4090, keep &lt;code&gt;--max-model-len&lt;/code&gt; at 32768 or 65536. The full 262,144-token context will OOM.&lt;/p&gt;
&lt;h3&gt;
  
  
  With tool calling
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;vllm serve Qwen/Qwen3.6-35B-A3B &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--port&lt;/span&gt; 8000 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--tensor-parallel-size&lt;/span&gt; 1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--max-model-len&lt;/span&gt; 32768 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--reasoning-parser&lt;/span&gt; qwen3 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--enable-auto-tool-choice&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--tool-call-parser&lt;/span&gt; qwen3_coder
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;strong&gt;&lt;code&gt;--tool-call-parser qwen3_coder&lt;/code&gt; is mandatory.&lt;/strong&gt; Without it, the model still generates tool-call JSON, but vLLM returns it as plain text in &lt;code&gt;message.content&lt;/code&gt; instead of parsing it into structured &lt;code&gt;tool_calls&lt;/code&gt; objects. This is the most common setup mistake, and it fails silently: requests succeed, but your tool-dispatch code never fires.&lt;/p&gt;
&lt;h3&gt;
  
  
  Multi-GPU (example: 4 GPUs)
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;vllm serve Qwen/Qwen3.6-35B-A3B &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--port&lt;/span&gt; 8000 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--tensor-parallel-size&lt;/span&gt; 4 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--max-model-len&lt;/span&gt; 262144 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--reasoning-parser&lt;/span&gt; qwen3 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--enable-auto-tool-choice&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--tool-call-parser&lt;/span&gt; qwen3_coder
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Once running, the OpenAI-compatible API is available at &lt;code&gt;http://localhost:8000/v1&lt;/code&gt;.&lt;/p&gt;
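&lt;p&gt;Before wiring up a client, it is worth a quick smoke test. Both endpoints below are part of the OpenAI-compatible surface vLLM exposes; the model ID matches the serve command above:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# List the models the server is serving
curl http://localhost:8000/v1/models

# Minimal chat completion
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen/Qwen3.6-35B-A3B", "messages": [{"role": "user", "content": "Hello"}]}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;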
&lt;h2&gt;
  
  
  Basic Chat From Python
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:8000/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dummy&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# vLLM local does not require a real key
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Qwen/Qwen3.6-35B-A3B&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Write a Python generator for the Fibonacci sequence.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2048&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Since vLLM exposes an OpenAI-compatible endpoint, the standard OpenAI SDK works directly. Swapping from &lt;code&gt;gpt-4o&lt;/code&gt; to a local Qwen3.6 is a one-line base_url change.&lt;/p&gt;
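&lt;p&gt;One detail worth knowing: with &lt;code&gt;--reasoning-parser&lt;/code&gt; enabled, vLLM separates the thinking trace from the final answer. In current vLLM releases the parsed trace comes back as a &lt;code&gt;reasoning_content&lt;/code&gt; field on the message; assuming that behavior carries over to Qwen3.6, you can read both parts separately:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;message = response.choices[0].message

# The internal reasoning steps (populated by the reasoning parser)
print(message.reasoning_content)

# The final answer, with the thinking stripped out
print(message.content)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;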
&lt;h2&gt;
  
  
  Tool Calling (Function Calling)
&lt;/h2&gt;

&lt;p&gt;This is where it gets interesting. Qwen3.6-35B-A3B was explicitly trained on tool-use patterns, scoring 37.0 on MCPMark compared to 18.1 for Gemma 4-31B.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:8000/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dummy&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search_files&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Search for files in the project by keyword&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;parameters&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;object&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;properties&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Search keyword&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                    &lt;span class="p"&gt;},&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;file_extension&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;File extension filter (e.g. .py, .ts)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                    &lt;span class="p"&gt;}&lt;/span&gt;
                &lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;required&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;read_file&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Read the contents of a file at a given path&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;parameters&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;object&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;properties&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;path&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;File path&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                    &lt;span class="p"&gt;}&lt;/span&gt;
                &lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;required&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;path&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;write_file&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Write content to a file at a given path&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;parameters&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;object&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;properties&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;path&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;File path&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                    &lt;span class="p"&gt;},&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Content to write&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                    &lt;span class="p"&gt;}&lt;/span&gt;
                &lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;required&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;path&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Qwen/Qwen3.6-35B-A3B&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a coding agent. Search, read, and modify files to complete the user&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s request.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Find all Python files that handle database connections and change the pool size from 5 to 20.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tool_choice&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;auto&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# recommended for thinking mode
&lt;/span&gt;    &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4096&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_calls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;call&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_calls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Tool: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;function&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Args: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;function&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;arguments&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;---&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Agent Loop (Multi-Turn Tool Calling)
&lt;/h2&gt;

&lt;p&gt;A real coding agent needs to call a tool, feed the result back to the model, then let it decide the next action. Here is a minimal loop:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_request&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_steps&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a coding agent. Use the available tools to complete the task.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_request&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;step&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_steps&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Qwen/Qwen3.6-35B-A3B&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;tool_choice&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;auto&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4096&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;assistant_message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;assistant_message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;assistant_message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_calls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[Done] &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;assistant_message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;assistant_message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;

        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;call&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;assistant_message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_calls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;tool_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;function&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;
            &lt;span class="n"&gt;tool_args&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;function&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;arguments&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[Step &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;step&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;] &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;(&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;tool_args&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;execute_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool_args&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_call_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Reached maximum steps&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;execute_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Replace with real file system operations.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search_files&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;files&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;path&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;src/db/connection.py&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;match&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pool_size=5&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;path&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;src/db/config.py&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;match&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;POOL_SIZE = 5&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;read_file&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;# Contents of &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;path&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;write_file&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ok&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;path&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;path&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;unknown tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Replace &lt;code&gt;execute_tool&lt;/code&gt; with real file system calls and you have a working local coding agent.&lt;/p&gt;
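&lt;p&gt;For reference, here is one way the real version could look. This is a minimal sketch using only the standard library: the tool names match the schema above, but the 50-match cap, the error handling, and the lack of path validation are my own simplifications, not anything the model requires:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from pathlib import Path

ROOT = Path.cwd()  # project directory the agent operates in

def execute_tool(name: str, args: dict) -&amp;gt; dict:
    """Standard-library backing for the three tools defined earlier."""
    try:
        if name == "search_files":
            ext = args.get("file_extension", "")
            pattern = f"*{ext}" if ext else "*"
            matches = []
            for path in ROOT.rglob(pattern):
                if path.is_file() and args["query"] in path.read_text(errors="ignore"):
                    matches.append({"path": str(path.relative_to(ROOT))})
            return {"files": matches[:50]}  # cap output so the context stays small
        elif name == "read_file":
            return {"content": (ROOT / args["path"]).read_text()}
        elif name == "write_file":
            target = ROOT / args["path"]
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(args["content"])
            return {"status": "ok", "path": args["path"]}
        return {"error": f"unknown tool: {name}"}
    except OSError as exc:
        # Surface filesystem errors to the model instead of crashing the loop
        return {"error": str(exc)}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Note this trusts the model's paths; in anything beyond an experiment, validate that &lt;code&gt;args["path"]&lt;/code&gt; resolves inside &lt;code&gt;ROOT&lt;/code&gt; before reading or writing.&lt;/p&gt;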

&lt;h2&gt;
  
  
  Thinking Mode
&lt;/h2&gt;

&lt;p&gt;Qwen3.6 has two inference modes:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Mode&lt;/th&gt;
&lt;th&gt;Temperature&lt;/th&gt;
&lt;th&gt;Use case&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Thinking (recommended)&lt;/td&gt;
&lt;td&gt;1.0&lt;/td&gt;
&lt;td&gt;Complex coding, debugging, design decisions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Non-thinking&lt;/td&gt;
&lt;td&gt;0.7&lt;/td&gt;
&lt;td&gt;Simple completions, quick answers&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;When &lt;code&gt;--reasoning-parser qwen3&lt;/code&gt; is set at server startup, thinking mode is on by default.&lt;/p&gt;
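&lt;p&gt;Earlier Qwen3 releases also let you toggle thinking per request through a chat-template flag, which vLLM forwards via &lt;code&gt;extra_body&lt;/code&gt;. Assuming Qwen3.6 keeps the same &lt;code&gt;enable_thinking&lt;/code&gt; flag, a non-thinking request looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;response = client.chat.completions.create(
    model="Qwen/Qwen3.6-35B-A3B",
    messages=[{"role": "user", "content": "Rename this variable to snake_case."}],
    temperature=0.7,  # non-thinking recommendation from the table above
    # chat_template_kwargs is a vLLM extension; enable_thinking follows Qwen3's template
    extra_body={"chat_template_kwargs": {"enable_thinking": False}},
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;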

&lt;h2&gt;
  
  
  Hardware Guide
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Setup&lt;/th&gt;
&lt;th&gt;max-model-len&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;RTX 4090 (24GB) × 1&lt;/td&gt;
&lt;td&gt;32,768&lt;/td&gt;
&lt;td&gt;Handles most coding tasks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RTX 4090 × 2 (TP=2)&lt;/td&gt;
&lt;td&gt;65,536&lt;/td&gt;
&lt;td&gt;Enough for repo-wide context&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;A100 80GB × 1&lt;/td&gt;
&lt;td&gt;131,072&lt;/td&gt;
&lt;td&gt;Comfortable single-GPU setup&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;H100 × 4 (TP=4)&lt;/td&gt;
&lt;td&gt;262,144&lt;/td&gt;
&lt;td&gt;Full context, production use&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The FP8 variant (&lt;code&gt;Qwen/Qwen3.6-35B-A3B-FP8&lt;/code&gt;) uses less VRAM with nearly identical performance.&lt;/p&gt;
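&lt;p&gt;Serving the FP8 variant is the same command with only the model ID swapped, for example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;vllm serve Qwen/Qwen3.6-35B-A3B-FP8 \
  --port 8000 \
  --max-model-len 32768 \
  --reasoning-parser qwen3 \
  --enable-auto-tool-choice \
  --tool-call-parser qwen3_coder
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;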

&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;vLLM 0.19.0+ serves Qwen3.6-35B-A3B as an OpenAI-compatible API at &lt;code&gt;localhost:8000/v1&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Tool calling requires &lt;code&gt;--enable-auto-tool-choice --tool-call-parser qwen3_coder&lt;/code&gt; at startup — without it, tool calls silently fail&lt;/li&gt;
&lt;li&gt;The standard OpenAI Python SDK works directly, so switching from a cloud API to local inference is a one-line change&lt;/li&gt;
&lt;li&gt;Use &lt;code&gt;temperature=1.0&lt;/code&gt; with thinking mode for best coding accuracy&lt;/li&gt;
&lt;li&gt;Apache 2.0 license — free for commercial use&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The model is three days old, so tool-calling stability is still being validated by the community. Test on your own workloads before shipping to production.&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Qwen/Qwen3.6-35B-A3B - Hugging Face&lt;/li&gt;
&lt;li&gt;Qwen3.5 &amp;amp; Qwen3.6 Usage Guide - vLLM Recipes&lt;/li&gt;
&lt;li&gt;QwenLM/Qwen3.6 - GitHub&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>llm</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Anthropic Releases Claude Opus 4.7: Key Changes and Migration Guide for Developers</title>
      <dc:creator>Lavelle Hatcher Jr</dc:creator>
      <pubDate>Fri, 17 Apr 2026 08:07:59 +0000</pubDate>
      <link>https://dev.to/lavellehatcherjr/anthropic-releases-claude-opus-47-key-changes-and-migration-guide-for-developers-3an4</link>
      <guid>https://dev.to/lavellehatcherjr/anthropic-releases-claude-opus-47-key-changes-and-migration-guide-for-developers-3an4</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzotccb34v1aria8b0vpp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzotccb34v1aria8b0vpp.png" alt=" " width="800" height="419"&gt;&lt;/a&gt;&lt;br&gt;
Here is a developer-focused summary of what changed in Claude Opus 4.7, released on April 16, 2026.&lt;/p&gt;

&lt;p&gt;Note: This article is a personal summary based on publicly available information, not the official view of any company. This article does not constitute financial or investment advice.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where Opus 4.7 Sits
&lt;/h2&gt;

&lt;p&gt;Claude Opus 4.7 is Anthropic's most capable generally available model. It sits below Claude Mythos Preview on benchmarks, but Mythos Preview remains restricted to a handful of platform partners through Project Glasswing and is not available for general use.&lt;/p&gt;

&lt;p&gt;Pricing is unchanged from Opus 4.6: $5 per million input tokens and $25 per million output tokens. The model ID is &lt;code&gt;claude-opus-4-7&lt;/code&gt;. It is available across all Claude products, the Anthropic API, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry.&lt;/p&gt;

&lt;h2&gt;
  
  
  Benchmark Results
&lt;/h2&gt;

&lt;p&gt;Key numbers from the release and third-party evaluations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;SWE-bench Verified: 87.6% (significant improvement over Opus 4.6)&lt;/li&gt;
&lt;li&gt;SWE-bench Pro: 64.3% (Opus 4.6: 53.4%, GPT-5.4: 57.7%)&lt;/li&gt;
&lt;li&gt;CursorBench: 70% (Opus 4.6: 58%)&lt;/li&gt;
&lt;li&gt;MCP-Atlas (multi-tool orchestration): 77.3% (best in class)&lt;/li&gt;
&lt;li&gt;CharXiv visual reasoning: 82.1% (Opus 4.6: 69.1%)&lt;/li&gt;
&lt;li&gt;XBOW visual acuity: 98.5% (Opus 4.6: 54.5%)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Rakuten reported 3x more production tasks resolved compared to Opus 4.6. CodeRabbit noted recall improved by over 10 percent, with the model being slightly faster than GPT-5.4 at xhigh effort.&lt;/p&gt;

&lt;h2&gt;
  
  
  New Features
&lt;/h2&gt;

&lt;h3&gt;
  
  
  High-Resolution Image Support
&lt;/h3&gt;

&lt;p&gt;Opus 4.7 is the first Claude model with high-resolution image support. Maximum image resolution increased from 1,568 pixels on the long edge (about 1.15 megapixels) to 2,576 pixels (about 3.75 megapixels), which is roughly 3x the visual capacity of previous Claude models.&lt;/p&gt;

&lt;p&gt;For computer use workflows, pixel coordinates now map 1:1 with actual screen pixels, eliminating the scale-factor math that was previously required. Document analysis benefits from the ability to read smaller text and finer details in scanned documents, slides, and diagrams.&lt;/p&gt;

&lt;h3&gt;
  
  
  xhigh Effort Level
&lt;/h3&gt;

&lt;p&gt;The effort parameter now has five levels: low, medium, high, xhigh, and max. The new xhigh level sits between high and max, providing deeper reasoning than high without the full cost of max.&lt;/p&gt;

&lt;p&gt;Claude Code defaults to xhigh for all plans. Anthropic recommends starting with high or xhigh for coding and agentic use cases.&lt;/p&gt;
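&lt;p&gt;In code, that guidance is a single parameter. A sketch only: passing &lt;code&gt;effort&lt;/code&gt; through &lt;code&gt;extra_body&lt;/code&gt; is my assumption about where the field lives, so check the API docs in the references before relying on it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import anthropic

client = anthropic.Anthropic()

# The wire location of "effort" is assumed; extra_body keeps the SDK from
# rejecting an unrecognized keyword argument.
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=4096,
    extra_body={"effort": "xhigh"},  # low | medium | high | xhigh | max
    messages=[{"role": "user", "content": "Refactor this module for testability."}],
)
print(response.content[0].text)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;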

&lt;h3&gt;
  
  
  Task Budgets (Public Beta)
&lt;/h3&gt;

&lt;p&gt;Task budgets let developers set a token allowance for an entire agentic loop rather than a single turn. The model sees a running countdown and uses it to prioritize work, skip low-value steps, and finish gracefully as the budget runs out. This is useful for preventing cost runaway in long-running agent sessions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Claude Code &lt;code&gt;/ultrareview&lt;/code&gt; Command
&lt;/h3&gt;

&lt;p&gt;A new dedicated code review command that performs a multi-pass review looking for bugs, edge cases, security issues, and logic errors with more depth than a standard review pass.&lt;/p&gt;

&lt;h2&gt;
  
  
  Breaking API Changes
&lt;/h2&gt;

&lt;p&gt;Three changes that will cause errors if not addressed:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Extended Thinking Budgets Removed
&lt;/h3&gt;

&lt;p&gt;Setting &lt;code&gt;thinking: {"type": "enabled", "budget_tokens": N}&lt;/code&gt; now returns a 400 error. The only supported thinking mode on Opus 4.7 is &lt;code&gt;thinking: {"type": "adaptive"}&lt;/code&gt;. Note that adaptive thinking is off by default; requests with no &lt;code&gt;thinking&lt;/code&gt; field run without thinking. You must set it explicitly to enable it.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Sampling Parameters Removed
&lt;/h3&gt;

&lt;p&gt;Setting &lt;code&gt;temperature&lt;/code&gt;, &lt;code&gt;top_p&lt;/code&gt;, or &lt;code&gt;top_k&lt;/code&gt; to any non-default value returns a 400 error. Use prompting to guide output behavior instead.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Thinking Content Hidden by Default
&lt;/h3&gt;

&lt;p&gt;Thinking blocks still appear in the response stream, but their content is empty unless you opt in with &lt;code&gt;"display": "summarized"&lt;/code&gt;. If your product streams reasoning to users, the new default will appear as a long pause before output begins.&lt;/p&gt;
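&lt;p&gt;Putting the two thinking changes together, the opt-in shape looks like this (a sketch built from the two fields named above):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Adaptive is the only supported thinking mode on Opus 4.7, and thinking
# content arrives empty unless you opt in to summarized display.
thinking = {"type": "adaptive", "display": "summarized"}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;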

&lt;h2&gt;
  
  
  Migration Code Example
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Before (Opus 4.6)
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-opus-4-6&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;thinking&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;enabled&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;budget_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;8192&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="n"&gt;temperature&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.7&lt;/span&gt;

&lt;span class="c1"&gt;# After (Opus 4.7)
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-opus-4-7&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;thinking&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;adaptive&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="c1"&gt;# Remove temperature entirely — use prompting instead
# Increase max_tokens for headroom (new tokenizer uses more tokens)
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Behavior Changes
&lt;/h2&gt;

&lt;p&gt;These are not API breaking changes but may require prompt adjustments:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;More literal instruction following, particularly at lower effort levels. The model will not silently generalize an instruction from one item to another&lt;/li&gt;
&lt;li&gt;Response length calibrates to perceived task complexity rather than defaulting to a fixed verbosity&lt;/li&gt;
&lt;li&gt;Fewer tool calls by default. Raise effort to increase tool usage&lt;/li&gt;
&lt;li&gt;More direct, opinionated tone with less validation-forward phrasing than Opus 4.6&lt;/li&gt;
&lt;li&gt;More regular progress updates during long agentic traces. If you added scaffolding to force interim status messages, try removing it&lt;/li&gt;
&lt;li&gt;Fewer subagents spawned by default. Steerable through prompting&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Tokenizer Change
&lt;/h2&gt;

&lt;p&gt;Opus 4.7 uses a new tokenizer that may produce roughly 1.0 to 1.35x as many tokens for the same input, depending on content type. Per-token prices are unchanged, but the same prompt may cost more in practice: a 100,000-token prompt that retokenizes at 1.35x becomes 135,000 tokens, turning $0.50 of input into about $0.68 at the $5-per-million rate. Test your workloads before switching production traffic.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cybersecurity Safeguards
&lt;/h2&gt;

&lt;p&gt;Opus 4.7 includes automated safeguards that detect and block requests involving prohibited or high-risk cybersecurity uses. Cyber capabilities were deliberately reduced compared to Mythos Preview. Security professionals who want to use the model for legitimate purposes such as vulnerability research and penetration testing can apply through the Cyber Verification Program.&lt;/p&gt;

&lt;h2&gt;
  
  
  Who Should Migrate and When
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Teams running production coding agents&lt;/strong&gt;: The SWE-bench gains are large enough that the upgrade likely pays for itself in reduced human review cycles. Pair with task budgets to control costs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Teams using computer use or image-heavy workflows&lt;/strong&gt;: The 3.75 megapixel vision support alone justifies the switch&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Simple Q&amp;amp;A or FAQ bots&lt;/strong&gt;: Haiku 4.5 or Sonnet 4.6 are more cost-effective. No need to move to Opus for these workloads&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The safe migration approach is to keep Opus 4.6 as a fallback for one to two weeks while validating Opus 4.7 on your production workloads in parallel.&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.anthropic.com/news/claude-opus-4-7" rel="noopener noreferrer"&gt;Introducing Claude Opus 4.7 (Anthropic)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://platform.claude.com/docs/en/about-claude/models/whats-new-claude-4-7" rel="noopener noreferrer"&gt;What's new in Claude Opus 4.7 (Anthropic API Docs)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.cnbc.com/2026/04/16/anthropic-claude-opus-4-7-model-mythos.html" rel="noopener noreferrer"&gt;Anthropic rolls out Claude Opus 4.7 (CNBC)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/aws/introducing-anthropics-claude-opus-4-7-model-in-amazon-bedrock/" rel="noopener noreferrer"&gt;Introducing Anthropic's Claude Opus 4.7 model in Amazon Bedrock (AWS)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>llm</category>
      <category>news</category>
    </item>
    <item>
      <title>Calling Anthropic's Advisor Tool in 50 Lines of Python</title>
      <dc:creator>Lavelle Hatcher Jr</dc:creator>
      <pubDate>Mon, 13 Apr 2026 11:39:09 +0000</pubDate>
      <link>https://dev.to/lavellehatcherjr/calling-anthropics-advisor-tool-in-50-lines-of-python-5fk0</link>
      <guid>https://dev.to/lavellehatcherjr/calling-anthropics-advisor-tool-in-50-lines-of-python-5fk0</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh8374szbwvkmc97frvft.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh8374szbwvkmc97frvft.png" alt=" " width="800" height="336"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;This article reflects my own experience and research. It is not the official view of any company mentioned.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;When I first read Anthropic's Advisor Strategy post earlier this week, my first thought was: can a single &lt;code&gt;/v1/messages&lt;/code&gt; call really let one Claude model consult another one mid-generation? I wanted to see the actual wire format and the token accounting before I trusted it in production, so I sat down and wrote the smallest working example I could. That is what this article is.&lt;/p&gt;
&lt;h2&gt;
  
  
  Versions used
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Python 3.11&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;anthropic&lt;/code&gt; Python SDK 0.94.0 (released 2026-04-10)&lt;/li&gt;
&lt;li&gt;Claude API, advisor tool in public beta since 2026-04-09&lt;/li&gt;
&lt;li&gt;Beta header: &lt;code&gt;advisor-tool-2026-03-01&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Tool type: &lt;code&gt;advisor_20260301&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you are reading this later, double check the beta header and tool type against the official docs. Beta names change when features move toward GA.&lt;/p&gt;
&lt;h2&gt;
  
  
  What the advisor tool actually does
&lt;/h2&gt;

&lt;p&gt;Most server side tools the executor can call (web search, code execution) perform an action and return data. The advisor tool is different. When the executor invokes it, the server runs a separate sub-inference on a stronger model using the entire transcript so far, then injects the advice back into the executor's stream. No extra round trip on your side.&lt;/p&gt;

&lt;p&gt;The mechanics are slightly unusual. The executor emits a &lt;code&gt;server_tool_use&lt;/code&gt; block with &lt;code&gt;name: "advisor"&lt;/code&gt; and an empty &lt;code&gt;input&lt;/code&gt;; the executor decides only the timing. The server constructs the advisor's view automatically from the full transcript (system prompt, tool definitions, prior turns, prior tool results). The advisor then runs without tools and without its own context management, its thinking blocks are stripped, and only the advice text lands back in the executor's prompt as an &lt;code&gt;advisor_tool_result&lt;/code&gt; block. The executor resumes generating.&lt;/p&gt;

&lt;p&gt;The pairing Anthropic recommends is Sonnet 4.6 (executor) plus Opus 4.6 (advisor). Haiku 4.5 also works as an executor. The only advisor model available today is &lt;code&gt;claude-opus-4-6&lt;/code&gt;, and the advisor must be at least as capable as the executor.&lt;/p&gt;
&lt;h2&gt;
  
  
  The minimal call
&lt;/h2&gt;

&lt;p&gt;Here is the smallest viable request, using &lt;code&gt;client.beta.messages.create&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;anthropic&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;anthropic&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Anthropic&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;beta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-sonnet-4-6&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4096&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;betas&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;advisor-tool-2026-03-01&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;advisor_20260301&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;advisor&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-opus-4-6&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Build a concurrent worker pool in Go with graceful shutdown.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Four things worth pointing at:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;betas=["advisor-tool-2026-03-01"]&lt;/code&gt; turns the feature on. This is the SDK shortcut for the &lt;code&gt;anthropic-beta&lt;/code&gt; header.&lt;/li&gt;
&lt;li&gt;The tool type is &lt;code&gt;advisor_20260301&lt;/code&gt;, and &lt;code&gt;name&lt;/code&gt; must literally be the string &lt;code&gt;advisor&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;model&lt;/code&gt; inside the tool definition is the advisor model. The top level &lt;code&gt;model&lt;/code&gt; is the executor.&lt;/li&gt;
&lt;li&gt;You call &lt;code&gt;client.beta.messages.create&lt;/code&gt;, not &lt;code&gt;client.messages.create&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Reading what came back
&lt;/h2&gt;

&lt;p&gt;When the executor decides to consult the advisor, two new content blocks appear in the response: a &lt;code&gt;server_tool_use&lt;/code&gt; block with an empty &lt;code&gt;input&lt;/code&gt;, followed by an &lt;code&gt;advisor_tool_result&lt;/code&gt; block carrying the advice. This loop walks the content array and pulls each piece out:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;type&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;EXECUTOR:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;type&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;server_tool_use&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;advisor&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ADVISOR CALL:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;type&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;advisor_tool_result&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;type&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;advisor_result&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ADVISOR SAID:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;type&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;advisor_tool_result_error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ADVISOR FAILED:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;error_code&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice the two success variants. &lt;code&gt;advisor_result&lt;/code&gt; carries human readable &lt;code&gt;text&lt;/code&gt;. &lt;code&gt;advisor_redacted_result&lt;/code&gt; carries &lt;code&gt;encrypted_content&lt;/code&gt; that you round trip verbatim on the next turn. Opus 4.6 returns plaintext today, but other advisor models may not. If the sub-inference fails, you get &lt;code&gt;advisor_tool_result_error&lt;/code&gt; with an &lt;code&gt;error_code&lt;/code&gt; such as &lt;code&gt;overloaded&lt;/code&gt;, &lt;code&gt;too_many_requests&lt;/code&gt;, &lt;code&gt;max_uses_exceeded&lt;/code&gt;, &lt;code&gt;prompt_too_long&lt;/code&gt;, or &lt;code&gt;execution_time_exceeded&lt;/code&gt;. The whole request does not fail in that case. The executor keeps going without further advice.&lt;/p&gt;

&lt;h2&gt;
  
  
  Counting the tokens properly
&lt;/h2&gt;

&lt;p&gt;This is the part I wanted to see with my own eyes. Usage is split between executor and advisor, and the top level &lt;code&gt;usage.input_tokens&lt;/code&gt; does not include the advisor's tokens at all. Everything lives in &lt;code&gt;usage.iterations[]&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;usage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Executor output tokens (top level): &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;output_tokens&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;it&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;iterations&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;it&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;type&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;advisor_message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  [&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;] advisor (&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;it&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;): &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;in=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;it&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;input_tokens&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; out=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;it&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;output_tokens&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  [&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;] executor: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;in=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;it&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;input_tokens&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; out=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;it&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;output_tokens&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Advisor tokens bill at the advisor model's rate, so rolling them into the executor numbers would give you the wrong cost. The docs spell out the aggregation rules: top level &lt;code&gt;output_tokens&lt;/code&gt; is the sum across executor iterations, and top level &lt;code&gt;input_tokens&lt;/code&gt; reflects the first executor iteration only. For anything resembling billing, loop over &lt;code&gt;iterations&lt;/code&gt; and group by &lt;code&gt;type&lt;/code&gt;.&lt;/p&gt;
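&lt;p&gt;A sketch of that grouping (the per-million rates are placeholders, not quoted prices):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Placeholder USD-per-million-token rates; substitute your real price sheet.
RATES = {
    "claude-sonnet-4-6": {"in": 3.00, "out": 15.00},
    "claude-opus-4-6": {"in": 5.00, "out": 25.00},
}

def request_cost(usage, executor_model):
    """Price a response by walking usage.iterations and grouping by type."""
    total = 0.0
    for it in usage.iterations:
        # Advisor iterations carry their own model; executor iterations
        # bill at the top-level model's rate.
        rates = RATES[it.model if it.type == "advisor_message" else executor_model]
        total += it.input_tokens / 1e6 * rates["in"]
        total += it.output_tokens / 1e6 * rates["out"]
    return total
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;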

&lt;h2&gt;
  
  
  Capping cost with max_uses
&lt;/h2&gt;

&lt;p&gt;The advisor tool ships without a conversation level cap, but it does support a per request &lt;code&gt;max_uses&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;advisor_20260301&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;advisor&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-opus-4-6&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_uses&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once the executor hits that cap, additional advisor calls return an &lt;code&gt;advisor_tool_result_error&lt;/code&gt; with &lt;code&gt;error_code: "max_uses_exceeded"&lt;/code&gt;. This is per request, so on a multi turn conversation you still need a client side counter if you want a total ceiling. When you decide to stop offering the advisor, the docs are explicit: remove it from &lt;code&gt;tools&lt;/code&gt; AND strip every &lt;code&gt;advisor_tool_result&lt;/code&gt; block from your message history before the next request. Leaving the blocks behind without the tool returns a &lt;code&gt;400 invalid_request_error&lt;/code&gt;.&lt;/p&gt;
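&lt;p&gt;A sketch of that bookkeeping: count advisor invocations across turns, and once you hit your own ceiling, drop the tool and scrub the history in the same step. The block shapes follow the docs quoted above; keeping the history as plain dicts is my assumption:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;ADVISOR_TOOL = {
    "type": "advisor_20260301",
    "name": "advisor",
    "model": "claude-opus-4-6",
    "max_uses": 2,
}
CEILING = 6        # cross-turn budget; max_uses only caps a single request
advisor_calls = 0

def tools_and_history(response, messages):
    """Count advisor calls; past the ceiling, drop the tool and scrub history."""
    global advisor_calls
    advisor_calls += sum(
        1 for block in response.content
        if block.type == "server_tool_use" and block.name == "advisor"
    )
    if advisor_calls &lt; CEILING:
        return [ADVISOR_TOOL], messages
    # Per the docs: remove the tool AND strip every advisor_tool_result
    # block, or the next request fails with 400 invalid_request_error.
    scrubbed = []
    for msg in messages:  # assumes history is kept as plain dicts
        content = msg["content"]
        if isinstance(content, list):
            content = [b for b in content if b.get("type") != "advisor_tool_result"]
        scrubbed.append({**msg, "content": content})
    return [], scrubbed
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;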

&lt;h2&gt;
  
  
  Advisor side caching
&lt;/h2&gt;

&lt;p&gt;For long agent loops where the advisor fires three or more times, you can enable caching on the advisor's own transcript:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;advisor_20260301&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;advisor&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-opus-4-6&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;caching&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ephemeral&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ttl&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;5m&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The shape is fixed: &lt;code&gt;type&lt;/code&gt; must be &lt;code&gt;ephemeral&lt;/code&gt; and &lt;code&gt;ttl&lt;/code&gt; is &lt;code&gt;5m&lt;/code&gt; or &lt;code&gt;1h&lt;/code&gt;. Unlike &lt;code&gt;cache_control&lt;/code&gt; on normal content blocks, this is just an on or off switch. The server decides where the cache boundaries go. The documented break even point is about three advisor calls per conversation. Below that, the write cost exceeds the read savings.&lt;/p&gt;

&lt;h2&gt;
  
  
  Things to watch out for
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Streaming pauses.&lt;/strong&gt; The advisor sub-inference does not stream. While it runs, your executor stream sits idle except for standard SSE &lt;code&gt;ping&lt;/code&gt; keepalives roughly every 30 seconds. Short advisor calls may show no pings at all. Your UI needs to handle that silence without timing out.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;max_tokens&lt;/code&gt; bounds the executor only.&lt;/strong&gt; It does not cap advisor output. Budget for an extra 1,400 to 1,800 tokens per advisor call (400 to 700 text plus thinking).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rate limits draw from two buckets.&lt;/strong&gt; Executor rate limits fail the whole request with HTTP 429. Advisor rate limits come back as &lt;code&gt;too_many_requests&lt;/code&gt; inside the &lt;code&gt;advisor_tool_result&lt;/code&gt; block, and the request continues.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Invalid pairings return 400.&lt;/strong&gt; The advisor must be at least as capable as the executor. Today that means Opus as advisor for any executor. Haiku as advisor is not supported.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Do not rewrite redacted results.&lt;/strong&gt; If the advisor returns &lt;code&gt;advisor_redacted_result&lt;/code&gt;, pass the opaque &lt;code&gt;encrypted_content&lt;/code&gt; back on the next turn verbatim. The server decrypts it server side. Reading it or substituting text will break the conversation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context editing has sharp edges.&lt;/strong&gt; &lt;code&gt;clear_thinking&lt;/code&gt; with any &lt;code&gt;keep&lt;/code&gt; value other than &lt;code&gt;"all"&lt;/code&gt; shifts the advisor's quoted transcript each turn and kills advisor side caching. If you use extended thinking alongside the advisor, set &lt;code&gt;keep: "all"&lt;/code&gt; explicitly.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Is it worth wiring in?
&lt;/h2&gt;

&lt;p&gt;From a single request cost angle, the advisor is cheaper than Opus solo whenever your task is mostly mechanical output with a few key decisions. It is more expensive than Sonnet solo whenever those decisions are unnecessary. That tradeoff lives in your prompt and your workload, not in the API. I would not blindly turn it on for chat, but for agent loops with dozens of turns it is the right knob to have.&lt;/p&gt;

&lt;p&gt;Anthropic's own guidance in the docs is specific about timing: call the advisor early, after a few exploratory reads are in the transcript but before substantive work begins, and call it again near the end after file writes and test outputs are available. That matches what I see in practice. The advisor adds almost all of its value in the first call, before your approach crystallizes. If you wait until the executor is three quarters of the way through a wrong solution, the advisor will politely tell you so and you will still have to redo the work.&lt;/p&gt;

&lt;p&gt;The part I underestimated before writing this example was the usage accounting. If you have a billing pipeline that reads &lt;code&gt;usage.input_tokens&lt;/code&gt; and &lt;code&gt;usage.output_tokens&lt;/code&gt; directly, it will silently undercount advisor time. Migrate to &lt;code&gt;iterations&lt;/code&gt; before you flip this on in production.&lt;/p&gt;

&lt;p&gt;What would you use a second opinion model for in your own agent loops? I am curious whether people are reaching for this more on planning or on verification.&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://platform.claude.com/docs/en/agents-and-tools/tool-use/advisor-tool" rel="noopener noreferrer"&gt;Advisor tool - Claude API docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://claude.com/blog/the-advisor-strategy" rel="noopener noreferrer"&gt;The advisor strategy: Give Sonnet an intelligence boost with Opus - Anthropic&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://pypi.org/project/anthropic/" rel="noopener noreferrer"&gt;anthropic PyPI package&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/anthropics/anthropic-sdk-python/releases" rel="noopener noreferrer"&gt;anthropic-sdk-python GitHub releases&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>python</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>What I Learned Calling 4 Different LLM APIs From the Same Codebase</title>
      <dc:creator>Lavelle Hatcher Jr</dc:creator>
      <pubDate>Thu, 09 Apr 2026 13:03:06 +0000</pubDate>
      <link>https://dev.to/lavellehatcherjr/what-i-learned-calling-4-different-llm-apis-from-the-same-codebase-42jb</link>
      <guid>https://dev.to/lavellehatcherjr/what-i-learned-calling-4-different-llm-apis-from-the-same-codebase-42jb</guid>
      <description>&lt;p&gt;Most comparison articles give you benchmark scores. This one gives you the practical details benchmarks don't cover: response format differences, streaming implementations, and cost considerations I encountered while building a tool that lets users pick their own LLM provider.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why one codebase, four APIs?
&lt;/h2&gt;

&lt;p&gt;I build browser-based dev tools. One of my projects needed to support multiple LLM providers so users could choose whichever API they prefer. The user picks their provider, enters their own API key, and the tool handles the rest.&lt;/p&gt;

&lt;p&gt;Sounds simple. It turned out to be more nuanced than expected.&lt;/p&gt;

&lt;p&gt;Supporting OpenAI, Google Gemini, Anthropic Claude, and any OpenAI-compatible endpoint (like local Ollama or LM Studio) from a single codebase taught me a lot about the practical differences between these APIs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Response format differences
&lt;/h2&gt;

&lt;p&gt;Every provider returns responses in a slightly different structure.&lt;/p&gt;

&lt;p&gt;OpenAI (Chat Completions API):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"choices"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="p"&gt;}}]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;OpenAI also offers a newer Responses API (&lt;code&gt;/v1/responses&lt;/code&gt;) which returns a different format:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"output"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"output_text"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="p"&gt;}]}]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Claude:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Gemini:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"candidates"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"parts"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="nl"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="p"&gt;}]}}]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This seems trivial until you realize your entire downstream pipeline depends on extracting that text reliably. I ended up writing a normalizer function early on:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;extractText&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;switch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;openai&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
      &lt;span class="c1"&gt;// Chat Completions API format&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;?.[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]?.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;openai-responses&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
      &lt;span class="c1"&gt;// Responses API format&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;output&lt;/span&gt;
        &lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;b&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;message&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;flatMap&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;b&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;output_text&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;claude&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;
        &lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;b&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;text&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;b&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;gemini&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;candidates&lt;/span&gt;&lt;span class="p"&gt;?.[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]?.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;parts&lt;/span&gt;
        &lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;p&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nl"&gt;default&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
      &lt;span class="c1"&gt;// OpenAI-compatible fallback&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;?.[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]?.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Write this normalizer first. Before you build anything else. Trust me.&lt;/p&gt;

&lt;h2&gt;
  
  
  Streaming implementations
&lt;/h2&gt;

&lt;p&gt;All four providers support streaming, but each has its own implementation worth understanding.&lt;/p&gt;

&lt;p&gt;OpenAI and compatible endpoints use &lt;code&gt;data: [DONE]&lt;/code&gt; to signal the end of a stream. Claude uses &lt;code&gt;event: message_stop&lt;/code&gt;. Gemini streams over SSE as well (via &lt;code&gt;streamGenerateContent&lt;/code&gt; with &lt;code&gt;alt=sse&lt;/code&gt;) but sends no end-of-stream sentinel at all; the connection simply closes.&lt;/p&gt;

&lt;p&gt;The chunk structure is different too. OpenAI sends &lt;code&gt;delta.content&lt;/code&gt;. Claude sends &lt;code&gt;delta.text&lt;/code&gt; inside a &lt;code&gt;content_block_delta&lt;/code&gt; event. Gemini sends partial &lt;code&gt;text&lt;/code&gt; inside &lt;code&gt;candidates[0].content.parts&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;If you're building a UI that shows streaming text, you'll need a parser for each provider's format.&lt;/p&gt;

&lt;h2&gt;
  
  
  System prompt handling
&lt;/h2&gt;

&lt;p&gt;OpenAI Chat Completions API accepts a &lt;code&gt;system&lt;/code&gt; role in the messages array:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"system"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"You are a helpful assistant."&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;OpenAI's newer Responses API uses a top-level &lt;code&gt;instructions&lt;/code&gt; parameter instead:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"instructions"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"You are a helpful assistant."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"input"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Claude takes the system prompt as a separate top-level parameter:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"system"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"You are a helpful assistant."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"messages"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Gemini uses &lt;code&gt;system_instruction&lt;/code&gt; as a separate field:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"system_instruction"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"parts"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="nl"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"You are a helpful assistant."&lt;/span&gt;&lt;span class="p"&gt;}]},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"contents"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you're abstracting this behind a single interface, you need to intercept the system message and route it to the correct location before sending the request. This ensures the model properly receives your system prompt regardless of provider.&lt;/p&gt;
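
&lt;p&gt;Here is a minimal sketch of that routing step. The provider keys and the function shape are my own naming; the request bodies mirror the JSON examples above. Converting the messages themselves to each provider's format is a separate step, omitted here:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Minimal sketch: place the system prompt where each provider expects it.
function buildRequest(provider, systemPrompt, messages) {
  switch (provider) {
    case 'openai-chat':      // "system" role entry in the messages array
      return { messages: [{ role: 'system', content: systemPrompt }, ...messages] };
    case 'openai-responses': // top-level "instructions" parameter
      return { instructions: systemPrompt, input: messages };
    case 'claude':           // top-level "system" parameter
      return { system: systemPrompt, messages };
    case 'gemini':           // separate "system_instruction" field
      return {
        system_instruction: { parts: [{ text: systemPrompt }] },
        contents: messages, // assumes messages were already converted to Gemini's format
      };
    default:
      throw new Error(`Unknown provider: ${provider}`);
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;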

&lt;h2&gt;
  
  
  Token counting and cost
&lt;/h2&gt;

&lt;p&gt;Each provider has its own pricing structure, so it's worth understanding the differences.&lt;/p&gt;

&lt;p&gt;OpenAI charges separately for input and output tokens. Claude does the same, with rates that vary by model tier (Haiku, Sonnet, Opus). Gemini offers a free tier with rate limits and a paid tier with per-token pricing.&lt;/p&gt;

&lt;p&gt;An interesting observation: the same prompt can produce different output lengths depending on the provider. Each model has its own default verbosity level. This means your cost per request varies even when the input is identical.&lt;/p&gt;

&lt;p&gt;If you're letting users bring their own API key, make this transparent. Show estimated token counts before sending the request if possible.&lt;/p&gt;
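
&lt;p&gt;For a pre-send estimate, a rough character-count heuristic is usually enough for a UI hint. The four-characters-per-token figure is a common approximation for English text, not an exact count; use the provider's tokenizer or count-tokens endpoint when you need accuracy:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Rough pre-send estimate: ~4 characters per token for English text.
// Good enough for a UI hint; not a substitute for the provider's tokenizer.
function estimateTokens(messages) {
  const chars = messages.reduce((sum, m) =&gt; sum + (m.content?.length ?? 0), 0);
  return Math.ceil(chars / 4);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;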

&lt;h2&gt;
  
  
  Error handling differences
&lt;/h2&gt;

&lt;p&gt;Each API returns errors differently.&lt;/p&gt;

&lt;p&gt;OpenAI returns &lt;code&gt;error.message&lt;/code&gt; with HTTP status codes you'd expect (429 for rate limit, 401 for bad key).&lt;/p&gt;

&lt;p&gt;Claude returns errors in a &lt;code&gt;error.type&lt;/code&gt; and &lt;code&gt;error.message&lt;/code&gt; structure. Rate limits come back as &lt;code&gt;rate_limit_error&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Gemini sometimes returns 200 OK with an error inside the response body, so it's important to check the response content as well as the HTTP status code.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Check both status codes and response body&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="c1"&gt;// Gemini can return 200 with an error&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What I'd do differently
&lt;/h2&gt;

&lt;p&gt;If I started over today, here's what I'd do from day one:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Normalize everything immediately.&lt;/strong&gt; Keep provider-specific response formats contained in an adapter layer so your application logic stays clean (a minimal sketch follows this list).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Test with the cheapest model from each provider.&lt;/strong&gt; Save your token budget by using GPT-4o-mini (or the newer GPT-5 series mini models), Claude Haiku, and Gemini Flash during development.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;OpenAI-compatible is your best friend.&lt;/strong&gt; If a provider supports the OpenAI format (and many do, including local tools like Ollama and LM Studio), treat them all as one integration. That covers 80% of providers with one code path.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Stream from the start.&lt;/strong&gt; Adding streaming to a synchronous architecture later requires significant refactoring. Build for streaming on day one even if you don't need it yet.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Log raw responses during development.&lt;/strong&gt; Having the raw API response saved makes debugging much faster when investigating unexpected behavior.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
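
&lt;p&gt;To make point one concrete, here is the shape of the adapter layer I mean. The normalized field names are an arbitrary choice of mine; the point is that only the adapters know about provider formats:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Minimal adapter sketch: each adapter maps a raw provider response to one
// normalized shape, so application code never touches provider formats.
const adapters = {
  'openai-chat': (raw) =&gt; ({
    text: raw.choices[0].message.content,
    inputTokens: raw.usage?.prompt_tokens,
    outputTokens: raw.usage?.completion_tokens,
  }),
  claude: (raw) =&gt; ({
    text: raw.content[0].text,
    inputTokens: raw.usage?.input_tokens,
    outputTokens: raw.usage?.output_tokens,
  }),
  gemini: (raw) =&gt; ({
    text: raw.candidates[0].content.parts[0].text,
    inputTokens: raw.usageMetadata?.promptTokenCount,
    outputTokens: raw.usageMetadata?.candidatesTokenCount,
  }),
};

function normalize(provider, raw) {
  const adapt = adapters[provider];
  if (!adapt) throw new Error(`No adapter for provider: ${provider}`);
  return adapt(raw);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;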

&lt;h2&gt;
  
  
  The bigger picture
&lt;/h2&gt;

&lt;p&gt;The LLM API landscape in 2026 has evolved significantly since 2024.&lt;/p&gt;

&lt;p&gt;Most providers now support function calling, structured outputs, and vision. The baseline quality is high enough that the "best" model depends more on your specific use case than on benchmark rankings. OpenAI now offers both the Chat Completions API and the newer Responses API, adding another dimension to consider when integrating.&lt;/p&gt;

&lt;p&gt;At the same time, each provider continues to add features with their own implementations. MCP, tool use, multimodal inputs, and structured outputs all have differences across providers, which makes a good abstraction layer increasingly valuable.&lt;/p&gt;

&lt;p&gt;The key advantage in this environment isn't picking the "best" LLM. It's building clean abstractions that let you switch between providers without rewriting your application.&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://platform.openai.com/docs/api-reference" rel="noopener noreferrer"&gt;OpenAI API Reference&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://developers.openai.com/api/docs/guides/migrate-to-responses" rel="noopener noreferrer"&gt;OpenAI Responses API Migration Guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.anthropic.com/en/api" rel="noopener noreferrer"&gt;Anthropic API Reference&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://ai.google.dev/docs" rel="noopener noreferrer"&gt;Google Gemini API Reference&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>webdev</category>
      <category>javascript</category>
    </item>
    <item>
      <title>I built an AI browser extension and website builder: here is what went into each decision</title>
      <dc:creator>Lavelle Hatcher Jr</dc:creator>
      <pubDate>Fri, 03 Apr 2026 23:07:28 +0000</pubDate>
      <link>https://dev.to/lavellehatcherjr/i-built-an-ai-browser-extension-and-website-builder-here-is-what-went-into-each-decision-3ao9</link>
      <guid>https://dev.to/lavellehatcherjr/i-built-an-ai-browser-extension-and-website-builder-here-is-what-went-into-each-decision-3ao9</guid>
      <description>&lt;h3&gt;
  
  
  The problem I kept running into
&lt;/h3&gt;

&lt;p&gt;Every browser extension project starts the same way. You need a manifest, a background script, popup HTML, icons. Before you have written a single line of real logic you have already written a hundred lines of setup.&lt;/p&gt;

&lt;p&gt;AI can handle that. But the harder problem is what comes after. You get a working extension, you want to change one file, and either you regenerate everything or you edit manually with no context. Neither is good.&lt;/p&gt;

&lt;p&gt;I built NuModeX Ext Maker to solve both.&lt;/p&gt;




&lt;h3&gt;
  
  
  What it does
&lt;/h3&gt;

&lt;p&gt;NuModeX Ext Maker generates Manifest V3 browser extensions and static websites from text prompts. Output is code structured for Chrome, Edge, Firefox, Whale, Opera, or Safari.&lt;/p&gt;

&lt;p&gt;Available now on the Chrome Web Store, Firefox Add-ons, Edge Add-ons, and the Whale Store. The interface is available in English, Japanese, Spanish, French, Korean, Chinese, German, Portuguese, and Italian.&lt;/p&gt;




&lt;h3&gt;
  
  
  Import from ZIP
&lt;/h3&gt;

&lt;p&gt;The most requested feature before launch. Load an existing browser extension or website from a ZIP file directly into the tool and edit it with AI. No manual file-by-file importing, no rebuilding from scratch.&lt;/p&gt;

&lt;p&gt;This matters because a lot of people have extensions they built by hand, with other tools, or years ago, and they want to maintain or extend them without starting over.&lt;/p&gt;




&lt;h3&gt;
  
  
  The post-build editing toolkit
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Edit File&lt;/strong&gt; - select a file from the tree, describe the change, AI rewrites only that file.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Improve Extension&lt;/strong&gt; - passes the entire project to the AI. Describe a multi-file change in one prompt and it figures out what to update.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Add File&lt;/strong&gt; - create new files in an existing project via plain language.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;View Changes&lt;/strong&gt; - diff viewer showing every file the AI modified, line-by-line or side-by-side, before you accept anything.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Undo&lt;/strong&gt; - reverse the last AI edit in one click.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Import Files&lt;/strong&gt; - bring individual existing files into the project for AI editing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Manual inline editor&lt;/strong&gt; - direct code editing when you want full control.&lt;/p&gt;




&lt;h3&gt;
  
  
  Live preview
&lt;/h3&gt;

&lt;p&gt;A sandboxed iframe preview renders the output before you download. Multi-project support with auto-naming from the manifest. Projects auto-save and restore on reopen.&lt;/p&gt;




&lt;h3&gt;
  
  
  The AI setup
&lt;/h3&gt;

&lt;p&gt;It is not tied to any specific provider. You can use cloud AI models with your own API key, on-device AI models where your browser supports them (no API key required), or a custom model on a local or remote server that exposes an OpenAI-compatible /v1/chat/completions endpoint.&lt;/p&gt;

&lt;p&gt;One honest note on on-device models: they handle chat and editing well but cannot build full extensions from scratch. Use a cloud or custom model for initial generation.&lt;/p&gt;
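
&lt;p&gt;For the custom model route, any server that speaks the OpenAI chat format should work. As a purely illustrative sketch, this is roughly what a request to such an endpoint looks like (the URL is Ollama's local default and the model name is a placeholder):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Illustrative only: a local server exposing /v1/chat/completions.
const res = await fetch('http://localhost:11434/v1/chat/completions', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'llama3', // whatever model your server actually hosts
    messages: [{ role: 'user', content: 'Hello' }],
  }),
});
const data = await res.json();
console.log(data.choices[0].message.content);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;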




&lt;h3&gt;
  
  
  No accounts, no connections, no setup maze
&lt;/h3&gt;

&lt;p&gt;A lot of AI tools require you to create an account, connect a workspace, authorize third-party integrations, and navigate a settings flow before you can do anything. NuModeX Ext Maker has none of that. Install the extension, paste in your API key from your chosen cloud AI provider, and you are building. Everything runs in your browser. There is no dashboard to log into, no external service to connect, and no platform sitting between you and your AI provider.&lt;/p&gt;




&lt;h3&gt;
  
  
  The licensing decision
&lt;/h3&gt;

&lt;p&gt;NuModeX Ext Maker is licensed under &lt;strong&gt;BSL 1.1&lt;/strong&gt; (Business Source License 1.1). Source is publicly available on GitHub.&lt;/p&gt;

&lt;p&gt;What this means in practice: free for personal use and internal business use. You can copy it, modify it, study it, run a modified version internally. The one restriction is redistribution of NuModeX Ext Maker itself to browser extension marketplaces - that requires written permission from SoraVantia GK.&lt;/p&gt;

&lt;p&gt;Extensions you generate with the tool are entirely yours. No restrictions on what you build or publish.&lt;/p&gt;

&lt;p&gt;I chose BSL 1.1 because the marketplace restriction is the one thing that actually matters for protecting the product. It does not affect the vast majority of use cases.&lt;/p&gt;




&lt;h3&gt;
  
  
  Try it
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Website: &lt;a href="https://numodex.com/numodexextmaker" rel="noopener noreferrer"&gt;https://numodex.com/numodexextmaker&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Chrome Web Store: &lt;a href="https://chromewebstore.google.com/detail/numodex-ext-maker/amkcpiiepjfmcichnkniiabhdcieidpf" rel="noopener noreferrer"&gt;https://chromewebstore.google.com/detail/numodex-ext-maker/amkcpiiepjfmcichnkniiabhdcieidpf&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Firefox Add-ons: &lt;a href="https://addons.mozilla.org/firefox/addon/numodex-ext-maker/" rel="noopener noreferrer"&gt;https://addons.mozilla.org/firefox/addon/numodex-ext-maker/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Edge Add-ons: &lt;a href="https://microsoftedge.microsoft.com/addons/detail/jkdimfdgngcachpaggijnegdmmokdhkc" rel="noopener noreferrer"&gt;https://microsoftedge.microsoft.com/addons/detail/jkdimfdgngcachpaggijnegdmmokdhkc&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Whale Store: &lt;a href="https://store.whale.naver.com/detail/jmbkjagjlhbnagganjfjeknboiagnnmk" rel="noopener noreferrer"&gt;https://store.whale.naver.com/detail/jmbkjagjlhbnagganjfjeknboiagnnmk&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;GitHub: &lt;a href="https://github.com/SoraVantia/NuModeX-Ext-Maker" rel="noopener noreferrer"&gt;https://github.com/SoraVantia/NuModeX-Ext-Maker&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Built by SoraVantia GK, Japan.&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;What part of the browser extension workflow causes you the most friction? Curious about what to prioritize next.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>nocode</category>
      <category>extensions</category>
      <category>ai</category>
      <category>buildinpublic</category>
    </item>
  </channel>
</rss>
