<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: CodeKing</title>
    <description>The latest articles on DEV Community by CodeKing (@codekingai).</description>
    <link>https://dev.to/codekingai</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3843914%2Fedc4fbb1-edd3-4c7d-9c94-e2b13dbc1af0.jpg</url>
      <title>DEV Community: CodeKing</title>
      <link>https://dev.to/codekingai</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/codekingai"/>
    <language>en</language>
    <item>
      <title>"My AI Assistant Could Code, But It Couldn't Operate My Desktop"</title>
      <dc:creator>CodeKing</dc:creator>
      <pubDate>Tue, 26 May 2026 09:51:16 +0000</pubDate>
      <link>https://dev.to/codekingai/my-ai-assistant-could-code-but-it-couldnt-operate-my-desktop-4d97</link>
      <guid>https://dev.to/codekingai/my-ai-assistant-could-code-but-it-couldnt-operate-my-desktop-4d97</guid>
      <description>&lt;p&gt;My assistant could already read files, run shell commands, and delegate coding work to Claude Code or Codex.&lt;/p&gt;

&lt;p&gt;But the moment a workflow hit a real desktop app, the illusion broke.&lt;/p&gt;

&lt;p&gt;A browser needed a click. A page needed a scroll. A field needed real text input. A task could finish the hard part and still get stuck on the last two seconds of UI.&lt;/p&gt;

&lt;p&gt;That felt like a fake kind of automation.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem wasn't coding
&lt;/h2&gt;

&lt;p&gt;The hard part here wasn't generating code. It was crossing the gap between "I know what should happen next" and "I can actually operate the window in front of me."&lt;/p&gt;

&lt;p&gt;In practice, that gap showed up in small but annoying ways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a browser tab needed &lt;code&gt;Ctrl+L&lt;/code&gt; and a URL paste&lt;/li&gt;
&lt;li&gt;a page exposed no reliable accessibility selector, so a screenshot was needed first&lt;/li&gt;
&lt;li&gt;a long form needed scrolling inside the right pane, not the whole desktop&lt;/li&gt;
&lt;li&gt;a final publish step still depended on one visible button&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So the assistant didn't need another coding loop. It needed a safe desktop-control layer.&lt;/p&gt;

&lt;h2&gt;
  
  
  The local control loop I added
&lt;/h2&gt;

&lt;p&gt;I added a small set of desktop tools around a companion agent running on the same machine.&lt;/p&gt;

&lt;p&gt;The assistant can now do things like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;list windows&lt;/li&gt;
&lt;li&gt;focus a specific app&lt;/li&gt;
&lt;li&gt;find accessible controls when UI Automation is available&lt;/li&gt;
&lt;li&gt;set input values directly&lt;/li&gt;
&lt;li&gt;send hotkeys like &lt;code&gt;Ctrl+L&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;capture screenshots before pixel-based actions&lt;/li&gt;
&lt;li&gt;click, move, and scroll with explicit coordinates only after visual confirmation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The key constraint is simple: &lt;strong&gt;observe first, then act&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If selectors are available, use them. If they are not, capture the window, inspect what is actually visible, and only then click. That rule matters more than any single tool because it keeps desktop automation from turning into random coordinate guessing.&lt;/p&gt;
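&lt;p&gt;That rule can be enforced by the tool layer instead of being left to the prompt. Below is a minimal sketch, assuming a hypothetical guard that the click tool consults before acting: selector-based actions are always allowed, while coordinate clicks are refused unless the same window was captured recently. The class name, the action shape, and the 5-second freshness window are all assumptions for illustration, not CliGate's API.&lt;/p&gt;

```javascript
// Hypothetical guard for an "observe first, then act" policy.
// It remembers when each window was last captured and refuses blind clicks.
class ObserveFirstGuard {
  constructor({ maxScreenshotAgeMs = 5000, now = () => Date.now() } = {}) {
    this.maxScreenshotAgeMs = maxScreenshotAgeMs; // assumed freshness window
    this.now = now; // injectable clock, so the policy can be tested
    this.lastCapture = new Map(); // windowId -> timestamp of last screenshot
  }

  recordScreenshot(windowId) {
    this.lastCapture.set(windowId, this.now());
  }

  // Decide whether (and how) an action may be performed.
  plan(action) {
    if (action.selector) {
      return { ok: true, via: "selector" }; // accessibility path, no pixels involved
    }
    const capturedAt = this.lastCapture.get(action.windowId);
    if (capturedAt === undefined) {
      return { ok: false, reason: "capture the window before clicking by coordinates" };
    }
    if (this.now() - capturedAt > this.maxScreenshotAgeMs) {
      return { ok: false, reason: "screenshot is stale; capture again" };
    }
    return { ok: true, via: "coordinates" };
  }
}

let clock = 0;
const guard = new ObserveFirstGuard({ now: () => clock });
console.log(guard.plan({ windowId: "browser", x: 640, y: 400 }).ok); // false
guard.recordScreenshot("browser");
clock = 1000;
console.log(guard.plan({ windowId: "browser", x: 640, y: 400 }).via); // coordinates
```

&lt;p&gt;In this sketch a stale screenshot is treated the same as no screenshot, since the window may have moved or re-rendered after it was taken.&lt;/p&gt;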

&lt;h2&gt;
  
  
  What changed in the workflow
&lt;/h2&gt;

&lt;p&gt;Before this, the assistant could help me prepare a task but not finish anything that crossed into a real app.&lt;/p&gt;

&lt;p&gt;Now the same local loop can cover more of the actual workflow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;inspect window → focus app → locate control or capture screenshot → act → verify
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That sounds small, but it changes what "assistant" means in practice.&lt;/p&gt;

&lt;p&gt;It is no longer limited to code and terminal state. It can handle the messy last mile where real work often stalls.&lt;/p&gt;
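&lt;p&gt;The step that keeps the loop above honest is the last one, verify: every action is followed by a check against fresh state, and a failed check stops the run instead of being assumed away. A rough sketch, assuming hypothetical step objects with an &lt;code&gt;act&lt;/code&gt; function and a &lt;code&gt;verify&lt;/code&gt; predicate, and using a fake in-memory state in place of real window queries (this is not CliGate's tool interface):&lt;/p&gt;

```javascript
// Hypothetical act-then-verify runner. Each step performs one UI action,
// then re-reads state to confirm the action had the intended effect.
async function runSteps(steps, { retries = 1 } = {}) {
  const log = [];
  for (const step of steps) {
    let attempts = 0;
    let verified = false;
    while (!verified) {
      attempts += 1;
      await step.act();
      verified = await step.verify();
      if (attempts > retries) break; // at most 1 + retries attempts per step
    }
    log.push({ step: step.name, attempts, verified });
    if (!verified) {
      // Later steps depend on this one, so stop and report instead of guessing.
      return { ok: false, failedAt: step.name, log };
    }
  }
  return { ok: true, log };
}

// Fake desktop state standing in for real window and UI queries.
const state = { focused: null, address: "" };
const steps = [
  { name: "focus app", act: () => { state.focused = "browser"; }, verify: () => state.focused === "browser" },
  { name: "set address", act: () => { state.address = "https://dev.to"; }, verify: () => state.address === "https://dev.to" },
];

runSteps(steps).then((result) => console.log(result.ok)); // true
```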

&lt;h2&gt;
  
  
  Why I kept it local
&lt;/h2&gt;

&lt;p&gt;I did not want this running through a hosted browser service or a remote desktop relay.&lt;/p&gt;

&lt;p&gt;Desktop control touches exactly the kind of things that should stay on the machine that owns them: open apps, visible windows, clipboard state, local sessions, and personal accounts.&lt;/p&gt;

&lt;p&gt;Keeping it local also makes the loop faster. The assistant can inspect, act, and verify against the current desktop state without shipping screenshots or UI events to another service first.&lt;/p&gt;

&lt;p&gt;That local-first constraint fits the rest of CliGate anyway. The gateway, the assistant, the runtimes, and now the desktop-control layer all live on the same box.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I learned
&lt;/h2&gt;

&lt;p&gt;The interesting lesson was that "assistant capability" is not just about better reasoning or better code generation.&lt;/p&gt;

&lt;p&gt;A lot of workflows fail because the assistant cannot cross boundaries between tools.&lt;/p&gt;

&lt;p&gt;Terminal-only automation is useful. But if the real workflow ends in a browser, settings window, login dialog, or web app form, then desktop control becomes part of the product surface whether you planned for it or not.&lt;/p&gt;

&lt;p&gt;So this update was less about making the assistant smarter and more about making it less incomplete.&lt;/p&gt;

&lt;p&gt;If you're building local AI tooling, where does your automation still stop — at the terminal, at the API, or at the desktop?&lt;/p&gt;

&lt;p&gt;Repo: &lt;a href="https://github.com/codeking-ai/cligate" rel="noopener noreferrer"&gt;https://github.com/codeking-ai/cligate&lt;/a&gt;&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>javascript</category>
      <category>ai</category>
      <category>opensource</category>
    </item>
    <item>
      <title>"My AI Assistant Could Code, But It Couldn't Operate My Desktop"</title>
      <dc:creator>CodeKing</dc:creator>
      <pubDate>Tue, 26 May 2026 09:10:44 +0000</pubDate>
      <link>https://dev.to/codekingai/my-ai-assistant-could-code-but-it-couldnt-operate-my-desktop-1log</link>
      <guid>https://dev.to/codekingai/my-ai-assistant-could-code-but-it-couldnt-operate-my-desktop-1log</guid>
      <description>&lt;p&gt;Most AI coding agents are good until the task leaves the terminal.&lt;/p&gt;

&lt;p&gt;They can edit files. They can run tests. They can explain a diff. Then the work hits a desktop app, an OAuth approval screen, a native settings window, or a web UI that was not designed for API access. Suddenly the agent is not stuck on intelligence. It is stuck on reach.&lt;/p&gt;

&lt;p&gt;That was the gap I kept running into while building my local AI setup. I had Claude Code, Codex CLI, Gemini CLI, local models, provider keys, and account pools. The missing piece was not another model.&lt;/p&gt;

&lt;p&gt;It was an operator.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem Was The Boundary
&lt;/h2&gt;

&lt;p&gt;My old workflow had two separate worlds.&lt;/p&gt;

&lt;p&gt;In one world, coding agents lived inside terminals and repos. They could reason about code, run commands, and keep a session alive.&lt;/p&gt;

&lt;p&gt;In the other world, real work still happened through desktop apps, dashboards, browser windows, chat clients, and provider consoles. A human could jump between those worlds without thinking. An agent could not.&lt;/p&gt;

&lt;p&gt;That made the assistant feel smaller than it should:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It could fix a bug, but not always finish the setup.&lt;/li&gt;
&lt;li&gt;It could tell me where to click, but not click safely.&lt;/li&gt;
&lt;li&gt;It could generate a workflow, but not reliably drive the app that owned the workflow.&lt;/li&gt;
&lt;li&gt;It could reuse project knowledge, but only if I remembered to paste it in.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So I changed how I think about CliGate.&lt;/p&gt;

&lt;p&gt;CliGate is no longer just a local API gateway for AI tools. It is becoming a local control plane for agent work.&lt;/p&gt;

&lt;h2&gt;
  
  
  What CliGate Does Now
&lt;/h2&gt;

&lt;p&gt;CliGate still starts as one localhost service for AI coding tools.&lt;/p&gt;

&lt;p&gt;You can point Claude Code, Codex CLI, Gemini CLI, and OpenClaw at the same local server, then manage provider keys, account pools, routing, usage, logs, and local runtimes from one dashboard.&lt;/p&gt;

&lt;p&gt;But the newer assistant layer sits above that.&lt;/p&gt;

&lt;p&gt;It has two modes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Direct runtime: keep talking to the current Codex or Claude Code session.&lt;/li&gt;
&lt;li&gt;Assistant collaboration: ask CliGate Assistant to inspect state, choose a runtime, continue a task, handle a blocked run, or summarize what happened.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That split matters. I do not want every normal message to be intercepted by a clever supervisor. Sometimes I just want to continue the current runtime session. Other times I want an assistant that can see the bigger picture.&lt;/p&gt;

&lt;p&gt;The assistant is not trying to replace Codex or Claude Code. It coordinates them.&lt;/p&gt;

&lt;h2&gt;
  
  
  Skills Made It Less Generic
&lt;/h2&gt;

&lt;p&gt;The second piece is skills.&lt;/p&gt;

&lt;p&gt;A skill is a local package of instructions, scripts, templates, and references. The assistant does not need every detail in context all the time. It can see a short description first, then read the full &lt;code&gt;SKILL.md&lt;/code&gt; only when the task matches.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;skills/
  devto-publisher/
    SKILL.md
    publish.js
    templates/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That turns the assistant from "a general chat box with tools" into something closer to a teammate with reusable procedures.&lt;/p&gt;

&lt;p&gt;One skill can know how to publish a Dev.to article. Another can know how to build a spreadsheet. Another can know the conventions of a local repo. The key is that these are local, inspectable, and executable through the same permission system as the rest of the agent.&lt;/p&gt;

&lt;p&gt;It is not magic. It is just a better way to keep operational knowledge out of one giant prompt.&lt;/p&gt;
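&lt;p&gt;The two-stage loading described above (a short description always visible, the full &lt;code&gt;SKILL.md&lt;/code&gt; read only on a match) can be sketched like this. The registry, the keyword matching, and the loader callback are assumptions for illustration: in a real setup the loader would read the skill's &lt;code&gt;SKILL.md&lt;/code&gt; from disk, and the model, not a keyword list, would usually decide which skill applies.&lt;/p&gt;

```javascript
// Hypothetical skill registry with lazy loading of full instructions.
class SkillRegistry {
  constructor() {
    this.skills = new Map(); // name -> { description, keywords, load, body }
  }

  register(name, description, keywords, load) {
    this.skills.set(name, { description, keywords, load, body: null });
  }

  // The cheap part that can stay in context: one line per skill.
  summaries() {
    return [...this.skills].map(([name, s]) => `${name}: ${s.description}`);
  }

  // Naive keyword match, standing in for the model's own judgment.
  match(task) {
    const text = task.toLowerCase();
    for (const [name, s] of this.skills) {
      if (s.keywords.some((k) => text.includes(k))) return name;
    }
    return null;
  }

  // Full instructions are loaded only when a task matches, then cached.
  instructionsFor(task) {
    const name = this.match(task);
    if (name === null) return null;
    const skill = this.skills.get(name);
    if (skill.body === null) skill.body = skill.load();
    return skill.body;
  }
}

let loads = 0;
const registry = new SkillRegistry();
registry.register(
  "devto-publisher",
  "Publish a Markdown article to Dev.to",
  ["dev.to", "publish"],
  () => {
    loads += 1; // stands in for reading skills/devto-publisher/SKILL.md
    return "# devto-publisher\n(placeholder instructions)";
  }
);

console.log(registry.summaries());
console.log(registry.instructionsFor("Publish this draft to Dev.to") !== null); // true
```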

&lt;h2&gt;
  
  
  The Desktop Part Is The Big Unlock
&lt;/h2&gt;

&lt;p&gt;The part I am most excited about is desktop control.&lt;/p&gt;

&lt;p&gt;The first naive version of desktop automation is usually visual: take a screenshot, ask the model where to click, move the mouse, repeat. That works for demos, but it is fragile. Small buttons, focus changes, DPI scaling, popups, and animations can break it.&lt;/p&gt;

&lt;p&gt;CliGate's desktop agent takes a different default path on Windows: UI Automation first, screenshots second.&lt;/p&gt;

&lt;p&gt;Instead of guessing pixels, the assistant can ask the operating system for the UI tree:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;list windows -&amp;gt; focus app -&amp;gt; find input -&amp;gt; set value -&amp;gt; send Enter -&amp;gt; read text
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That means it can find a textbox by control type, set its value through the accessibility API, invoke a button, read visible text, and only fall back to screenshots when the app does not expose useful accessibility metadata.&lt;/p&gt;

&lt;p&gt;This is the bridge I wanted: a coding assistant that can work in repos, but also operate the desktop applications that surround the repo.&lt;/p&gt;
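&lt;p&gt;To make "tree first, pixels second" concrete, here is a sketch that searches a UI tree for a control by type and name, and only asks for a screenshot when nothing matches. The tree is a plain-object stand-in: real Windows UI Automation exposes elements through its COM and .NET APIs, with control types such as Edit and Button, but the object shape and the &lt;code&gt;findControl&lt;/code&gt; and &lt;code&gt;resolveTarget&lt;/code&gt; helpers here are hypothetical.&lt;/p&gt;

```javascript
// Depth-first search for the first element with the given control type
// whose name contains the given text (case-insensitive).
function findControl(node, controlType, nameIncludes) {
  const nameMatches = (node.name ?? "").toLowerCase().includes(nameIncludes.toLowerCase());
  if (node.controlType === controlType) {
    if (nameMatches) return node;
  }
  for (const child of node.children ?? []) {
    const found = findControl(child, controlType, nameIncludes);
    if (found) return found;
  }
  return null;
}

// Prefer the accessibility path; fall back to a screenshot only when needed.
function resolveTarget(tree, controlType, nameIncludes) {
  const control = findControl(tree, controlType, nameIncludes);
  if (control) return { strategy: "uia", id: control.id };
  return { strategy: "screenshot", reason: `no ${controlType} matching "${nameIncludes}"` };
}

// Hypothetical snapshot of a browser window's UI tree.
const browserTree = {
  controlType: "Window",
  name: "Dev.to - Browser",
  children: [
    { controlType: "Edit", name: "Address and search bar", id: "addr" },
    {
      controlType: "Document",
      name: "Editor",
      children: [{ controlType: "Button", name: "Publish", id: "publish-btn" }],
    },
  ],
};

console.log(resolveTarget(browserTree, "Edit", "address")); // { strategy: 'uia', id: 'addr' }
console.log(resolveTarget(browserTree, "Button", "Schedule").strategy); // screenshot
```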

&lt;h2&gt;
  
  
  Where This Is Going
&lt;/h2&gt;

&lt;p&gt;The current shape is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CliGate routes AI coding tools through one local server.&lt;/li&gt;
&lt;li&gt;Runtime sessions keep Codex and Claude Code work alive.&lt;/li&gt;
&lt;li&gt;The assistant watches, coordinates, and summarizes.&lt;/li&gt;
&lt;li&gt;Skills give it reusable procedures.&lt;/li&gt;
&lt;li&gt;Desktop control gives it a path into native apps and GUI workflows.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That combination changes the product from "proxy for AI tools" into "local operator for developer workflows."&lt;/p&gt;

&lt;p&gt;I think the desktop-control layer deserves its own post, because "AI can operate any app through the OS accessibility tree" is a deeper topic than I can fit here.&lt;/p&gt;

&lt;p&gt;The project is open source here: &lt;a href="https://github.com/codeking-ai/cligate" rel="noopener noreferrer"&gt;CliGate on GitHub&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;How are you handling the boundary between coding agents and the desktop apps they still need to interact with?&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>javascript</category>
      <category>ai</category>
      <category>opensource</category>
    </item>
    <item>
      <title>DeepSeek's API Price Cut Changed My Claude Code and ChatGPT Math</title>
      <dc:creator>CodeKing</dc:creator>
      <pubDate>Mon, 25 May 2026 08:37:24 +0000</pubDate>
      <link>https://dev.to/codekingai/deepseeks-api-price-cut-changed-my-claude-code-and-chatgpt-math-2fln</link>
      <guid>https://dev.to/codekingai/deepseeks-api-price-cut-changed-my-claude-code-and-chatgpt-math-2fln</guid>
      <description>&lt;p&gt;The DeepSeek API price cut made me rethink a habit I had quietly accepted: choosing an AI coding tool and then living with whatever model economics came with it.&lt;/p&gt;

&lt;p&gt;Claude Code is great when I want a strong terminal-native coding agent. ChatGPT and Codex are great when I want OpenAI's workflow and model stack. But when a provider like DeepSeek suddenly drops API pricing, the obvious question is not just "is this cheap?"&lt;/p&gt;

&lt;p&gt;It is: &lt;strong&gt;can I actually use the cheaper model from the tools I already use?&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Price Cut Is The Interesting Part
&lt;/h2&gt;

&lt;p&gt;As of May 25, 2026, DeepSeek's pricing page lists V4 Flash at:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;$0.14 per 1M input tokens&lt;/li&gt;
&lt;li&gt;$0.0028 per 1M cached input tokens&lt;/li&gt;
&lt;li&gt;$0.28 per 1M output tokens&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It also lists V4 Pro at a 75% discount, with a note that when the promotion ends on May 31, 2026, the official API price will be adjusted to one-quarter of the original price, which is the same as the discounted rate:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;$0.435 per 1M input tokens&lt;/li&gt;
&lt;li&gt;$0.003625 per 1M cached input tokens&lt;/li&gt;
&lt;li&gt;$0.87 per 1M output tokens&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The part that matters for coding agents is cached input. Coding tools resend a lot of repeated context: system prompts, repo summaries, conversation history, tool schemas, and task state. If cache hits are cheap enough, repeated agent loops start looking very different economically.&lt;/p&gt;

&lt;p&gt;I checked the current public pricing pages before writing this: &lt;a href="https://api-docs.deepseek.com/quick_start/pricing/" rel="noopener noreferrer"&gt;DeepSeek API pricing&lt;/a&gt;, &lt;a href="https://claude.com/pricing" rel="noopener noreferrer"&gt;Claude plans&lt;/a&gt;, &lt;a href="https://platform.claude.com/docs/en/about-claude/models/overview" rel="noopener noreferrer"&gt;Claude API models&lt;/a&gt;, &lt;a href="https://chatgpt.com/pricing/" rel="noopener noreferrer"&gt;ChatGPT plans&lt;/a&gt;, and &lt;a href="https://openai.com/api/pricing/" rel="noopener noreferrer"&gt;OpenAI API pricing&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;That is why this cut is more than a nice model announcement. It changes where I want routine coding traffic to go.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Comparison I Actually Care About
&lt;/h2&gt;

&lt;p&gt;Claude Code pricing is predictable if you use a subscription: Claude Pro is $20/month when billed monthly, and Max starts at $100/month. On the API side, Anthropic lists Claude Opus 4.7 at $5 input and $25 output per 1M tokens, and Sonnet 4.6 at $3 input and $15 output.&lt;/p&gt;

&lt;p&gt;ChatGPT has the same split. Plus is the familiar $20/month plan, Pro tiers go much higher, and OpenAI's flagship GPT models are still priced like premium infrastructure on the API: GPT-5.5 is listed at $5 input, $0.50 cached input, and $30 output per 1M tokens.&lt;/p&gt;

&lt;p&gt;Those plans can be worth it. I am not pretending DeepSeek replaces every hard reasoning workload.&lt;/p&gt;

&lt;p&gt;But for coding-agent traffic, the uncomfortable truth is that a lot of tokens are not "hard reasoning" tokens. They are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;reading files&lt;/li&gt;
&lt;li&gt;rewriting boilerplate&lt;/li&gt;
&lt;li&gt;producing test scaffolds&lt;/li&gt;
&lt;li&gt;formatting docs&lt;/li&gt;
&lt;li&gt;classifying intent&lt;/li&gt;
&lt;li&gt;continuing a known task&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is exactly the kind of traffic I want to route to a cheaper model first.&lt;/p&gt;
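&lt;p&gt;To put rough numbers on that, here is a small calculator using the per-1M-token prices quoted above for DeepSeek V4 Flash, V4 Pro, and GPT-5.5 (the models with a cached-input rate listed in this post). The workload, 1M input tokens with a 90% cache-hit rate plus 50K output tokens, is a made-up example rather than a measurement, and the prices should be re-checked against the pricing pages before relying on them.&lt;/p&gt;

```javascript
// Per-1M-token prices (USD) as quoted in this post.
const PRICES = {
  "deepseek-v4-flash": { input: 0.14, cachedInput: 0.0028, output: 0.28 },
  "deepseek-v4-pro": { input: 0.435, cachedInput: 0.003625, output: 0.87 },
  "gpt-5.5": { input: 5, cachedInput: 0.5, output: 30 },
};

// inputTokens includes cache hits; cachedTokens is the cache-hit portion of it.
function costUSD(model, { inputTokens, cachedTokens, outputTokens }) {
  const p = PRICES[model];
  if (!p) throw new Error(`unknown model: ${model}`);
  if (cachedTokens > inputTokens) throw new Error("cachedTokens cannot exceed inputTokens");
  const uncached = inputTokens - cachedTokens;
  return (uncached * p.input + cachedTokens * p.cachedInput + outputTokens * p.output) / 1_000_000;
}

// Hypothetical agent workload: 1M input tokens, 90% served from cache, 50K output tokens.
const workload = { inputTokens: 1_000_000, cachedTokens: 900_000, outputTokens: 50_000 };

for (const model of Object.keys(PRICES)) {
  console.log(model, costUSD(model, workload).toFixed(4));
}
// deepseek-v4-flash 0.0305
// deepseek-v4-pro 0.0903
// gpt-5.5 2.4500
```

&lt;p&gt;The cache-hit rate matters a lot here: with &lt;code&gt;cachedTokens: 0&lt;/code&gt;, the same Flash workload rises from about $0.03 to about $0.15.&lt;/p&gt;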

&lt;h2&gt;
  
  
  The Annoying Part: Tools Do Not Make This Easy
&lt;/h2&gt;

&lt;p&gt;The problem is that Claude Code, Codex, and ChatGPT-style workflows do not all speak the same protocol.&lt;/p&gt;

&lt;p&gt;Claude Code expects Anthropic-shaped requests.&lt;/p&gt;

&lt;p&gt;Codex expects OpenAI-shaped requests.&lt;/p&gt;

&lt;p&gt;Other tools may expect Gemini-style routes or their own local configuration. So even when DeepSeek exposes low-cost models, the practical setup can still turn into a mess of environment variables, API keys, base URLs, and wrappers.&lt;/p&gt;

&lt;p&gt;That is the gap I built CliGate to fill.&lt;/p&gt;
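&lt;p&gt;For a sense of what bridging two protocols involves, here is a deliberately minimal sketch that turns a text-only Anthropic Messages request body into an OpenAI-style Chat Completions body. It ignores tool calls, images, streaming, and response translation, and the model names in the mapping are placeholders. It illustrates the problem; it is not CliGate's implementation.&lt;/p&gt;

```javascript
// Placeholder mapping from the model name a client sends to an upstream model.
const MODEL_MAP = { "claude-placeholder": "deepseek-v4-flash" };

// Anthropic content can be a plain string or an array of typed blocks.
// Only text blocks are handled in this sketch.
function toText(content) {
  if (typeof content === "string") return content;
  return content
    .filter((block) => block.type === "text")
    .map((block) => block.text)
    .join("\n");
}

// Anthropic carries the system prompt in a top-level "system" field;
// Chat Completions expects it as the first message with role "system".
function anthropicToChatCompletions(body) {
  const messages = [];
  if (body.system) messages.push({ role: "system", content: toText(body.system) });
  for (const m of body.messages) {
    messages.push({ role: m.role, content: toText(m.content) });
  }
  return {
    model: MODEL_MAP[body.model] ?? body.model,
    messages,
    max_tokens: body.max_tokens,
  };
}

const translated = anthropicToChatCompletions({
  model: "claude-placeholder",
  max_tokens: 1024,
  system: [{ type: "text", text: "You are a coding agent." }],
  messages: [
    { role: "user", content: "List the failing tests." },
    { role: "assistant", content: [{ type: "text", text: "Running the suite now." }] },
  ],
});
console.log(JSON.stringify(translated, null, 2));
```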

&lt;h2&gt;
  
  
  What Changed With CliGate
&lt;/h2&gt;

&lt;p&gt;CliGate is a local AI gateway that runs on &lt;code&gt;localhost&lt;/code&gt;. Instead of pointing every tool directly at a provider, I point the tools at CliGate once:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Claude Code&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ANTHROPIC_BASE_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;http://localhost:8081
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ANTHROPIC_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;any-key
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Codex can also point at the same local gateway through its OpenAI-compatible configuration.&lt;/p&gt;

&lt;p&gt;From there, CliGate handles the important layer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;route Claude Code, Codex CLI, Gemini CLI, and web chat through one local control plane&lt;/li&gt;
&lt;li&gt;keep account pools and API keys in the same routing layer&lt;/li&gt;
&lt;li&gt;map model names and app-level routes&lt;/li&gt;
&lt;li&gt;send routine traffic to DeepSeek when cost matters&lt;/li&gt;
&lt;li&gt;keep premium models available for the tasks that actually need them&lt;/li&gt;
&lt;li&gt;show usage, request logs, and cost views in the dashboard&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That means I do not have to decide "Claude Code or DeepSeek" as a tool choice. I can keep Claude Code as the interface and route some of its traffic through DeepSeek. I can keep Codex as the workflow and still move compatible requests to a cheaper upstream.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Real Advantage Is Not Just Cheap Tokens
&lt;/h2&gt;

&lt;p&gt;Cheap tokens help. But the bigger advantage is optionality.&lt;/p&gt;

&lt;p&gt;I want to be able to say:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;use DeepSeek V4 Flash for cheap routine work&lt;/li&gt;
&lt;li&gt;use DeepSeek V4 Pro when I want stronger low-cost reasoning&lt;/li&gt;
&lt;li&gt;keep Claude for difficult multi-file edits&lt;/li&gt;
&lt;li&gt;keep GPT for workflows where OpenAI's stack is the right fit&lt;/li&gt;
&lt;li&gt;keep local models for private or offline tasks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without a routing layer, that sounds like a spreadsheet and a pile of config files. With a local gateway, it becomes an operations problem: add keys, set routing, inspect usage, adjust when the bill or quality tells you to.&lt;/p&gt;

&lt;p&gt;That is the product advantage I care about. CliGate does not ask me to abandon Claude Code or ChatGPT-style tools. It lets those tools reach low-cost DeepSeek models without changing how I work.&lt;/p&gt;
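&lt;p&gt;The routing preferences listed above amount to an ordered rule list with a cheap default. This is a hypothetical sketch: the task fields (&lt;code&gt;kind&lt;/code&gt;, &lt;code&gt;private&lt;/code&gt;) and the model labels are made up for illustration and are not CliGate's routing configuration.&lt;/p&gt;

```javascript
// Ordered routing rules: the first matching rule wins.
const ROUTES = [
  { when: (task) => task.private === true, model: "local-model" }, // private or offline work
  { when: (task) => task.kind === "multi-file-edit", model: "claude" }, // difficult edits
  { when: (task) => task.kind === "openai-workflow", model: "gpt" }, // OpenAI-stack workflows
  { when: (task) => task.kind === "reasoning", model: "deepseek-v4-pro" }, // stronger low-cost reasoning
];

// Everything else is treated as routine and goes to the cheapest model.
const DEFAULT_MODEL = "deepseek-v4-flash";

function pickModel(task) {
  const rule = ROUTES.find((r) => r.when(task));
  return rule ? rule.model : DEFAULT_MODEL;
}

console.log(pickModel({ kind: "boilerplate" })); // deepseek-v4-flash
console.log(pickModel({ kind: "multi-file-edit" })); // claude
console.log(pickModel({ kind: "multi-file-edit", private: true })); // local-model
```

&lt;p&gt;Putting the privacy rule first is a deliberate choice in this sketch: a private task should not be sent upstream just because it also looks like a difficult edit.&lt;/p&gt;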

&lt;h2&gt;
  
  
  My New Default
&lt;/h2&gt;

&lt;p&gt;After this price cut, my default is no longer "pick one premium coding assistant and pay whatever it costs."&lt;/p&gt;

&lt;p&gt;It is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;keep the coding tools I like&lt;/li&gt;
&lt;li&gt;route routine traffic to the cheapest good-enough model&lt;/li&gt;
&lt;li&gt;reserve expensive models for the tasks that justify them&lt;/li&gt;
&lt;li&gt;watch usage and pricing in one place&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That feels like the right shape for AI coding in 2026.&lt;/p&gt;

&lt;p&gt;The models will keep changing. The prices will definitely keep changing. The part I do not want to keep changing is every CLI config on my machine.&lt;/p&gt;

&lt;p&gt;CliGate is here if you want to inspect the implementation: &lt;a href="https://github.com/codeking-ai/cligate" rel="noopener noreferrer"&gt;https://github.com/codeking-ai/cligate&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;How are you handling model cost now: one subscription, direct API usage, or routing per task?&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>javascript</category>
      <category>ai</category>
      <category>opensource</category>
    </item>
    <item>
      <title>"My Web Chat Wasn't a Real Channel. That Broke My Agent Pipeline"</title>
      <dc:creator>CodeKing</dc:creator>
      <pubDate>Fri, 22 May 2026 11:35:09 +0000</pubDate>
      <link>https://dev.to/codekingai/my-web-chat-wasnt-a-real-channel-that-broke-my-agent-pipeline-11ed</link>
      <guid>https://dev.to/codekingai/my-web-chat-wasnt-a-real-channel-that-broke-my-agent-pipeline-11ed</guid>
      <description>&lt;p&gt;I thought my web chat was the simplest surface in the whole product.&lt;/p&gt;

&lt;p&gt;Telegram, Feishu, and DingTalk were the complicated ones. Web chat was just the dashboard. Same browser, same server, same app. What could possibly go wrong?&lt;/p&gt;

&lt;p&gt;A lot, apparently.&lt;/p&gt;

&lt;h2&gt;
  
  
  The bug looked random from the UI
&lt;/h2&gt;

&lt;p&gt;A task would start from the web chat UI just fine.&lt;/p&gt;

&lt;p&gt;The runtime session existed. The conversation existed. The task existed. The logs looked healthy enough.&lt;/p&gt;

&lt;p&gt;And then the delivery pipeline tried to send a follow-up update back into the conversation and got:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;conversation_not_found
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Which made no sense, because the conversation definitely existed. I had just used it.&lt;/p&gt;

&lt;p&gt;This is the kind of bug that wastes time because every individual subsystem looks half-correct.&lt;/p&gt;

&lt;h2&gt;
  
  
  The real problem: I treated web chat like a page, not a channel
&lt;/h2&gt;

&lt;p&gt;The architecture in &lt;a href="https://github.com/codeking-ai/cligate" rel="noopener noreferrer"&gt;CliGate&lt;/a&gt; already had a channel model.&lt;/p&gt;

&lt;p&gt;Telegram is a channel. Feishu is a channel. DingTalk is a channel.&lt;/p&gt;

&lt;p&gt;Those inbound messages go through the same supervision and delivery machinery:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;conversation store&lt;/li&gt;
&lt;li&gt;scheduler&lt;/li&gt;
&lt;li&gt;delivery sender&lt;/li&gt;
&lt;li&gt;assistant orchestration&lt;/li&gt;
&lt;li&gt;runtime session binding&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But web chat had slowly drifted into a special-case path.&lt;/p&gt;

&lt;p&gt;That felt harmless at first. Web chat lived inside the same app, so it was easy to give it a little custom state and a few convenience wrappers.&lt;/p&gt;

&lt;p&gt;That was the mistake.&lt;/p&gt;

&lt;h2&gt;
  
  
  What actually broke
&lt;/h2&gt;

&lt;p&gt;The old version of &lt;code&gt;chat-ui/conversation-store.js&lt;/code&gt; exported its own store instance.&lt;/p&gt;

&lt;p&gt;Meanwhile, the delivery and orchestration path used the shared channel conversation store.&lt;/p&gt;

&lt;p&gt;So both sides were reading and writing "conversations," but not the same in-memory array.&lt;/p&gt;

&lt;p&gt;That meant:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the chat UI could create a conversation&lt;/li&gt;
&lt;li&gt;the route handler could see it&lt;/li&gt;
&lt;li&gt;the runtime could bind to it&lt;/li&gt;
&lt;li&gt;but the scheduler could still fail to find it later&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The comments in the fix say it more plainly than I can:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// chat-ui and agent-channels each held a SEPARATE in-memory&lt;/span&gt;
&lt;span class="c1"&gt;// `conversations` array, even though both wrote to the same JSON file on disk.&lt;/span&gt;
&lt;span class="c1"&gt;// After server start, a chat-ui conversation created at runtime was visible to&lt;/span&gt;
&lt;span class="c1"&gt;// chat-ui-route but NOT to message-service, so scheduler deliveries hit&lt;/span&gt;
&lt;span class="c1"&gt;// `conversation_not_found` and silently dropped notifications.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is exactly the sort of bug you get when a "small UI shortcut" quietly forks your domain model.&lt;/p&gt;

&lt;h2&gt;
  
  
  The fix was not complicated
&lt;/h2&gt;

&lt;p&gt;I did not need a new abstraction.&lt;/p&gt;

&lt;p&gt;I needed one source of truth.&lt;/p&gt;

&lt;p&gt;Instead of exporting a dedicated chat-ui conversation store in production, I attached chat-specific helpers to the shared singleton used by the channel system:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nf"&gt;installChatUiHelpers&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;agentChannelConversationStore&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;chatUiConversationStore&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;agentChannelConversationStore&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That one change matters more than it looks.&lt;/p&gt;

&lt;p&gt;Now web chat is not pretending to be adjacent to the channel system. It is part of the channel system.&lt;/p&gt;
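&lt;p&gt;A toy reproduction shows why sharing one instance fixes the lookup. Everything below is a hypothetical stand-in reduced to an in-memory array with no JSON persistence: &lt;code&gt;createConversationStore&lt;/code&gt; and &lt;code&gt;attachChatHelpers&lt;/code&gt; are illustrative names, not CliGate's modules.&lt;/p&gt;

```javascript
// Toy conversation store backed by an in-memory array.
function createConversationStore() {
  const conversations = [];
  return {
    add(conv) {
      conversations.push(conv);
      return conv;
    },
    findById(id) {
      return conversations.find((c) => c.id === id) ?? null;
    },
  };
}

// Hypothetical chat-specific helper layered onto an existing store,
// instead of creating a second store.
function attachChatHelpers(store) {
  store.createChatConversation = (id) => store.add({ id, channel: "chat-ui" });
  return store;
}

// Before: two separate instances, so the scheduler's store never sees chat conversations.
const channelStoreBefore = createConversationStore();
const chatStoreBefore = attachChatHelpers(createConversationStore());
chatStoreBefore.createChatConversation("c1");
console.log(channelStoreBefore.findById("c1")); // null, i.e. conversation_not_found

// After: one shared singleton with the chat helpers installed on it.
const sharedStore = attachChatHelpers(createConversationStore());
const chatStoreAfter = sharedStore;
chatStoreAfter.createChatConversation("c2");
console.log(sharedStore.findById("c2")); // { id: 'c2', channel: 'chat-ui' }
```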

&lt;h2&gt;
  
  
  Why this changed more than delivery
&lt;/h2&gt;

&lt;p&gt;Once I stopped treating web chat as a special page, a lot of other decisions became cleaner.&lt;/p&gt;

&lt;p&gt;A chat-ui conversation now behaves like a real peer of the other channels:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;it has the same conversation identity model&lt;/li&gt;
&lt;li&gt;it uses the same assistant delivery state&lt;/li&gt;
&lt;li&gt;it flows through the same runtime binding logic&lt;/li&gt;
&lt;li&gt;it can receive scheduler-driven updates without weird bridging code&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That matters because a multi-surface assistant only stays sane if all entry points agree on what a conversation is.&lt;/p&gt;

&lt;p&gt;If one surface has its own special rules, you do not have one product anymore. You have one product plus one exception that keeps leaking.&lt;/p&gt;

&lt;h2&gt;
  
  
  The other important fix: seed assistant mode from the start
&lt;/h2&gt;

&lt;p&gt;There was a second detail hidden in the same file.&lt;/p&gt;

&lt;p&gt;New chat-ui conversations now start with assistant control mode already set:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;assistantCore&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;buildAssistantCoreDeliveryState&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;existingAssistantCore&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;controlMode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;CONVERSATION_ASSISTANT_CONTROL_MODE&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ASSISTANT&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That matters because web chat should enter the same top-level assistant orchestration path as the messaging channels.&lt;/p&gt;

&lt;p&gt;If the first message from the web UI skips that and goes straight to the bound runtime, you get behavioral drift:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;web chat behaves one way&lt;/li&gt;
&lt;li&gt;Telegram behaves another way&lt;/li&gt;
&lt;li&gt;Feishu behaves another way&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then every bug becomes impossible to reason about because the surfaces are no longer comparable.&lt;/p&gt;
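&lt;p&gt;The drift is easy to see with a toy dispatcher that routes each message by the conversation's current control mode. The mode values, the dispatcher, and the fall-through to the bound runtime for unseeded conversations are assumptions made for this sketch; only the idea of an assistant control mode comes from the snippet above.&lt;/p&gt;

```javascript
// Illustrative control modes, loosely modeled on CONVERSATION_ASSISTANT_CONTROL_MODE.
const ControlMode = Object.freeze({ ASSISTANT: "assistant", DIRECT_RUNTIME: "direct-runtime" });

// Hypothetical dispatcher: the target depends only on the conversation's mode.
// An unseeded conversation falls through to the bound runtime (an assumption).
function dispatch(conversation, text) {
  const mode = conversation.assistantCore?.controlMode ?? ControlMode.DIRECT_RUNTIME;
  const handler = mode === ControlMode.ASSISTANT ? "assistant-orchestrator" : "bound-runtime";
  return { handler, text };
}

// Seeding the mode at creation time, for every channel, keeps surfaces comparable.
function newConversation(channel) {
  return { channel, assistantCore: { controlMode: ControlMode.ASSISTANT } };
}

for (const channel of ["chat-ui", "telegram", "feishu"]) {
  console.log(channel, dispatch(newConversation(channel), "status?").handler);
}
// chat-ui assistant-orchestrator
// telegram assistant-orchestrator
// feishu assistant-orchestrator

console.log("unseeded", dispatch({ channel: "chat-ui" }, "status?").handler); // unseeded bound-runtime
```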

&lt;h2&gt;
  
  
  The test I actually wanted
&lt;/h2&gt;

&lt;p&gt;I have learned to distrust fixes like this unless there is a test that proves the behavioral contract.&lt;/p&gt;

&lt;p&gt;The right question was not "does chat-ui store still work?"&lt;/p&gt;

&lt;p&gt;The right question was:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;does a chat-ui conversation participate in assistant behavior like a real channel?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That is why the surrounding tests focus on assistant-mode behavior and persisted conversation semantics instead of only checking helper methods in isolation.&lt;/p&gt;

&lt;p&gt;The implementation detail was a store instance bug.&lt;/p&gt;

&lt;p&gt;The product bug was channel inconsistency.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I learned from this
&lt;/h2&gt;

&lt;p&gt;When you build multi-channel agent systems, the browser UI is seductive.&lt;/p&gt;

&lt;p&gt;It feels local. It feels simple. It feels close enough to the app that you can justify giving it custom flow control, custom state, or custom routing.&lt;/p&gt;

&lt;p&gt;That instinct is expensive.&lt;/p&gt;

&lt;p&gt;If the browser chat can start tasks, receive async updates, carry conversation identity, and interact with the same supervisor as your mobile or messaging surfaces, then it is not "just a page."&lt;/p&gt;

&lt;p&gt;It is a channel.&lt;/p&gt;

&lt;p&gt;And if you do not model it that way, the architecture will eventually make you pay for the lie.&lt;/p&gt;

&lt;p&gt;Are you treating your web chat as a first-class channel, or as a special case that has not failed loudly yet?&lt;/p&gt;

&lt;p&gt;Repo: &lt;a href="https://github.com/codeking-ai/cligate" rel="noopener noreferrer"&gt;github.com/codeking-ai/cligate&lt;/a&gt;&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>javascript</category>
      <category>ai</category>
      <category>opensource</category>
    </item>
    <item>
      <title>"My Coding Agent Remembered Sessions, Not Work. That Was the Bug"</title>
      <dc:creator>CodeKing</dc:creator>
      <pubDate>Thu, 21 May 2026 03:46:07 +0000</pubDate>
      <link>https://dev.to/codekingai/my-coding-agent-remembered-sessions-not-work-that-was-the-bug-1n3e</link>
      <guid>https://dev.to/codekingai/my-coding-agent-remembered-sessions-not-work-that-was-the-bug-1n3e</guid>
      <description>&lt;p&gt;The first version of my coding agent had a very common bug: it remembered the conversation, but not the work.&lt;/p&gt;

&lt;p&gt;That sounds fine until the agent has to do something real.&lt;/p&gt;

&lt;p&gt;I would start a task from the web UI, continue it from a mobile channel, approve one command, ask for progress later, and then discover that the system was mostly guessing from the last few messages. It knew there was a session. It did not really know what job that session belonged to.&lt;/p&gt;

&lt;p&gt;That is the difference between a chatbot and a working assistant.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem Was The Unit Of Memory
&lt;/h2&gt;

&lt;p&gt;Most agent systems begin with a simple shape:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;conversation -&amp;gt; runtime session -&amp;gt; messages
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That works for demos because the user does one thing at a time.&lt;/p&gt;

&lt;p&gt;It breaks when the user behaves normally:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"continue the routing task"&lt;/li&gt;
&lt;li&gt;"use Claude Code to review what Codex just changed"&lt;/li&gt;
&lt;li&gt;"what happened with the thing from yesterday?"&lt;/li&gt;
&lt;li&gt;"retry that, but keep the same working directory"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of those are really about a chat session. They are about work.&lt;/p&gt;

&lt;p&gt;A runtime session can crash. A user can switch from web to Telegram or Feishu. Two agents can work on the same issue from different roles. If the system treats the runtime session as the main identity, every one of those cases becomes fragile.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Fix: Split Work From Execution
&lt;/h2&gt;

&lt;p&gt;In CliGate, I started moving the design toward a different model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Person
  -&amp;gt; Project
    -&amp;gt; Task
      -&amp;gt; Execution
        -&amp;gt; RuntimeSession
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The important part is not the diagram. It is the boundary.&lt;/p&gt;

&lt;p&gt;A &lt;strong&gt;Task&lt;/strong&gt; is the thing the user thinks they are doing: "fix routing", "review the auth change", "write release notes", "check why the build failed".&lt;/p&gt;

&lt;p&gt;An &lt;strong&gt;Execution&lt;/strong&gt; is one concrete attempt to move that task forward. It may be Codex acting as the editor, Claude Code acting as a reviewer, or another provider doing a focused job.&lt;/p&gt;

&lt;p&gt;A &lt;strong&gt;RuntimeSession&lt;/strong&gt; is just the current process or provider session underneath that execution.&lt;/p&gt;

&lt;p&gt;That means the assistant can say: this is still the same task, even if the runtime process has changed.&lt;/p&gt;
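&lt;p&gt;The boundary can be sketched with plain objects. This is an illustrative model, not CliGate's actual schema. The &lt;code&gt;createWorkModel&lt;/code&gt; helper, the field names, and the id format are all hypothetical. What it shows is that the runtime session id can change while the task identity stays put.&lt;/p&gt;

```javascript
// Hypothetical sketch of the Person -> Project -> Task -> Execution -> RuntimeSession split.
function createWorkModel() {
  let nextId = 1;
  const id = (prefix) => `${prefix}_${nextId++}`;
  const tasks = new Map();
  const executions = new Map();
  const runtimeToExecution = new Map(); // runtime session id -> execution id

  return {
    createTask(projectId, title) {
      const task = { id: id("task"), projectId, title, executionIds: [] };
      tasks.set(task.id, task);
      return task;
    },

    // One concrete attempt to move the task forward, e.g. Codex as editor.
    startExecution(taskId, provider, role) {
      const execution = { id: id("exec"), taskId, provider, role, runtimeSessionId: null };
      executions.set(execution.id, execution);
      tasks.get(taskId).executionIds.push(execution.id);
      return execution;
    },

    // The runtime session is the replaceable process underneath an execution.
    attachRuntime(executionId, runtimeSessionId) {
      executions.get(executionId).runtimeSessionId = runtimeSessionId;
      runtimeToExecution.set(runtimeSessionId, executionId);
    },

    // Resolve any runtime session, old or new, back to the user's unit of work.
    taskForRuntime(runtimeSessionId) {
      const executionId = runtimeToExecution.get(runtimeSessionId);
      if (!executionId) return null;
      return tasks.get(executions.get(executionId).taskId);
    }
  };
}

const model = createWorkModel();
const task = model.createTask("proj_cligate", "fix routing");
const exec = model.startExecution(task.id, "codex", "editor");
model.attachRuntime(exec.id, "rt_first");
model.attachRuntime(exec.id, "rt_after_crash"); // runtime replaced, task unchanged
```

&lt;p&gt;Both the old and the new runtime ids resolve to the same task. A status question asked after a crash therefore still lands on "fix routing".&lt;/p&gt;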

&lt;h2&gt;
  
  
  Why This Matters In Real Use
&lt;/h2&gt;

&lt;p&gt;The most annoying bugs came from follow-ups.&lt;/p&gt;

&lt;p&gt;When I typed:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;make the button green&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I did not mean "start an unrelated new job." I meant "continue the last task with the same context."&lt;/p&gt;

&lt;p&gt;When I typed:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;use cc to review it too&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I did not mean "replace the current agent." I meant "spawn a second execution under the same task, with a reviewer role."&lt;/p&gt;

&lt;p&gt;Those two messages look similar if all you have is chat history. They are very different if the system has a task model.&lt;/p&gt;
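&lt;p&gt;Here is a deliberately naive sketch of that distinction. It assumes a hypothetical &lt;code&gt;routeFollowUp&lt;/code&gt; that already knows the user's most recent task. In a real system the intent decision would come from the assistant model, not a regular expression. The patterns here only stand in for that decision.&lt;/p&gt;

```javascript
// Hypothetical sketch: decide whether a follow-up continues the current execution
// or adds a second execution (with a different role) under the same task.
const REVIEW_REQUEST = /\b(review|check)\b/i;

function routeFollowUp(message, lastTask) {
  if (!lastTask) {
    return { action: "new_task", title: message };
  }
  const reviewerProvider = /\b(cc|claude code)\b/i.test(message) ? "claude-code" : null;
  if (REVIEW_REQUEST.test(message)) {
    // "use cc to review it too": keep the editor, add a reviewer execution.
    return {
      action: "add_execution",
      taskId: lastTask.id,
      role: "reviewer",
      provider: reviewerProvider || lastTask.provider
    };
  }
  // "make the button green": same task, same execution, more instructions.
  return { action: "continue_execution", taskId: lastTask.id, executionId: lastTask.activeExecutionId };
}

const lastTask = { id: "task_1", provider: "codex", activeExecutionId: "exec_1" };
```

&lt;p&gt;Both follow-ups resolve to &lt;code&gt;task_1&lt;/code&gt;. Only the second one creates a new execution.&lt;/p&gt;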

&lt;p&gt;Once the assistant can distinguish task identity from execution identity, a few things become much easier:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;status questions can be answered from task state&lt;/li&gt;
&lt;li&gt;provider preference can follow the work instead of the channel&lt;/li&gt;
&lt;li&gt;a dead runtime can be replaced without pretending the task is new&lt;/li&gt;
&lt;li&gt;multiple agents can collaborate without sharing one messy transcript&lt;/li&gt;
&lt;li&gt;web UI and mobile channels can show different levels of detail&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That last point surprised me. On mobile, I want a short answer: "Codex is waiting for approval." In the web UI, I may want the full timeline: user message, assistant decision, runtime event, command output, file changes, approval, result.&lt;/p&gt;

&lt;p&gt;Same task. Different presentation.&lt;/p&gt;
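&lt;p&gt;A small sketch of "same task, different presentation". It assumes a hypothetical task record with a &lt;code&gt;status&lt;/code&gt; field and an event &lt;code&gt;timeline&lt;/code&gt;. The mobile renderer returns one line, and the web renderer returns the whole timeline.&lt;/p&gt;

```javascript
// Hypothetical sketch: one task state, two renderers with different detail levels.
const task = {
  title: "fix routing",
  provider: "Codex",
  status: "waiting_approval",
  timeline: [
    { type: "user_message", text: "fix routing" },
    { type: "assistant_decision", text: "delegate to Codex as editor" },
    { type: "command", text: "npm test" },
    { type: "approval_requested", text: "git push origin main" }
  ]
};

const STATUS_LABELS = {
  running: "is working",
  waiting_approval: "is waiting for approval",
  completed: "has finished",
  failed: "has failed"
};

function renderForMobile(task) {
  // Short answer for a phone: who, and what state the task is in.
  const label = STATUS_LABELS[task.status] || `is in state "${task.status}"`;
  return `${task.provider} ${label}.`;
}

function renderForWeb(task) {
  // Full traceability for the dashboard: every event, in order.
  const lines = task.timeline.map((entry, i) => `${i + 1}. [${entry.type}] ${entry.text}`);
  return [`${task.title} (${task.status})`, ...lines].join("\n");
}
```

&lt;p&gt;&lt;code&gt;renderForMobile(task)&lt;/code&gt; produces "Codex is waiting for approval.". The web view lists all four timeline entries under a title line.&lt;/p&gt;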

&lt;h2&gt;
  
  
  The Rule I Wish I Had Started With
&lt;/h2&gt;

&lt;p&gt;If the user can reasonably ask "what happened with that thing?", that thing deserves an identity outside the chat transcript.&lt;/p&gt;

&lt;p&gt;For my project, that identity became &lt;code&gt;Task&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The runtime session is still useful. It preserves provider context and lets the agent resume efficiently. But it should not be the thing the product uses to understand the user's work.&lt;/p&gt;

&lt;p&gt;Sessions are implementation details. Work is the product surface.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Changed
&lt;/h2&gt;

&lt;p&gt;I am still iterating on the architecture, but the direction already cleaned up several design decisions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;follow-ups route to tasks, not just the latest session&lt;/li&gt;
&lt;li&gt;retries can keep the same task identity&lt;/li&gt;
&lt;li&gt;reviewer agents can attach to the same task as editor agents&lt;/li&gt;
&lt;li&gt;approvals can be remembered at task or project scope&lt;/li&gt;
&lt;li&gt;channel messages can stay short without losing full traceability in the dashboard&lt;/li&gt;
&lt;/ul&gt;
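&lt;p&gt;One way approvals could be remembered at task or project scope is sketched below. It assumes a hypothetical in-memory &lt;code&gt;ApprovalMemory&lt;/code&gt; keyed by scope and exact command. The lookup checks the narrow scope first, then the wider one. A real implementation would also need persistence and a way to revoke approvals.&lt;/p&gt;

```javascript
// Hypothetical sketch: remembered approvals, keyed by scope and exact command.
class ApprovalMemory {
  constructor() {
    this.grants = new Set();
  }

  key(scope, scopeId, command) {
    return JSON.stringify([scope, scopeId, command]);
  }

  remember({ scope, scopeId, command }) {
    if (!["task", "project"].includes(scope)) {
      throw new Error(`unsupported approval scope: ${scope}`);
    }
    this.grants.add(this.key(scope, scopeId, command));
  }

  // Check the task scope first; a project-scope grant covers every task in the project.
  isApproved({ taskId, projectId, command }) {
    if (this.grants.has(this.key("task", taskId, command))) return "task";
    if (this.grants.has(this.key("project", projectId, command))) return "project";
    return null;
  }
}

const approvals = new ApprovalMemory();
approvals.remember({ scope: "task", scopeId: "task_1", command: "npm test" });
approvals.remember({ scope: "project", scopeId: "proj_cligate", command: "git status" });
```
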

&lt;p&gt;This also made failure handling less awkward. If a runtime dies, the assistant does not need to tell the user "your session is gone, please start over." It can start a new runtime under the same execution or create a fresh execution under the same task, depending on what actually failed.&lt;/p&gt;
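&lt;p&gt;The recovery decision can be written as a small function. This sketch assumes a hypothetical failure record with a &lt;code&gt;kind&lt;/code&gt; field. The two kinds shown, &lt;code&gt;runtime_exited&lt;/code&gt; and &lt;code&gt;execution_failed&lt;/code&gt;, are illustrative labels, not CliGate's event names.&lt;/p&gt;

```javascript
// Hypothetical sketch: choose a recovery path without ever discarding the task.
function planRecovery(failure) {
  switch (failure.kind) {
    case "runtime_exited":
      // The process died, but the attempt itself is still valid:
      // start a new runtime session under the same execution.
      return { action: "restart_runtime", taskId: failure.taskId, executionId: failure.executionId };
    case "execution_failed":
      // The attempt went wrong (bad approach, wrong provider):
      // keep the task and open a fresh execution under it.
      return { action: "new_execution", taskId: failure.taskId };
    default:
      // Unknown failure: surface it, but still report it against the same task.
      return { action: "report", taskId: failure.taskId };
  }
}
```

&lt;p&gt;Every branch carries the same &lt;code&gt;taskId&lt;/code&gt; forward, so no path ends in "please start over".&lt;/p&gt;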

&lt;p&gt;That is a small implementation detail with a large UX effect.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Takeaway
&lt;/h2&gt;

&lt;p&gt;I used to think agent memory meant better summaries of previous messages.&lt;/p&gt;

&lt;p&gt;Now I think the more important question is: what are you summarizing &lt;em&gt;into&lt;/em&gt;?&lt;/p&gt;

&lt;p&gt;If everything collapses back into a conversation, the assistant will eventually lose the shape of the work. If the product has explicit projects, tasks, executions, and runtime sessions, the agent has somewhere stable to put its memory.&lt;/p&gt;

&lt;p&gt;That has become one of the design principles behind &lt;a href="https://github.com/codeking-ai/cligate" rel="noopener noreferrer"&gt;CliGate&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you are building coding agents, how are you modeling the difference between a conversation, a task, and a runtime session?&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>javascript</category>
      <category>ai</category>
      <category>opensource</category>
    </item>
    <item>
      <title>"My DingTalk Coding Bot Said It Started the Task. Then It Never Sent the Result"</title>
      <dc:creator>CodeKing</dc:creator>
      <pubDate>Thu, 21 May 2026 03:28:09 +0000</pubDate>
      <link>https://dev.to/codekingai/my-dingtalk-coding-bot-said-it-started-the-task-then-it-never-sent-the-result-gi2</link>
      <guid>https://dev.to/codekingai/my-dingtalk-coding-bot-said-it-started-the-task-then-it-never-sent-the-result-gi2</guid>
      <description>&lt;p&gt;The most annoying mobile-agent failure is not a crash.&lt;/p&gt;

&lt;p&gt;It is the fake success message.&lt;/p&gt;

&lt;p&gt;You send a task from DingTalk. The bot replies:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Task accepted.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then Claude Code or Codex actually runs for a while, finishes the work, and nothing comes back to the phone.&lt;/p&gt;

&lt;p&gt;That is worse than an immediate error. It makes you think the agent is still working, when the real problem is that the result fell out of the delivery path.&lt;/p&gt;

&lt;h2&gt;
  
  
  The setup
&lt;/h2&gt;

&lt;p&gt;I have been building &lt;a href="https://github.com/codeking-ai/cligate" rel="noopener noreferrer"&gt;CliGate&lt;/a&gt;, a local AI gateway for Claude Code, Codex CLI, Gemini CLI, dashboard chat, and mobile channels.&lt;/p&gt;

&lt;p&gt;The mobile-channel idea is simple:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;send a task from DingTalk&lt;/li&gt;
&lt;li&gt;route it to Claude Code or Codex on my machine&lt;/li&gt;
&lt;li&gt;keep the runtime session attached to that DingTalk conversation&lt;/li&gt;
&lt;li&gt;send approvals, questions, progress, and final results back to the same chat&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The first part worked.&lt;/p&gt;

&lt;p&gt;DingTalk could trigger the runtime.&lt;/p&gt;

&lt;p&gt;The broken part was the final callback.&lt;/p&gt;

&lt;h2&gt;
  
  
  The bug: I was replying to the assistant run, not the runtime
&lt;/h2&gt;

&lt;p&gt;The channel layer used to behave too much like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;inbound message
  -&amp;gt; assistant run
  -&amp;gt; immediate assistant reply
  -&amp;gt; send message back to DingTalk
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That sounds fine until the assistant delegates to a long-running runtime.&lt;/p&gt;

&lt;p&gt;In that case, the useful result is not the immediate assistant text. The useful result is the runtime terminal event:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;runtime completed
runtime failed
runtime asks a question
runtime asks for approval
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;My old logic only waited for the final runtime result in a narrow multi-session case. If one assistant run produced multiple runtime sessions, it would fan in and wait. But the common path is just one delegated runtime:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/cligate ask Claude Code to fix this bug
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That produced one runtime session, so the channel often got the "started" message and missed the real result.&lt;/p&gt;

&lt;p&gt;The fix was small but important:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;shouldDeferBackgroundCallback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sessionIds&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Array&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;isArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;assistantRun&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;relatedRuntimeSessionIds&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;assistantRun&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;relatedRuntimeSessionIds&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;Boolean&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;assistantRun&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="nx"&gt;ASSISTANT_RUN_STATUS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;WAITING_RUNTIME&lt;/span&gt;
    &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;sessionIds&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The old mental model was:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;only wait when there are multiple runtime sessions&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The correct model is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;if the assistant delegated to any runtime, wait for that runtime result before treating the channel reply as complete&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That change made single-session mobile tasks behave like real tasks instead of fire-and-forget acknowledgements.&lt;/p&gt;
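&lt;p&gt;Deferring the reply needs a little bookkeeping. In this minimal sketch, a hypothetical &lt;code&gt;PendingReplies&lt;/code&gt; map is keyed by runtime session id. When the assistant delegates, it registers the conversation, and only a terminal runtime event releases the final message. The event type strings are assumptions for illustration.&lt;/p&gt;

```javascript
// Hypothetical sketch: hold the channel reply until the delegated runtime reaches a terminal event.
const TERMINAL_EVENTS = new Set(["runtime.completed", "runtime.failed"]);

class PendingReplies {
  constructor(send) {
    this.send = send;         // (conversationId, text) => void
    this.waiting = new Map(); // runtime session id -> conversation id
  }

  // Called when the assistant run delegated to one or more runtime sessions.
  defer(conversationId, runtimeSessionIds) {
    for (const sessionId of runtimeSessionIds) {
      this.waiting.set(sessionId, conversationId);
    }
  }

  // Called for every runtime event; only terminal events release the reply.
  onRuntimeEvent(event) {
    const conversationId = this.waiting.get(event.sessionId);
    if (!conversationId) return false;
    if (!TERMINAL_EVENTS.has(event.type)) return false;
    this.waiting.delete(event.sessionId);
    this.send(conversationId, event.text);
    return true;
  }
}

const sent = [];
const pending = new PendingReplies((conversationId, text) => sent.push({ conversationId, text }));
pending.defer("dingtalk:conv_1", ["rt_1"]); // the single-session case the old logic skipped
pending.onRuntimeEvent({ sessionId: "rt_1", type: "runtime.progress", text: "running tests" });
pending.onRuntimeEvent({ sessionId: "rt_1", type: "runtime.completed", text: "Fixed the bug." });
```

&lt;p&gt;Questions and approval requests would be forwarded without clearing the waiting entry. That path is left out to keep the sketch short.&lt;/p&gt;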

&lt;h2&gt;
  
  
  The second bug: DingTalk's session webhook can lie by omission
&lt;/h2&gt;

&lt;p&gt;DingTalk gives you a &lt;code&gt;sessionWebhook&lt;/code&gt; for replying inside the inbound interaction window.&lt;/p&gt;

&lt;p&gt;So the obvious implementation is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;if sessionWebhook exists and has not expired:
  send through sessionWebhook
else:
  send through App API
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is what I started with.&lt;/p&gt;

&lt;p&gt;The problem is that the timestamp is not the whole truth. A session webhook can still look fresh locally while DingTalk rejects it server-side because the session was consumed or closed.&lt;/p&gt;

&lt;p&gt;So this code was too optimistic:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;sessionWebhook&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;expiredAt&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;expiredAt&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;now&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="nx"&gt;_000&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;chunk&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;textChunks&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sendViaSessionWebhook&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;sessionWebhook&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If that send failed, the whole delivery failed.&lt;/p&gt;

&lt;p&gt;The fix was to treat the session webhook as the cheap first attempt, not the only attempt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;sessionWebhook&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;expiredAt&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;expiredAt&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;now&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="nx"&gt;_000&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;chunk&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;textChunks&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sendViaSessionWebhook&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;sessionWebhook&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// fall through to App API&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then the provider falls back to the DingTalk App API:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;chunk&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;textChunks&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sendViaAppApi&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;conversationId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;conversation&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;externalConversationId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;robotCode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;channelContext&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;robotCode&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;conversationType&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;channelContext&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;conversationType&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;senderStaffId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;channelContext&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;senderStaffId&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That made delivery much more reliable.&lt;/p&gt;

&lt;p&gt;The important lesson: a webhook expiry timestamp is not a delivery guarantee.&lt;/p&gt;
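&lt;p&gt;One detail worth checking in a fallback like this is a mid-message failure. As written above, the App API loop starts again from the first chunk. So if the session webhook fails on the second of three chunks, the first chunk is sent twice. Below is a sketch of a variant that resumes from the failed chunk. It assumes two hypothetical async senders, &lt;code&gt;viaWebhook&lt;/code&gt; and &lt;code&gt;viaAppApi&lt;/code&gt;, that reject on failure. This is not CliGate's code.&lt;/p&gt;

```javascript
// Hypothetical sketch: try the cheap path first, then fall back without resending
// chunks that the first path already delivered.
async function sendChunksWithFallback(chunks, viaWebhook, viaAppApi) {
  let index = 0;
  let usedFallback = false;
  try {
    for (; chunks.length > index; index += 1) {
      await viaWebhook(chunks[index]);
    }
  } catch {
    // The webhook rejected chunks[index]; everything before it was already delivered.
    usedFallback = true;
  }
  for (; chunks.length > index; index += 1) {
    await viaAppApi(chunks[index]);
  }
  return { usedFallback };
}

const log = [];
const webhook = async (chunk) => {
  if (chunk === "part 2") throw new Error("session webhook rejected");
  log.push(`webhook:${chunk}`);
};
const appApi = async (chunk) => {
  log.push(`app:${chunk}`);
};
const delivery = sendChunksWithFallback(["part 1", "part 2", "part 3"], webhook, appApi);
```

&lt;p&gt;Some failures are ambiguous: the message arrived, but the response did not. In that case this variant can still duplicate one chunk. Avoiding that completely would need some form of deduplication on the receiving side.&lt;/p&gt;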

&lt;h2&gt;
  
  
  The third bug was hidden in the registry
&lt;/h2&gt;

&lt;p&gt;This one was more subtle.&lt;/p&gt;

&lt;p&gt;CliGate supports channel provider instances. The raw provider template in the registry is not the same thing as a started provider instance.&lt;/p&gt;

&lt;p&gt;The started instance has settings:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;clientId&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;clientSecret&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;robotCode&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;mode&lt;/li&gt;
&lt;li&gt;runtime defaults&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The raw template does not.&lt;/p&gt;

&lt;p&gt;That matters because DingTalk App API fallback needs credentials:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;clientId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;chooseSetting&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;settings&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;clientId&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;appKey&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;clientSecret&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;chooseSetting&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;settings&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;clientSecret&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;appSecret&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
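&lt;p&gt;The &lt;code&gt;chooseSetting&lt;/code&gt; helper is not shown in the post. Here is a plausible sketch that fits the call sites above. It assumes the helper returns the first non-empty string among the given keys, so either the &lt;code&gt;clientId&lt;/code&gt; or the &lt;code&gt;appKey&lt;/code&gt; naming works. Treat the exact semantics as a guess.&lt;/p&gt;

```javascript
// Hypothetical sketch of a settings lookup that accepts aliased keys.
function chooseSetting(settings, ...keys) {
  for (const key of keys) {
    const value = settings?.[key];
    if (typeof value === "string") {
      const trimmed = value.trim();
      if (trimmed) return trimmed;
    }
  }
  return "";
}

const settings = { clientId: "", appKey: "ding_app_key", clientSecret: "secret_1" };
const clientId = chooseSetting(settings, "clientId", "appKey");
const clientSecret = chooseSetting(settings, "clientSecret", "appSecret");
```

&lt;p&gt;Given a provider object with no settings at all, every lookup in this sketch returns an empty string. That is exactly the situation a raw registry template produces.&lt;/p&gt;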



&lt;p&gt;If the outbound delivery sender asks the raw registry for &lt;code&gt;dingtalk&lt;/code&gt;, it may get a provider object with no settings. Then the session webhook fails, the App API fallback starts, and the fallback has no credentials.&lt;/p&gt;

&lt;p&gt;So the channel manager now injects an instance-aware registry shim into both the dispatcher and the delivery sender:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;instanceAwareRegistry&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;get&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;providerId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;instanceId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getInstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;providerId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;instanceId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;outboundDispatcher&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;registry&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;instanceAwareRegistry&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;outboundDispatcher&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;deliverySender&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;setRegistry&lt;/span&gt;&lt;span class="p"&gt;?.(&lt;/span&gt;&lt;span class="nx"&gt;instanceAwareRegistry&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The last line, the &lt;code&gt;deliverySender.setRegistry&lt;/code&gt; call, is the one that matters.&lt;/p&gt;

&lt;p&gt;It is easy to update the dispatcher and forget that the actual send path lives one object deeper.&lt;/p&gt;
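&lt;p&gt;The pitfall is ordinary object aliasing. The self-contained sketch below uses hypothetical &lt;code&gt;Dispatcher&lt;/code&gt; and &lt;code&gt;DeliverySender&lt;/code&gt; classes as stand-ins for CliGate's. Each holds its own registry reference, so replacing the dispatcher's reference does nothing for the sender.&lt;/p&gt;

```javascript
// Hypothetical sketch: two objects, two independent registry references.
class DeliverySender {
  constructor(registry) {
    this.registry = registry;
  }
  setRegistry(registry) {
    this.registry = registry;
  }
  credentialsFor(providerId, instanceId) {
    const provider = this.registry.get(providerId, instanceId);
    return provider?.settings?.clientId || "(no credentials)";
  }
}

class Dispatcher {
  constructor(registry) {
    this.registry = registry;
    this.deliverySender = new DeliverySender(registry);
  }
}

// The raw registry returns templates without settings; the instance-aware one returns started instances.
const rawRegistry = { get: (providerId) => ({ id: providerId }) };
const instanceAwareRegistry = {
  get: (providerId, instanceId) => ({ id: providerId, instanceId, settings: { clientId: "ding_client_id" } })
};

const dispatcher = new Dispatcher(rawRegistry);
dispatcher.registry = instanceAwareRegistry; // dispatcher updated...
const before = dispatcher.deliverySender.credentialsFor("dingtalk", "default");
dispatcher.deliverySender.setRegistry(instanceAwareRegistry); // ...and now the sender too
const after = dispatcher.deliverySender.credentialsFor("dingtalk", "default");
```
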

&lt;h2&gt;
  
  
  Runtime events now drive outbound delivery
&lt;/h2&gt;

&lt;p&gt;The architecture I trust more is event-based:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;runtime event
  -&amp;gt; find channel conversations tracking that runtime session
  -&amp;gt; format event for the channel
  -&amp;gt; arbitrate whether to send now or suppress
  -&amp;gt; send through provider instance
  -&amp;gt; record delivery
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The dispatcher listens to runtime session events:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;unsubscribe&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;runtimeSessionManager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;eventBus&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;subscribeAll&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;handleRuntimeEvent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="k"&gt;catch&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{});&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then it finds conversations tracking that runtime:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;conversations&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;conversationStore&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;listByTrackedRuntimeSessionId&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;sessionId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And sends through the delivery sender:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;deliverySender&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;conversation&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;latestConversation&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;channel&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;latestConversation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;channel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;sessionId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;sessionId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;eventSeq&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;seq&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;formatted&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;fullText&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;formatted&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;buttons&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;formatted&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;buttons&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="p"&gt;[],&lt;/span&gt;
    &lt;span class="nx"&gt;session&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;event&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is the boundary I wanted.&lt;/p&gt;

&lt;p&gt;The assistant may start the work, but the runtime event owns the runtime result.&lt;/p&gt;
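&lt;p&gt;The "arbitrate" and "record delivery" steps of the pipeline are not shown in the snippets above. This minimal sketch covers both, assuming hypothetical in-memory stores. Progress events are suppressed for mobile channels. A &lt;code&gt;conversationId:sessionId:eventSeq&lt;/code&gt; key stops the same runtime event from reaching one conversation twice. The event types and the suppression rule are illustrative, not CliGate's.&lt;/p&gt;

```javascript
// Hypothetical sketch of the arbitrate + record steps in the runtime-event pipeline.
const ALWAYS_DELIVER = new Set(["completed", "failed", "question", "approval_request"]);

function createRuntimeEventRouter({ listConversationsForSession, send }) {
  const delivered = new Set(); // "conversationId:sessionId:eventSeq"

  function arbitrate(conversation, event) {
    if (ALWAYS_DELIVER.has(event.type)) return "send";
    // Assumption: progress chatter goes to the dashboard only, not to phones.
    return conversation.channel === "web-chat" ? "send" : "suppress";
  }

  return function handleRuntimeEvent(event) {
    const outcomes = [];
    for (const conversation of listConversationsForSession(event.sessionId)) {
      const key = `${conversation.id}:${event.sessionId}:${event.seq}`;
      if (delivered.has(key)) {
        outcomes.push("duplicate");
        continue;
      }
      const decision = arbitrate(conversation, event);
      if (decision === "send") {
        send(conversation, event);
        delivered.add(key); // record delivery so a replayed event is not sent again
      }
      outcomes.push(decision);
    }
    return outcomes;
  };
}

const sent = [];
const conversationsBySession = {
  rt_1: [{ id: "dt_1", channel: "dingtalk" }, { id: "web_1", channel: "web-chat" }]
};
const handle = createRuntimeEventRouter({
  listConversationsForSession: (sessionId) => conversationsBySession[sessionId] || [],
  send: (conversation, event) => sent.push(`${conversation.id}:${event.type}`)
});
```

&lt;p&gt;If the same terminal event is delivered twice, for example after a reconnect, it is recorded as a duplicate instead of reaching the phone again.&lt;/p&gt;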

&lt;h2&gt;
  
  
  I added tests for the boring parts
&lt;/h2&gt;

&lt;p&gt;The boring parts are where channel bugs usually hide.&lt;/p&gt;

&lt;p&gt;There is a test for DingTalk falling back to the App API when the session webhook is unavailable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;assert&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;match&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;calls&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="sr"&gt;/oauth2&lt;/span&gt;&lt;span class="se"&gt;\/&lt;/span&gt;&lt;span class="sr"&gt;accessToken/&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nx"&gt;assert&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;match&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;calls&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="sr"&gt;/robot&lt;/span&gt;&lt;span class="se"&gt;\/&lt;/span&gt;&lt;span class="sr"&gt;oToMessages&lt;/span&gt;&lt;span class="se"&gt;\/&lt;/span&gt;&lt;span class="sr"&gt;batchSend/&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nx"&gt;assert&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;deepEqual&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;calls&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;userIds&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;staff_123&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
&lt;span class="nx"&gt;assert&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;equal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;calls&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;robotCode&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;robot_123&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There is also coverage for group conversation fallback:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;assert&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;match&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;calls&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="sr"&gt;/robot&lt;/span&gt;&lt;span class="se"&gt;\/&lt;/span&gt;&lt;span class="sr"&gt;groupMessages&lt;/span&gt;&lt;span class="se"&gt;\/&lt;/span&gt;&lt;span class="sr"&gt;send/&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
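
&lt;p&gt;Those two tests pin down the App API side of the fallback. Here is a minimal sketch of how a sender might choose the endpoint by conversation type, using the same paths the tests assert on. It only builds request descriptors and makes no network calls. The &lt;code&gt;api.dingtalk.com/v1.0&lt;/code&gt; base URL, the &lt;code&gt;msgKey&lt;/code&gt;/&lt;code&gt;msgParam&lt;/code&gt;/&lt;code&gt;openConversationId&lt;/code&gt; fields, and both function names are my assumptions rather than CliGate's code, so check them against DingTalk's API reference before relying on them.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Sketch only: builds DingTalk App API request descriptors, no network I/O.
// Base URL and the msgKey/msgParam/openConversationId fields are assumptions.
const API_BASE = 'https://api.dingtalk.com/v1.0';

// Step 1: exchange app credentials for an access token.
function buildTokenRequest(appKey, appSecret) {
  return { url: API_BASE + '/oauth2/accessToken', body: { appKey, appSecret } };
}

// Step 2: pick the send endpoint by conversation type.
function buildAppApiRequest(target, text, robotCode) {
  const common = {
    robotCode,
    msgKey: 'sampleText',
    msgParam: JSON.stringify({ content: text })
  };

  if (target.conversationType === 'group') {
    return {
      url: API_BASE + '/robot/groupMessages/send',
      body: { ...common, openConversationId: target.openConversationId }
    };
  }

  // One-to-one chats use the batch endpoint with a single staff ID.
  return {
    url: API_BASE + '/robot/oToMessages/batchSend',
    body: { ...common, userIds: [target.staffId] }
  };
}

const dm = buildAppApiRequest({ conversationType: 'single', staffId: 'staff_123' }, 'done', 'robot_123');
console.log(dm.url);
// https://api.dingtalk.com/v1.0/robot/oToMessages/batchSend
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;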



&lt;p&gt;And the delivery sender records sent and suppressed deliveries into the assistant event ledger, so debugging does not depend on guessing whether the provider was called.&lt;/p&gt;

&lt;p&gt;That is what I want for mobile agents: not just "send a message", but an auditable delivery path.&lt;/p&gt;

&lt;h2&gt;
  
  
  The workflow after the fix
&lt;/h2&gt;

&lt;p&gt;With the fix in place, the flow looks like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A DingTalk message comes in.&lt;/li&gt;
&lt;li&gt;CliGate routes it to the assistant or to the direct runtime path.&lt;/li&gt;
&lt;li&gt;Claude Code or Codex starts a runtime session.&lt;/li&gt;
&lt;li&gt;The DingTalk thread tracks that runtime session.&lt;/li&gt;
&lt;li&gt;Terminal runtime events trigger outbound delivery.&lt;/li&gt;
&lt;li&gt;The DingTalk session webhook is tried first while it is still usable.&lt;/li&gt;
&lt;li&gt;If that fails, the App API fallback sends the result.&lt;/li&gt;
&lt;/ol&gt;
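
&lt;p&gt;The last three steps boil down to "try the cheap path, fall back to the durable one, and write down what happened." A minimal sketch of that order, assuming injected &lt;code&gt;sendViaWebhook&lt;/code&gt; and &lt;code&gt;sendViaAppApi&lt;/code&gt; functions and a plain array standing in for the event ledger (all hypothetical names, not CliGate's API):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Sketch: session webhook first, App API as the fallback, every outcome logged.
// sendViaWebhook, sendViaAppApi and the ledger array are hypothetical stand-ins.
async function deliverResult(message, deps) {
  const { ledger, sendViaWebhook, sendViaAppApi } = deps;

  if (message.sessionWebhook) {
    try {
      await sendViaWebhook(message.sessionWebhook, message.text);
      ledger.push({ status: 'sent', path: 'session-webhook' });
      return 'session-webhook';
    } catch (err) {
      // A failed webhook is an optimization that did not work, not a final answer.
      ledger.push({ status: 'webhook-failed', reason: String(err?.message || err) });
    }
  }

  try {
    await sendViaAppApi(message.target, message.text);
    ledger.push({ status: 'sent', path: 'app-api' });
    return 'app-api';
  } catch (err) {
    ledger.push({ status: 'failed', reason: String(err?.message || err) });
    throw err;
  }
}

// Usage: an expired webhook falls through to the App API.
const ledger = [];
deliverResult(
  { sessionWebhook: 'https://example.invalid/hook', target: { staffId: 'staff_123' }, text: 'done' },
  {
    ledger,
    sendViaWebhook: async function () { throw new Error('webhook expired'); },
    sendViaAppApi: async function () { /* pretend the App API accepted it */ }
  }
).then(function (path) {
  console.log(path, ledger.map(function (entry) { return entry.status; }));
  // app-api [ 'webhook-failed', 'sent' ]
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Logging the failed webhook attempt, not just the final send, is what lets the ledger answer "why did this go through the App API?" later.&lt;/p&gt;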

&lt;p&gt;The user sees the thing that matters:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Claude Code: fixed the failing test and updated the route handler.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;not just:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Task accepted.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What I learned
&lt;/h2&gt;

&lt;p&gt;Mobile coding agents need stronger delivery semantics than chat demos.&lt;/p&gt;

&lt;p&gt;It is not enough to prove that the bot can receive a message. It has to survive the whole lifecycle:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;accepted&lt;/li&gt;
&lt;li&gt;started&lt;/li&gt;
&lt;li&gt;waiting for approval&lt;/li&gt;
&lt;li&gt;waiting for user input&lt;/li&gt;
&lt;li&gt;completed&lt;/li&gt;
&lt;li&gt;failed&lt;/li&gt;
&lt;li&gt;delivered&lt;/li&gt;
&lt;li&gt;suppressed with a reason&lt;/li&gt;
&lt;/ul&gt;
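
&lt;p&gt;One way to keep that lifecycle honest is to make the allowed transitions explicit, so a task cannot jump from "accepted" to "delivered" without a result, and a suppressed delivery cannot exist without a reason. This is a hypothetical sketch: the state names follow the list above, but the transition table itself is my assumption, not CliGate's implementation.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Hypothetical transition table for the delivery lifecycle listed above.
const TRANSITIONS = {
  accepted: ['started', 'failed'],
  started: ['waiting_for_approval', 'waiting_for_input', 'completed', 'failed'],
  waiting_for_approval: ['started', 'failed'],
  waiting_for_input: ['started', 'failed'],
  completed: ['delivered', 'suppressed'],
  failed: ['delivered', 'suppressed'],
  delivered: [],
  suppressed: []
};

// Returns a new task record; throws on transitions the table does not allow.
function advance(task, next, reason) {
  const allowed = TRANSITIONS[task.state] || [];
  if (!allowed.includes(next)) {
    throw new Error(`illegal transition: ${task.state} to ${next}`);
  }
  // A suppressed delivery is only auditable if it says why.
  if (next === 'suppressed') {
    if (!reason) throw new Error('suppressed deliveries need a reason');
  }
  return {
    state: next,
    history: [...task.history, { from: task.state, to: next, reason: reason || null }]
  };
}

// Usage: a normal run that ends in delivery.
let task = { state: 'accepted', history: [] };
task = advance(task, 'started');
task = advance(task, 'completed');
task = advance(task, 'delivered');
console.log(task.state, task.history.length); // delivered 3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;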

&lt;p&gt;And if the channel has multiple send paths, the code has to treat the first path as an optimization, not the truth.&lt;/p&gt;

&lt;p&gt;For DingTalk, that meant:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;do not assume the &lt;code&gt;sessionWebhook&lt;/code&gt; is still fresh by the time a long task finishes&lt;/li&gt;
&lt;li&gt;fall back to App API when webhook send fails&lt;/li&gt;
&lt;li&gt;make sure the sender uses the started provider instance, not the raw provider template&lt;/li&gt;
&lt;li&gt;wait for runtime results even when there is only one runtime session&lt;/li&gt;
&lt;/ul&gt;
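
&lt;p&gt;The first rule is easy to encode as a check before the webhook attempt. DingTalk robot callbacks include an expiry time for the session webhook; this sketch assumes it arrives as &lt;code&gt;sessionWebhookExpiredTime&lt;/code&gt; in epoch milliseconds, and the one-minute safety margin is an arbitrary choice. Anything missing, unparsable, or close to expiry goes straight to the App API path.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Sketch: decide whether the session webhook is still worth trying.
// Assumes the inbound message has sessionWebhookExpiredTime in epoch ms.
const SAFETY_MARGIN_MS = 60 * 1000; // arbitrary: leave room for slow replies

function canUseSessionWebhook(inbound, now = Date.now()) {
  if (!inbound.sessionWebhook) return false;

  const expiresAt = Number(inbound.sessionWebhookExpiredTime);
  // No usable expiry means no evidence the webhook is fresh.
  if (!Number.isFinite(expiresAt)) return false;

  return expiresAt - now >= SAFETY_MARGIN_MS;
}

// Usage
const checkedAt = Date.now();
const hook = 'https://example.invalid/hook';
console.log(canUseSessionWebhook({ sessionWebhook: hook, sessionWebhookExpiredTime: checkedAt + 5 * 60 * 1000 }, checkedAt)); // true
console.log(canUseSessionWebhook({ sessionWebhook: hook, sessionWebhookExpiredTime: checkedAt + 10 * 1000 }, checkedAt)); // false
console.log(canUseSessionWebhook({ sessionWebhook: hook }, checkedAt)); // false
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Even when this check passes, the webhook send can still fail, which is why it only decides whether to try the webhook, never whether to skip the App API fallback.&lt;/p&gt;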

&lt;p&gt;That is not the flashy part of building an AI coding agent.&lt;/p&gt;

&lt;p&gt;But it is the part that decides whether you can actually trust it from your phone.&lt;/p&gt;

&lt;p&gt;If you want to inspect the implementation, the project is here:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/codeking-ai/cligate" rel="noopener noreferrer"&gt;CliGate on GitHub&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I am curious how other people are handling mobile agent delivery. Do you send one "task accepted" message, or do you wire final runtime events back into the original chat thread?&lt;/p&gt;

</description>
      <category>tutorial</category>
      <category>webdev</category>
      <category>node</category>
      <category>ai</category>
    </item>
    <item>
      <title>"I Stopped Choosing Between Claude Code and Codex. I Put Both in One Chat Window"</title>
      <dc:creator>CodeKing</dc:creator>
      <pubDate>Wed, 20 May 2026 06:19:34 +0000</pubDate>
      <link>https://dev.to/codekingai/i-stopped-choosing-between-claude-code-and-codex-i-put-both-in-one-chat-window-2jpp</link>
      <guid>https://dev.to/codekingai/i-stopped-choosing-between-claude-code-and-codex-i-put-both-in-one-chat-window-2jpp</guid>
      <description>&lt;p&gt;Every "Claude Code vs Codex" comparison eventually runs into the same boring truth:&lt;/p&gt;

&lt;p&gt;I do not want to pick one forever.&lt;/p&gt;

&lt;p&gt;Some tasks feel better in Claude Code. Some feel better in Codex. Some days one account is rate-limited, one model is cheaper, or one runtime is already holding the context I need.&lt;/p&gt;

&lt;p&gt;The annoying part is not choosing the better agent.&lt;/p&gt;

&lt;p&gt;The annoying part is switching surfaces every time I change my mind.&lt;/p&gt;

&lt;h2&gt;
  
  
  The workflow I wanted
&lt;/h2&gt;

&lt;p&gt;I wanted one local chat window where I could do this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Use Codex for this task.
Continue that same runtime.
Switch to Claude Code for the next one.
Ask the assistant to plan first.
Go back to direct runtime mode.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That sounds like a UI problem, but it is really a control problem.&lt;/p&gt;

&lt;p&gt;There are two different things happening:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;direct runtime work, where the next message should go straight to Claude Code or Codex&lt;/li&gt;
&lt;li&gt;assistant-mediated work, where a supervisor decides whether to answer, ask a question, or delegate to a runtime&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If those two modes are not explicit, the chat window turns into a trap. A short follow-up like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;make it smaller
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;could mean any of these:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;continue the active Codex runtime&lt;/li&gt;
&lt;li&gt;ask the product assistant&lt;/li&gt;
&lt;li&gt;start a new Claude Code task&lt;/li&gt;
&lt;li&gt;answer a pending approval&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Guessing wrong here is exactly how coding agents become frustrating.&lt;/p&gt;

&lt;h2&gt;
  
  
  So I made direct runtime the default
&lt;/h2&gt;

&lt;p&gt;In &lt;a href="https://github.com/codeking-ai/cligate" rel="noopener noreferrer"&gt;CliGate&lt;/a&gt;, chat UI conversations now default to direct runtime mode.&lt;/p&gt;

&lt;p&gt;That was a deliberate choice.&lt;/p&gt;

&lt;p&gt;Most of the time, when I am using a coding agent, I do not want an assistant to intercept every message and "think about what I meant." I want the current runtime to continue until I explicitly ask for something else.&lt;/p&gt;

&lt;p&gt;There is a test that pins this behavior:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nf"&gt;test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ChatUiConversationStore defaults new chat-ui conversations to direct-runtime control mode&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;conversation&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;conversationStore&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;findOrCreateBySessionId&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;chat-ui-default-direct-runtime-1&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="nx"&gt;assert&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;equal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;conversation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;assistantCore&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;mode&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;direct-runtime&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nx"&gt;assert&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;equal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;conversation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;assistantCore&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;controlMode&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;direct-runtime&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That means a normal chat message does not automatically become "assistant work." It stays on the runtime path.&lt;/p&gt;
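
&lt;p&gt;To show where such a default would live, here is a minimal in-memory sketch of a store that sets it when a conversation is created. The method and metadata field names mirror the test above. Everything else (the &lt;code&gt;Map&lt;/code&gt; storage, the ID format) is hypothetical and does not necessarily match how CliGate persists conversations.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Hypothetical in-memory store; only the method and field names come from the test.
class ConversationStore {
  constructor() {
    this.bySessionId = new Map();
  }

  findOrCreateBySessionId(sessionId) {
    const existing = this.bySessionId.get(sessionId);
    if (existing) return existing; // keep whatever mode it already has

    const conversation = {
      id: 'conv_' + (this.bySessionId.size + 1),
      sessionId,
      activeRuntimeSessionId: null,
      metadata: {
        // New conversations start on the runtime path; /cligate opts in later.
        assistantCore: { mode: 'direct-runtime', controlMode: 'direct-runtime' }
      }
    };
    this.bySessionId.set(sessionId, conversation);
    return conversation;
  }
}

// Usage
const store = new ConversationStore();
const first = store.findOrCreateBySessionId('chat-ui-default-direct-runtime-1');
console.log(first.metadata.assistantCore.controlMode); // direct-runtime
console.log(store.findOrCreateBySessionId('chat-ui-default-direct-runtime-1') === first); // true
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The default applies only on creation, so a conversation that was switched into assistant mode keeps that mode on later lookups.&lt;/p&gt;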

&lt;h2&gt;
  
  
  The two commands that made the UI usable
&lt;/h2&gt;

&lt;p&gt;I ended up with a small mode switch instead of another complicated settings panel:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/cligate
/runtime
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The mode parser is intentionally tiny:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;cligateMatch&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;trimmed&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;match&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/^&lt;/span&gt;&lt;span class="se"&gt;\/&lt;/span&gt;&lt;span class="sr"&gt;cligate&lt;/span&gt;&lt;span class="se"&gt;(?:\s&lt;/span&gt;&lt;span class="sr"&gt;+&lt;/span&gt;&lt;span class="se"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;.+&lt;/span&gt;&lt;span class="se"&gt;))?&lt;/span&gt;&lt;span class="sr"&gt;$/i&lt;/span&gt;&lt;span class="nx"&gt;s&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;cligateMatch&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;cligate&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;args&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;cligateMatch&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;trim&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/^&lt;/span&gt;&lt;span class="se"&gt;\/&lt;/span&gt;&lt;span class="sr"&gt;runtime$/i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;trimmed&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;runtime&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;args&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The behavior is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;/cligate&lt;/code&gt; enters assistant mode&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/cligate &amp;lt;task&amp;gt;&lt;/code&gt; runs one assistant-mediated task&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/runtime&lt;/code&gt; exits assistant mode and returns to direct runtime routing&lt;/li&gt;
&lt;/ul&gt;
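
&lt;p&gt;Put together, the parser and these three rules behave like a tiny state machine over the conversation's control mode. Here is a self-contained sketch: &lt;code&gt;parseModeCommand&lt;/code&gt; wraps the snippet above in a function, while &lt;code&gt;applyModeCommand&lt;/code&gt;, the &lt;code&gt;'assistant'&lt;/code&gt; mode value, and the choice that a one-shot &lt;code&gt;/cligate &amp;lt;task&amp;gt;&lt;/code&gt; leaves the current mode untouched are my assumptions.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// parseModeCommand wraps the parser shown above; applyModeCommand is hypothetical.
function parseModeCommand(text) {
  const trimmed = String(text || '').trim();

  const cligateMatch = trimmed.match(/^\/cligate(?:\s+(.+))?$/is);
  if (cligateMatch) {
    return { command: 'cligate', args: String(cligateMatch[1] || '').trim() };
  }

  if (/^\/runtime$/i.test(trimmed)) {
    return { command: 'runtime', args: '' };
  }

  return null; // plain text, not a mode command
}

// Returns the next control mode and where this message should go.
function applyModeCommand(controlMode, text) {
  const parsed = parseModeCommand(text);

  if (!parsed) {
    // Plain text follows whichever mode is active.
    return { controlMode, route: controlMode === 'assistant' ? 'assistant' : 'runtime' };
  }
  if (parsed.command === 'runtime') {
    return { controlMode: 'direct-runtime', route: 'none' };
  }
  if (parsed.args) {
    // One-shot: this task goes to the assistant, the mode stays as it was.
    return { controlMode, route: 'assistant', task: parsed.args };
  }
  return { controlMode: 'assistant', route: 'none' };
}

// Usage: enter assistant mode, ask something, drop back to the runtime.
let mode = 'direct-runtime';
for (const text of ['/cligate', 'plan the refactor first', '/runtime', 'try the simpler patch']) {
  const step = applyModeCommand(mode, text);
  mode = step.controlMode;
  console.log(JSON.stringify(text), 'routes to', step.route, '| mode now', mode);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;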

&lt;p&gt;That one escape hatch matters.&lt;/p&gt;

&lt;p&gt;When I am done asking the assistant to plan or coordinate, I want the next message to go back to the active Claude Code or Codex session without ceremony.&lt;/p&gt;

&lt;h2&gt;
  
  
  The route now decides before touching the runtime
&lt;/h2&gt;

&lt;p&gt;The chat route first gives the assistant mode service a chance to handle the message.&lt;/p&gt;

&lt;p&gt;If assistant mode is not active and there is no &lt;code&gt;/cligate&lt;/code&gt; command, it returns &lt;code&gt;null&lt;/code&gt;, and the message goes down the normal runtime path:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;assistantResult&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;assistantModeService&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;maybeHandleMessage&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="nx"&gt;conversation&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;defaultRuntimeProvider&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;cwd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;executionMode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;assistantExecutionMode&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;onBackgroundResult&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;assistantResult&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;assistantResult&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;previousSessionId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;conversation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;activeRuntimeSessionId&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;conversation&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;assistantResult&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;conversation&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;conversationStore&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;conversation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Only after that does the service route directly to the runtime:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;messageService&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;routeUserMessage&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="nx"&gt;conversation&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;defaultRuntimeProvider&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;cwd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;assistantMode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;getAssistantControlMode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;conversation&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="na"&gt;source&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;chat-ui&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;sessionId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;sessionId&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
      &lt;span class="na"&gt;conversationId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;conversation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That separation is the whole point.&lt;/p&gt;

&lt;p&gt;The assistant does not get to hijack direct runtime messages just because it exists.&lt;/p&gt;
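
&lt;p&gt;Stripped of the CliGate specifics, the shape is a two-stage dispatch in which the first stage may decline by returning &lt;code&gt;null&lt;/code&gt;. Here is a minimal runnable sketch with stub services. Only the &lt;code&gt;maybeHandleMessage&lt;/code&gt; and &lt;code&gt;routeUserMessage&lt;/code&gt; names come from the snippets above; the stub logic and the &lt;code&gt;handledBy&lt;/code&gt; field are illustrative assumptions.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Two-stage dispatch: the assistant stage may return null to decline.
async function dispatchChatMessage(services, conversation, text) {
  const assistantResult = await services.assistantMode.maybeHandleMessage({ conversation, text });
  if (assistantResult) {
    return { ...assistantResult, handledBy: 'assistant' };
  }

  // Declined: the message stays on the direct runtime path.
  const runtimeResult = await services.messageService.routeUserMessage({
    message: { text },
    conversation
  });
  return { ...runtimeResult, handledBy: 'runtime' };
}

// Stub services for illustration only.
const services = {
  assistantMode: {
    async maybeHandleMessage({ conversation, text }) {
      const wantsAssistant = conversation.controlMode === 'assistant' || /^\/cligate\b/i.test(text.trim());
      return wantsAssistant ? { type: 'assistant_reply' } : null;
    }
  },
  messageService: {
    async routeUserMessage() {
      return { type: 'runtime_continued' };
    }
  }
};

dispatchChatMessage(services, { controlMode: 'direct-runtime' }, 'make it smaller')
  .then(function (result) { console.log(result.handledBy); }); // runtime
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;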

&lt;h2&gt;
  
  
  Why this is better than a "smart" default
&lt;/h2&gt;

&lt;p&gt;I tried to make the assistant helpful.&lt;/p&gt;

&lt;p&gt;Then I realized "helpful" is dangerous in a coding workflow.&lt;/p&gt;

&lt;p&gt;If a runtime is waiting for input, the least surprising thing is to send input to that runtime. If a task has a pending approval, the least surprising thing is to resolve that approval. If the user explicitly types &lt;code&gt;/cligate&lt;/code&gt;, then the assistant can step in.&lt;/p&gt;

&lt;p&gt;The result feels less magical, but much easier to trust.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;fix the failing unit test
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;can start a Codex runtime.&lt;/p&gt;

&lt;p&gt;Then:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;try the simpler patch
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;continues that runtime.&lt;/p&gt;

&lt;p&gt;Then:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/cligate compare this failure with the last run before continuing
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;lets the assistant reason over the situation.&lt;/p&gt;

&lt;p&gt;Then:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/runtime
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;puts the conversation back on the direct runtime path.&lt;/p&gt;

&lt;p&gt;That is the loop I wanted.&lt;/p&gt;
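
&lt;p&gt;The "least surprising target" rules in this section can be written as an ordered list of checks. This sketch is hypothetical: the conversation fields (&lt;code&gt;pendingApproval&lt;/code&gt;, &lt;code&gt;runtimeWaitingForInput&lt;/code&gt;, &lt;code&gt;controlMode&lt;/code&gt;) and the exact ordering are assumptions chosen to match the walkthrough above, not CliGate's code.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Ordered checks: the first match decides where a message goes.
function resolveTarget(conversation, text) {
  const trimmed = String(text || '').trim();

  // Only an explicit command pulls the assistant in or pushes it out.
  if (/^\/(cligate|runtime)\b/i.test(trimmed)) return 'mode-command';
  // A pending approval is resolved before anything else.
  if (conversation.pendingApproval) return 'approval';
  // A runtime that asked a question gets the answer.
  if (conversation.runtimeWaitingForInput) return 'runtime-input';
  if (conversation.controlMode === 'assistant') return 'assistant';
  // Default: stick with the active runtime, or start one.
  return conversation.activeRuntimeSessionId ? 'runtime-continue' : 'runtime-start';
}

console.log(resolveTarget({}, 'fix the failing unit test')); // runtime-start
console.log(resolveTarget({ activeRuntimeSessionId: 'rt_1' }, 'try the simpler patch')); // runtime-continue
console.log(resolveTarget({ activeRuntimeSessionId: 'rt_1' }, '/cligate compare this failure with the last run')); // mode-command
console.log(resolveTarget({ activeRuntimeSessionId: 'rt_1', pendingApproval: { id: 'ap_1' } }, 'yes')); // approval
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;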

&lt;h2&gt;
  
  
  Background runs needed their own guardrail
&lt;/h2&gt;

&lt;p&gt;Another bug showed up after I made assistant runs asynchronous in the chat UI.&lt;/p&gt;

&lt;p&gt;If an assistant-mediated task starts a runtime and returns later, the UI needs to persist the background result. But it must not append stale output from an older assistant run after the user has already started a newer one.&lt;/p&gt;

&lt;p&gt;So the route records the pending assistant run ID:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;assistant_run_accepted&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;assistantRun&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;chatUiConversationStore&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;patch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;conversation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="p"&gt;...(&lt;/span&gt;&lt;span class="nx"&gt;conversation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;metadata&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="p"&gt;{}),&lt;/span&gt;
      &lt;span class="na"&gt;uiChatPendingAssistantRunId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;assistantRun&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;trim&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And the background callback refuses stale results:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;getPendingUiAssistantRunId&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;conversation&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="nx"&gt;backgroundRunId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is not glamorous, but it prevents a very real UI bug:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;start an assistant task&lt;/li&gt;
&lt;li&gt;start another task before the first one finishes&lt;/li&gt;
&lt;li&gt;watch the old answer appear under the new task&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;No thanks.&lt;/p&gt;
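
&lt;p&gt;The guard is easy to reproduce outside the route. Here is a self-contained sketch of the same idea: each accepted run overwrites the pending run ID, and a late callback writes only if its ID still matches. Only the &lt;code&gt;uiChatPendingAssistantRunId&lt;/code&gt; field comes from the snippet above; the helper names and the decision to clear the pending ID after a successful write are my assumptions.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Record the newest accepted run; older runs lose ownership of the conversation.
function acceptRun(conversation, runId) {
  conversation.metadata = {
    ...(conversation.metadata || {}),
    uiChatPendingAssistantRunId: String(runId || '').trim()
  };
}

// Persist a background result only if its run is still the pending one.
function onBackgroundResult(conversation, backgroundRunId, text) {
  if (!backgroundRunId) return false;
  if (conversation.metadata?.uiChatPendingAssistantRunId !== backgroundRunId) {
    return false; // stale: a newer run was started after this one
  }
  conversation.messages.push({ role: 'assistant', runId: backgroundRunId, text });
  conversation.metadata.uiChatPendingAssistantRunId = '';
  return true;
}

// Usage: the bug scenario from the list above.
const conversation = { messages: [] };
acceptRun(conversation, 'run_1');
acceptRun(conversation, 'run_2'); // user starts another task before run_1 finishes
onBackgroundResult(conversation, 'run_1', 'old answer'); // ignored
onBackgroundResult(conversation, 'run_2', 'new answer'); // persisted
console.log(conversation.messages.map(function (m) { return m.text; })); // [ 'new answer' ]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;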

&lt;h2&gt;
  
  
  The model override bug was another small footgun
&lt;/h2&gt;

&lt;p&gt;There was one more detail that mattered for a mixed Claude Code / Codex chat surface.&lt;/p&gt;

&lt;p&gt;The normal chat UI has a local model selector. Runtime routing has its own provider semantics. If I let the local chat model override leak into runtime routing, I could accidentally send something like &lt;code&gt;gpt-5.4&lt;/code&gt; into a Claude Code runtime path where that was not the user's intent.&lt;/p&gt;

&lt;p&gt;So for local chat-ui runtime messages, the route deliberately ignores the UI chat model override:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;runtimeModelOverride&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;isExternalConversation&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;model&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There is a test for that too:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;assert&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;equal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;captured&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nx"&gt;assert&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;equal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;captured&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;defaultRuntimeProvider&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;claude-code&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That tiny rule saved the UI from pretending that "selected chat model" and "runtime provider" are the same concept.&lt;/p&gt;

&lt;p&gt;They are not.&lt;/p&gt;
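
&lt;p&gt;A small sketch makes the separation concrete. The function name and request shape are hypothetical; the one rule it encodes is the line above: external conversations may forward a model, local chat-ui runtime messages never do. In this sketch an empty string means "no override, let the runtime provider pick".&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Hypothetical request builder; only the override rule comes from the route above.
function buildRuntimeRequest({ isExternalConversation, model, defaultRuntimeProvider, text }) {
  // '' means "no override": the runtime provider uses its own model choice.
  const runtimeModelOverride = isExternalConversation ? String(model || '') : '';
  return { text, defaultRuntimeProvider, model: runtimeModelOverride };
}

// The chat page's model selector says gpt-5.4, but the runtime is Claude Code.
const localRequest = buildRuntimeRequest({
  isExternalConversation: false,
  model: 'gpt-5.4',
  defaultRuntimeProvider: 'claude-code',
  text: 'fix the failing test'
});
console.log(JSON.stringify(localRequest.model), localRequest.defaultRuntimeProvider); // "" claude-code
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;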

&lt;h2&gt;
  
  
  What the setup looks like
&lt;/h2&gt;

&lt;p&gt;Start CliGate:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx cligate@latest start
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Open the dashboard:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;http://localhost:8081
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then use the Chat page as the control surface:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;choose Codex or Claude Code as the runtime provider&lt;/li&gt;
&lt;li&gt;send a normal task to start direct runtime work&lt;/li&gt;
&lt;li&gt;keep sending follow-ups to continue that runtime&lt;/li&gt;
&lt;li&gt;use &lt;code&gt;/cligate&lt;/code&gt; when you want assistant-mediated planning or delegation&lt;/li&gt;
&lt;li&gt;use &lt;code&gt;/runtime&lt;/code&gt; to return to the direct runtime path&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the workflow I wanted from the beginning.&lt;/p&gt;

&lt;p&gt;Not "which terminal agent wins?"&lt;/p&gt;

&lt;p&gt;More like:&lt;/p&gt;

&lt;p&gt;"Can I keep both available without rebuilding my workflow around either one?"&lt;/p&gt;

&lt;h2&gt;
  
  
  The lesson
&lt;/h2&gt;

&lt;p&gt;The current wave of AI coding tools makes comparisons tempting.&lt;/p&gt;

&lt;p&gt;Claude Code vs Codex. Codex vs Gemini CLI. Terminal agent vs IDE agent.&lt;/p&gt;

&lt;p&gt;Those comparisons are useful, but they miss the day-to-day problem:&lt;/p&gt;

&lt;p&gt;Developers do not just choose tools. They move between them.&lt;/p&gt;

&lt;p&gt;For me, the useful abstraction was not a smarter chatbot.&lt;/p&gt;

&lt;p&gt;It was a chat control surface with explicit ownership:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;direct runtime by default&lt;/li&gt;
&lt;li&gt;assistant mode only when requested&lt;/li&gt;
&lt;li&gt;sticky runtime continuation&lt;/li&gt;
&lt;li&gt;stale background result protection&lt;/li&gt;
&lt;li&gt;no accidental model override leaking into runtime work&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That made Claude Code and Codex feel less like competing terminals and more like two workers behind the same local desk.&lt;/p&gt;

&lt;p&gt;If you want to inspect the implementation, the project is here:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/codeking-ai/cligate" rel="noopener noreferrer"&gt;CliGate on GitHub&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I'm curious how other people are handling this. Are you choosing one coding agent, or are you building a workflow that lets several of them coexist?&lt;/p&gt;

</description>
      <category>tutorial</category>
      <category>webdev</category>
      <category>node</category>
      <category>ai</category>
    </item>
    <item>
      <title>"I Got Tired of Rewriting 4 AI CLI Config Files. So I Put Setup Behind One Button"</title>
      <dc:creator>CodeKing</dc:creator>
      <pubDate>Mon, 18 May 2026 06:15:12 +0000</pubDate>
      <link>https://dev.to/codekingai/i-got-tired-of-rewriting-4-ai-cli-config-files-so-i-put-setup-behind-one-button-30a</link>
      <guid>https://dev.to/codekingai/i-got-tired-of-rewriting-4-ai-cli-config-files-so-i-put-setup-behind-one-button-30a</guid>
      <description>&lt;p&gt;I like trying new AI coding tools.&lt;/p&gt;

&lt;p&gt;I do not like reconfiguring them.&lt;/p&gt;

&lt;p&gt;That was the part that kept getting old: every new CLI came with a different config file, a different base URL setting, and a different way to point it at my local gateway.&lt;/p&gt;

&lt;p&gt;After doing this a few too many times, I added a small feature to &lt;a href="https://github.com/codeking-ai/cligate" rel="noopener noreferrer"&gt;CliGate&lt;/a&gt;: a dashboard page that can install and configure the tools for me.&lt;/p&gt;

&lt;h2&gt;
  
  
  The annoying part
&lt;/h2&gt;

&lt;p&gt;The tools are similar, but the setup is not:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Claude Code wants env vars&lt;/li&gt;
&lt;li&gt;Codex wants &lt;code&gt;~/.codex/config.toml&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Gemini CLI has its own proxy setup path&lt;/li&gt;
&lt;li&gt;OpenClaw wants a JSON provider config&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That means one simple goal:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Point all of them at &lt;code&gt;localhost:8081&lt;/code&gt;"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;turns into four different setup chores.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I changed
&lt;/h2&gt;

&lt;p&gt;CliGate now has a &lt;code&gt;Tools&lt;/code&gt; page that does two jobs:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;detect whether Node.js and the CLI tools are installed&lt;/li&gt;
&lt;li&gt;write the proxy config for each tool from the same UI&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The code behind it is pretty direct.&lt;/p&gt;

&lt;p&gt;The installer knows the official npm package for each tool:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;codex&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Codex CLI&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;command&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;codex&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;npmPackage&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@openai/codex&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;and the dashboard exposes actions like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;api&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/codex/config/proxy&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;api&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/api/tools/install/codex&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So instead of editing files manually, I can open the panel, click install if a tool is missing, and then click configure.&lt;/p&gt;
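&lt;p&gt;Here is a rough sketch of how a tool table like the one above could drive both detection and installation, using Node's built-in &lt;code&gt;child_process.spawnSync&lt;/code&gt;. The helper names are hypothetical. Probing with &lt;code&gt;--version&lt;/code&gt; is an assumption about how detection might work. Only the Codex entry comes from CliGate; the Claude Code and Gemini CLI entries use those tools' public npm package names.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;const { spawnSync } = require('node:child_process');

// Tool table in the same shape as the excerpt above. The Codex entry comes
// from the article; the others use the tools' public npm package names.
const TOOLS = {
  codex: { name: 'Codex CLI', command: 'codex', npmPackage: '@openai/codex' },
  claude: { name: 'Claude Code', command: 'claude', npmPackage: '@anthropic-ai/claude-code' },
  gemini: { name: 'Gemini CLI', command: 'gemini', npmPackage: '@google/gemini-cli' }
};

// Detect a CLI by asking it for its version. On POSIX a missing binary shows
// up as result.error (ENOENT) rather than as a non-zero exit code.
function isInstalled(command) {
  const result = spawnSync(command, ['--version'], {
    stdio: 'ignore',
    shell: process.platform === 'win32' // npm shims on Windows are .cmd files
  });
  if (result.error) return false;
  return result.status === 0;
}

// Arguments for a global npm install of one tool.
function installArgs(toolId) {
  const tool = TOOLS[toolId];
  if (!tool) throw new Error('Unknown tool: ' + toolId);
  return ['install', '-g', tool.npmPackage];
}

for (const id of Object.keys(TOOLS)) {
  const tool = TOOLS[id];
  const status = isInstalled(tool.command)
    ? 'installed'
    : 'missing, run: npm ' + installArgs(id).join(' ');
  console.log(tool.name + ': ' + status);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;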

&lt;h2&gt;
  
  
  The part I wanted most
&lt;/h2&gt;

&lt;p&gt;I did not want a generic "tool manager."&lt;/p&gt;

&lt;p&gt;I wanted a very specific workflow:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;install Claude Code, Codex, Gemini CLI, or OpenClaw&lt;/li&gt;
&lt;li&gt;point each one at the same local gateway&lt;/li&gt;
&lt;li&gt;launch the tool without leaving the dashboard&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is what the page does now.&lt;/p&gt;

&lt;p&gt;For example, the Codex side boils down to these settings:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="py"&gt;chatgpt_base_url&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"http://localhost:8081/backend-api/"&lt;/span&gt;
&lt;span class="py"&gt;openai_base_url&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"http://localhost:8081"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Claude Code gets its localhost base URL, Gemini CLI gets patched for proxy mode, and OpenClaw gets its provider block written with the same target.&lt;/p&gt;
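&lt;p&gt;For the Codex part, the main concern is editing &lt;code&gt;~/.codex/config.toml&lt;/code&gt; without clobbering whatever else is already in it. Here is a minimal sketch, assuming both keys are top-level string settings as in the snippet above. &lt;code&gt;upsertTopLevelKey&lt;/code&gt; and &lt;code&gt;configureCodex&lt;/code&gt; are hypothetical names, not CliGate's code.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Set a top-level string key in TOML text, replacing an existing assignment
// or adding a new one. Top-level keys must appear before the first [table]
// header, so both the search and the insertion stay in that region.
function upsertTopLevelKey(tomlText, key, value) {
  const lines = tomlText === '' ? [] : tomlText.split('\n');
  let tableStart = lines.findIndex(function (line) {
    return line.trim().startsWith('[');
  });
  if (tableStart === -1) tableStart = lines.length;

  const head = lines.slice(0, tableStart);
  const assignment = new RegExp('^\\s*' + key + '\\s*=');
  // JSON string quoting is a valid TOML basic string for plain ASCII URLs.
  const entry = key + ' = ' + JSON.stringify(value);

  const existing = head.findIndex(function (line) {
    return assignment.test(line);
  });
  if (existing === -1) head.push(entry);
  else head[existing] = entry;

  return head.concat(lines.slice(tableStart)).join('\n');
}

// Point Codex at a local gateway, using the same two keys as above.
function configureCodex(gatewayUrl) {
  const file = path.join(os.homedir(), '.codex', 'config.toml');
  let text = fs.existsSync(file) ? fs.readFileSync(file, 'utf8') : '';
  text = upsertTopLevelKey(text, 'chatgpt_base_url', gatewayUrl + '/backend-api/');
  text = upsertTopLevelKey(text, 'openai_base_url', gatewayUrl);
  fs.mkdirSync(path.dirname(file), { recursive: true });
  fs.writeFileSync(file, text);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Calling &lt;code&gt;configureCodex('http://localhost:8081')&lt;/code&gt; would write the two settings shown earlier. Limiting the edit to the region before the first &lt;code&gt;[table]&lt;/code&gt; header matters because, in TOML, any key written after a header belongs to that table rather than to the top level.&lt;/p&gt;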

&lt;h2&gt;
  
  
  Why this is better than another README section
&lt;/h2&gt;

&lt;p&gt;I already had setup docs.&lt;/p&gt;

&lt;p&gt;The problem was not missing information. The problem was repetition.&lt;/p&gt;

&lt;p&gt;Every time I switched machines, reset a config, or wanted to try another CLI, I was doing the same boring setup work again.&lt;/p&gt;

&lt;p&gt;Once I moved that into the product, the project became easier to try and easier to keep using.&lt;/p&gt;

&lt;h2&gt;
  
  
  Who this is actually for
&lt;/h2&gt;

&lt;p&gt;If you use one tool with one API key, this is probably unnecessary.&lt;/p&gt;

&lt;p&gt;If you keep bouncing between Claude Code, Codex, Gemini CLI, or OpenClaw, the friction adds up fast. That is the use case this page fixes.&lt;/p&gt;

&lt;p&gt;If you've built a similar setup layer for AI tooling, I'm curious what you automated first: install, auth, config, or routing.&lt;/p&gt;

</description>
      <category>tutorial</category>
      <category>webdev</category>
      <category>node</category>
      <category>ai</category>
    </item>
    <item>
      <title>"My Product Assistant Kept Borrowing the Wrong Model. So I Gave It Its Own Routing Chain"</title>
      <dc:creator>CodeKing</dc:creator>
      <pubDate>Fri, 15 May 2026 06:25:53 +0000</pubDate>
      <link>https://dev.to/codekingai/my-product-assistant-kept-borrowing-the-wrong-model-so-i-gave-it-its-own-routing-chain-2n4h</link>
      <guid>https://dev.to/codekingai/my-product-assistant-kept-borrowing-the-wrong-model-so-i-gave-it-its-own-routing-chain-2n4h</guid>
      <description>&lt;p&gt;I do not mind a product assistant being wrong because the docs are unclear.&lt;/p&gt;

&lt;p&gt;I do mind it being wrong because it silently used the wrong model source.&lt;/p&gt;

&lt;p&gt;That was the real problem I hit in my local AI gateway project, &lt;a href="https://github.com/codeking-ai/cligate" rel="noopener noreferrer"&gt;CliGate&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The assistant inside the dashboard had a clear job:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;answer product-usage questions&lt;/li&gt;
&lt;li&gt;stay grounded in the manual&lt;/li&gt;
&lt;li&gt;avoid rewriting settings unless the user explicitly asks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But the runtime path behind that assistant was still too fuzzy. In practice, it could depend on whichever account or API key the broader system happened to resolve first.&lt;/p&gt;

&lt;p&gt;That is fine for generic chat.&lt;/p&gt;

&lt;p&gt;It is not fine for a product assistant that is supposed to be predictable.&lt;/p&gt;

&lt;h2&gt;
  
  
  The failure mode was subtle
&lt;/h2&gt;

&lt;p&gt;I already had routing. I already had accounts, API keys, and model mapping. I already had a settings surface.&lt;/p&gt;

&lt;p&gt;The annoying part was that the assistant itself still behaved too much like "just another consumer of the default pool."&lt;/p&gt;

&lt;p&gt;That created a few bad outcomes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the assistant could drift across providers without the user realizing it&lt;/li&gt;
&lt;li&gt;clearing a binding could get undone by old migration behavior&lt;/li&gt;
&lt;li&gt;one flaky credential could make the whole assistant feel unreliable&lt;/li&gt;
&lt;li&gt;the UI could not answer a simple question like: what is the assistant actually bound to right now?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The bug was not one broken request.&lt;/p&gt;

&lt;p&gt;The bug was that the assistant did not have a first-class routing identity.&lt;/p&gt;

&lt;h2&gt;
  
  
  I stopped thinking in terms of "credential" and switched to "model source"
&lt;/h2&gt;

&lt;p&gt;This is the design change that made the rest of the work much easier.&lt;/p&gt;

&lt;p&gt;I did not actually want to bind the assistant to a vague source type like "OpenAI keys" or "Claude account."&lt;/p&gt;

&lt;p&gt;I wanted to bind it to a concrete model source:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"api-key"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"key_x"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"gpt-5.4"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is why the new config path in CliGate moved toward &lt;code&gt;boundModelSource&lt;/code&gt; instead of treating everything as a loose &lt;code&gt;boundCredential&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The internal runtime config now normalizes around that field:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;boundModelSource&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;stored&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;boundModelSource&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;stored&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;boundCredential&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="nx"&gt;boundCredential&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;stored&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;boundModelSource&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;stored&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;boundCredential&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="nx"&gt;fallbacks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Array&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;isArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;stored&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;fallbacks&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nx"&gt;stored&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;fallbacks&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[],&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The compatibility alias still exists, but the meaning changed. The assistant is no longer just "attached to a credential." It is attached to a specific source plus an optional model.&lt;/p&gt;

&lt;p&gt;That sounds like a naming cleanup. It was actually a control cleanup.&lt;/p&gt;

&lt;h2&gt;
  
  
  I also needed a way to say "yes, the user configured this on purpose"
&lt;/h2&gt;

&lt;p&gt;One of the uglier problems was legacy migration.&lt;/p&gt;

&lt;p&gt;Older assistant settings had source toggles. Newer settings have explicit bindings. If the user cleared the binding, I did not want old migration logic to recreate it on the next restart just because a legacy flag still existed somewhere.&lt;/p&gt;

&lt;p&gt;So I added a small but important flag:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="nl"&gt;"bindingConfigured"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That flag means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the user has explicitly configured assistant binding state&lt;/li&gt;
&lt;li&gt;even if the current binding is &lt;code&gt;null&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;do not auto-migrate old sources back into place&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This was one of those changes that look boring in a diff and save a lot of operator confusion later.&lt;/p&gt;

&lt;p&gt;Without it, "clear binding" is not a real action. It is just a temporary suggestion.&lt;/p&gt;

&lt;h2&gt;
  
  
  The assistant needed an ordered chain, not one brittle primary
&lt;/h2&gt;

&lt;p&gt;Once the assistant had a proper primary binding, the next obvious problem showed up:&lt;/p&gt;

&lt;p&gt;What happens when that source is deleted, disabled, rate-limited, or just temporarily broken?&lt;/p&gt;

&lt;p&gt;I did not want the answer to be:&lt;/p&gt;

&lt;p&gt;"assistant is down."&lt;/p&gt;

&lt;p&gt;So the assistant runtime now builds a real chain:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;boundModelSource&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;boundCredential&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;chain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;boundModelSource&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;boundCredential&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;Array&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;isArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;fallbacks&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;entry&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;fallbacks&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;entry&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="k"&gt;typeof&lt;/span&gt; &lt;span class="nx"&gt;entry&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;object&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;entry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;entry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;chain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;entry&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is simple on purpose.&lt;/p&gt;

&lt;p&gt;The first tier is the assistant's intended home. The later tiers are not magic discovery. They are explicit ordered fallbacks the user can inspect in the UI.&lt;/p&gt;

&lt;p&gt;That matters because fallback behavior should be explainable.&lt;/p&gt;

&lt;p&gt;If an assistant changes models under pressure, I want to know exactly why.&lt;/p&gt;
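&lt;p&gt;For reference, here is the chain-building excerpt above as a self-contained function. The client loop in the next section also calls a &lt;code&gt;tierKeyFor&lt;/code&gt; helper that the article does not show. The version here, which produces one key per source and model, is a hypothetical guess at its shape.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Self-contained version of the chain-building excerpt: the primary binding
// (or its legacy alias) goes first, then well-formed fallbacks in order.
function buildChain(config) {
  const chain = [];
  const primary = config.boundModelSource || config.boundCredential;
  if (primary) chain.push(primary);
  if (Array.isArray(config.fallbacks)) {
    for (const entry of config.fallbacks) {
      // Skip malformed entries instead of letting them break the chain.
      if (isDescriptor(entry)) chain.push(entry);
    }
  }
  return chain;
}

// Same check as the excerpt: an object with both a type and an id.
function isDescriptor(entry) {
  if (!entry || typeof entry !== 'object') return false;
  if (!entry.type || !entry.id) return false;
  return true;
}

// Hypothetical tier key: one breaker slot per source and model.
function tierKeyFor(descriptor) {
  return [descriptor.type, descriptor.id, descriptor.model || 'default'].join(':');
}

const chain = buildChain({
  boundModelSource: { type: 'api-key', id: 'key_x', model: 'gpt-5.4' },
  fallbacks: [{ type: 'account', id: 'acct_1' }, { id: 'missing-type' }]
});
console.log(chain.map(tierKeyFor).join(', ')); // api-key:key_x:gpt-5.4, account:acct_1:default
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;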

&lt;h2&gt;
  
  
  A circuit breaker made the assistant feel much less random
&lt;/h2&gt;

&lt;p&gt;Fallback chains are not enough if you keep retrying a dead tier over and over.&lt;/p&gt;

&lt;p&gt;So the assistant LLM client keeps breaker state per tier and skips sources that are currently in cooldown:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;descriptor&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;chain&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;tierKey&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;tierKeyFor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;descriptor&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;_breaker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;shouldSkip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;tierKey&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;continue&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;candidate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;resolveCredential&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;descriptor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;defaultChatGptModel&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;defaultChatGptModel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;defaultClaudeModel&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;defaultClaudeModel&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;candidate&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;continue&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nx"&gt;candidates&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;candidate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;tierKey&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And when a call fails, the client records the failure against that tier instead of pretending the error was just bad luck:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;breakerState&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;_breaker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;recordFailure&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;source&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tierKey&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;warn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`[Supervisor] tier failed | tier=&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;source&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tierKey&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; | breaker=&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;breakerState&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That changed the experience more than I expected.&lt;/p&gt;

&lt;p&gt;Before, the assistant could feel inconsistent in a way users interpret as "the prompt changed" or "the model got weird."&lt;/p&gt;

&lt;p&gt;After this change, the behavior became much more operational:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;try the primary source&lt;/li&gt;
&lt;li&gt;skip tripped tiers&lt;/li&gt;
&lt;li&gt;fall through to explicit backups&lt;/li&gt;
&lt;li&gt;expose the health state in the dashboard&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is a better failure story for a product surface.&lt;/p&gt;
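&lt;p&gt;The article does not show the breaker itself. Here is a minimal sketch of one that fits the calls above: &lt;code&gt;shouldSkip&lt;/code&gt;, plus &lt;code&gt;recordFailure&lt;/code&gt; returning a state for the log line. The threshold and cooldown correspond to the breaker settings the assistant page exposes. The class name, &lt;code&gt;recordSuccess&lt;/code&gt;, the single trial after cooldown, and &lt;code&gt;callWithFallback&lt;/code&gt; are assumptions for illustration, not CliGate's code.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Hypothetical per-tier circuit breaker that fits the shouldSkip and
// recordFailure calls shown above. Other names are illustrative.
class TierBreaker {
  constructor(options) {
    this.threshold = options.threshold;   // consecutive failures before a tier opens
    this.cooldownMs = options.cooldownMs; // how long an open tier is skipped
    this.now = options.now || Date.now;   // injectable clock for tests
    this.tiers = new Map();
  }

  _entry(tierKey) {
    if (!this.tiers.has(tierKey)) {
      this.tiers.set(tierKey, { failures: 0, openUntil: 0 });
    }
    return this.tiers.get(tierKey);
  }

  // True while the tier is still cooling down.
  shouldSkip(tierKey) {
    const entry = this._entry(tierKey);
    const remainingMs = Math.max(0, entry.openUntil - this.now());
    return remainingMs !== 0;
  }

  // Count a failure (capped at the threshold) and return the breaker state.
  recordFailure(tierKey) {
    const entry = this._entry(tierKey);
    entry.failures = Math.min(entry.failures + 1, this.threshold);
    if (entry.failures === this.threshold) {
      entry.openUntil = this.now() + this.cooldownMs;
      return 'open';
    }
    return 'closed';
  }

  recordSuccess(tierKey) {
    const entry = this._entry(tierKey);
    entry.failures = 0;
    entry.openUntil = 0;
  }
}

// Try tiers in order, skipping open ones and falling through on errors.
async function callWithFallback(candidates, breaker, callTier) {
  let lastError = null;
  for (const source of candidates) {
    if (breaker.shouldSkip(source.tierKey)) continue;
    try {
      const result = await callTier(source);
      breaker.recordSuccess(source.tierKey);
      return { tierKey: source.tierKey, result: result };
    } catch (error) {
      lastError = error;
      breaker.recordFailure(source.tierKey);
    }
  }
  throw lastError || new Error('No tier available');
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;In this sketch the failure count stays at the threshold after the cooldown ends, so the next attempt acts as a single trial. One more failure reopens the tier immediately, and a success resets it.&lt;/p&gt;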

&lt;h2&gt;
  
  
  The UI finally has something honest to show
&lt;/h2&gt;

&lt;p&gt;This was another reason I wanted the routing chain to be explicit.&lt;/p&gt;

&lt;p&gt;Once the backend exposes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the current primary&lt;/li&gt;
&lt;li&gt;ordered fallbacks&lt;/li&gt;
&lt;li&gt;resolved source&lt;/li&gt;
&lt;li&gt;breaker state&lt;/li&gt;
&lt;li&gt;last used tier&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;the settings page can stop being a dead form and start being an inspection tool.&lt;/p&gt;

&lt;p&gt;The assistant page now has controls for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;primary model source&lt;/li&gt;
&lt;li&gt;per-tier model selection&lt;/li&gt;
&lt;li&gt;up to three fallbacks&lt;/li&gt;
&lt;li&gt;breaker threshold and cooldown&lt;/li&gt;
&lt;li&gt;test-binding checks&lt;/li&gt;
&lt;li&gt;tier health and last-used status&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is exactly the kind of visibility I wanted when debugging "why did the assistant answer from this provider instead of that one?"&lt;/p&gt;

&lt;h2&gt;
  
  
  I did not want the assistant to silently test with live requests
&lt;/h2&gt;

&lt;p&gt;There is a small route detail here that I like because it keeps the UI honest.&lt;/p&gt;

&lt;p&gt;The binding test endpoint validates whether a descriptor resolves, but it does not fire an actual LLM request:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;describeBinding&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;success&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That means the user gets a fast answer to:&lt;/p&gt;

&lt;p&gt;"is this binding even real?"&lt;/p&gt;

&lt;p&gt;without turning the settings screen into an accidental prompt runner.&lt;/p&gt;

&lt;p&gt;It is a small boundary, but product assistants need that kind of boundary.&lt;/p&gt;
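&lt;p&gt;The article does not show &lt;code&gt;describeBinding&lt;/code&gt;, but the idea can be sketched as a purely local lookup. In this hypothetical version, the configured sources live in a &lt;code&gt;Map&lt;/code&gt; keyed by &lt;code&gt;type:id&lt;/code&gt;. Unlike the one-argument async call above, the function is synchronous and takes the registry as a second argument.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Hypothetical sketch: validate a binding against locally configured sources
// without sending any model request. The registry shape is illustrative.
function describeBinding(descriptor, registry) {
  if (!descriptor || !descriptor.type || !descriptor.id) {
    return { ok: false, reason: 'type and id are required' };
  }
  const source = registry.get(descriptor.type + ':' + descriptor.id);
  if (!source) return { ok: false, reason: 'source not found' };
  if (source.disabled) return { ok: false, reason: 'source is disabled' };
  return {
    ok: true,
    label: source.label,
    model: descriptor.model || source.defaultModel || null
  };
}

const registry = new Map([
  ['api-key:key_x', { label: 'OpenAI key', defaultModel: 'gpt-5.4' }],
  ['account:acct_1', { label: 'Claude account', disabled: true }]
]);

console.log(describeBinding({ type: 'api-key', id: 'key_x' }, registry).ok);      // true
console.log(describeBinding({ type: 'account', id: 'acct_1' }, registry).reason); // source is disabled
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;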

&lt;h2&gt;
  
  
  The part I trust most is the migration and route coverage
&lt;/h2&gt;

&lt;p&gt;I can write all the assistant architecture docs I want, but the thing that makes me trust this change is the route-level test coverage.&lt;/p&gt;

&lt;p&gt;For example, there are tests that pin the new primary field:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;assert&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;deepEqual&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;_body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;assistantAgent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;boundModelSource&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;api-key&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;key-primary&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;gpt-5.4&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And tests that make sure clearing bindings is respected:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;assert&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;equal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;_body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;assistantAgent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;boundModelSource&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nx"&gt;assert&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;equal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;_body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;assistantAgent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;boundCredential&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Those are the kinds of tests that prevent a future "helpful migration" from quietly breaking the operator's intent again.&lt;/p&gt;

&lt;h2&gt;
  
  
  What changed in how I think about product assistants
&lt;/h2&gt;

&lt;p&gt;I used to think the important part was the prompt and the docs grounding.&lt;/p&gt;

&lt;p&gt;Those matter.&lt;/p&gt;

&lt;p&gt;But once the assistant becomes part of the product, routing discipline matters just as much.&lt;/p&gt;

&lt;p&gt;If the assistant is meant to be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;predictable&lt;/li&gt;
&lt;li&gt;inspectable&lt;/li&gt;
&lt;li&gt;recoverable&lt;/li&gt;
&lt;li&gt;configurable without guesswork&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;then it cannot just borrow whatever account or API key happened to win a broader routing race.&lt;/p&gt;

&lt;p&gt;It needs its own routing chain.&lt;/p&gt;

&lt;h2&gt;
  
  
  The pattern I would reuse
&lt;/h2&gt;

&lt;p&gt;If you are adding a product assistant to an existing app with multiple model sources, I think this is the safer progression:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;give the assistant its own explicit primary binding&lt;/li&gt;
&lt;li&gt;bind to a concrete source plus model, not just a source type&lt;/li&gt;
&lt;li&gt;mark explicit user configuration so legacy migration cannot override it&lt;/li&gt;
&lt;li&gt;add ordered fallbacks&lt;/li&gt;
&lt;li&gt;add breaker state so failures do not loop forever&lt;/li&gt;
&lt;li&gt;expose the whole chain in the UI&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That is a lot less glamorous than "ship an assistant."&lt;/p&gt;

&lt;p&gt;But it is the difference between a demo assistant and one that operators can actually live with.&lt;/p&gt;

&lt;p&gt;If you want to inspect the implementation, the project is here:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/codeking-ai/cligate" rel="noopener noreferrer"&gt;CliGate on GitHub&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I am curious how other people are handling this. Does your product assistant have its own routing identity, or is it still borrowing the same model path as ordinary chat?&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>javascript</category>
      <category>ai</category>
      <category>opensource</category>
    </item>
    <item>
      <title>"I Stopped Letting My AI Assistant Hijack Every Message"</title>
      <dc:creator>CodeKing</dc:creator>
      <pubDate>Thu, 14 May 2026 15:33:06 +0000</pubDate>
      <link>https://dev.to/codekingai/i-stopped-letting-my-ai-assistant-hijack-every-message-2hbf</link>
      <guid>https://dev.to/codekingai/i-stopped-letting-my-ai-assistant-hijack-every-message-2hbf</guid>
      <description>&lt;p&gt;I kept running into the same problem while building AI tooling: the smarter the assistant looked, the less predictable the product felt.&lt;/p&gt;

&lt;p&gt;You send a message because you want to continue the current coding session. The system decides you probably meant "start a new task," rewrites the intent, and suddenly you are no longer talking to the runtime you thought you were using.&lt;/p&gt;

&lt;p&gt;That sounds small until you try to use it every day.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem was not model quality
&lt;/h2&gt;

&lt;p&gt;The failure mode had very little to do with whether the underlying executor was Codex or Claude Code.&lt;/p&gt;

&lt;p&gt;The real problem was control.&lt;/p&gt;

&lt;p&gt;In a coding workflow, there are at least two very different intents:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;I want to keep talking to the current runtime session.&lt;/li&gt;
&lt;li&gt;I want a higher-level assistant to look at the whole situation, choose what to do, and coordinate work for me.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If those two paths share the same default entry point, the product starts guessing too much.&lt;/p&gt;

&lt;p&gt;That guess is expensive. It breaks session continuity, interrupts the mental model, and makes users wonder whether the system is actually listening or just pattern-matching.&lt;/p&gt;

&lt;h2&gt;
  
  
  What we changed in CliGate
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/codeking-ai/cligate" rel="noopener noreferrer"&gt;CliGate&lt;/a&gt; is our local AI gateway for Claude Code, Codex CLI, Gemini CLI, OpenClaw, web chat, and channel-based workflows.&lt;/p&gt;

&lt;p&gt;Instead of treating "assistant" as the universal default, we split the interaction model into two explicit modes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Direct Runtime&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Assistant Collaboration&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That sounds like a UI detail, but it changed the architecture.&lt;/p&gt;

&lt;h2&gt;
  
  
  Direct Runtime: boring on purpose
&lt;/h2&gt;

&lt;p&gt;In direct runtime mode, the rule is simple:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Your message goes to the current runtime path.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;No intent interception. No surprise supervision layer. No "maybe I should help by doing something else first."&lt;/p&gt;

&lt;p&gt;That path matters because stable tooling feels boring in the best way. If a user is already inside an active Codex or Claude Code session, the next message should continue that session unless they clearly ask for something different.&lt;/p&gt;

&lt;p&gt;In our code, that distinction is enforced before the regular routing path kicks in:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;assistantResult&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;assistantModeService&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;maybeHandleMessage&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="nx"&gt;conversation&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;defaultRuntimeProvider&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;cwd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;model&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;assistantResult&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;assistantResult&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;messageService&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;routeUserMessage&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="nx"&gt;conversation&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;defaultRuntimeProvider&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;cwd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;model&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If assistant mode is not active, the message falls through to the runtime path directly. That one decision removed a lot of ambiguity.&lt;/p&gt;
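&lt;p&gt;Stripped down to its control flow, the snippet above is a "first handler that claims the message wins" dispatch. Here is a self-contained sketch with stub services; the stub objects and the &lt;code&gt;dispatch&lt;/code&gt; name are illustrative, not CliGate code:&lt;/p&gt;

```javascript
// Stub assistant service: returns null unless the conversation opted in,
// mirroring the `maybeHandleMessage` contract described in the post.
const assistantModeService = {
  async maybeHandleMessage({ conversation, text }) {
    if (!conversation.assistantMode) return null;
    return { handledBy: "assistant", text };
  }
};

// Stub runtime path: continues the current Codex / Claude Code session.
const messageService = {
  async routeUserMessage({ message, conversation }) {
    return { handledBy: "runtime", sessionId: conversation.sessionId, text: message.text };
  }
};

async function dispatch(conversation, text) {
  const assistantResult = await assistantModeService.maybeHandleMessage({ conversation, text });
  if (assistantResult) {
    return assistantResult;
  }
  // No assistant claim: the message goes straight to the active runtime session.
  return messageService.routeUserMessage({ message: { text }, conversation });
}
```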

&lt;h2&gt;
  
  
  Assistant Collaboration: explicit supervision
&lt;/h2&gt;

&lt;p&gt;The assistant path is still useful. It just should not impersonate the runtime path.&lt;/p&gt;

&lt;p&gt;When users explicitly invoke &lt;code&gt;CliGate Assistant&lt;/code&gt;, they are asking for a different kind of help:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;inspect the current state&lt;/li&gt;
&lt;li&gt;decide whether to reuse an existing session or start a new one&lt;/li&gt;
&lt;li&gt;choose Codex or Claude Code&lt;/li&gt;
&lt;li&gt;track approvals, pending questions, failures, and completion&lt;/li&gt;
&lt;li&gt;summarize the result back in one reply&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is a supervisor role, not a terminal role.&lt;/p&gt;
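&lt;p&gt;The second and third bullets are the kind of decision a supervisor makes and a terminal never should. Here is a deliberately simple sketch of that decision. The session fields (&lt;code&gt;cwd&lt;/code&gt;, &lt;code&gt;provider&lt;/code&gt;, &lt;code&gt;status&lt;/code&gt;), the provider labels, and the reuse rule are assumptions for illustration. An LLM-driven supervisor would likely weigh more context than this:&lt;/p&gt;

```javascript
// A session is reusable only if it works in the same directory, runs the
// chosen executor, and is not busy or waiting on something else.
function isReusable(session, task, provider) {
  if (session.cwd !== task.cwd) return false;
  if (session.provider !== provider) return false;
  return session.status === "idle";
}

// Hypothetical supervisor decision: reuse a matching live session if one
// exists, otherwise start a new one on the chosen executor.
function planDelegation(task, sessions, defaultProvider = "codex") {
  const provider = task.provider || defaultProvider; // e.g. "codex" or "claude-code"
  const reusable = sessions.find((s) => isReusable(s, task, provider));
  if (reusable) {
    return { action: "reuse", sessionId: reusable.id, provider };
  }
  return { action: "start", provider, cwd: task.cwd };
}
```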

&lt;p&gt;The mental model we landed on looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User
  -&amp;gt; CliGate Assistant
    -&amp;gt; delegate to Codex / Claude Code
      -&amp;gt; executor does the concrete work
        -&amp;gt; assistant returns the synthesized result
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once we accepted that boundary, several design decisions became much easier.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why mixing them felt wrong
&lt;/h2&gt;

&lt;p&gt;Before this split, it was tempting to make the assistant "smart" by default:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;detect natural language intent&lt;/li&gt;
&lt;li&gt;intercept normal chat&lt;/li&gt;
&lt;li&gt;decide whether this looks like a question, a task, or an operation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That approach demos well. It does not age well.&lt;/p&gt;

&lt;p&gt;In real usage, developers care less about magic and more about whether the product preserves session continuity. If they are already inside a working runtime, surprise orchestration feels like the system stole the steering wheel.&lt;/p&gt;

&lt;p&gt;So we changed the philosophy:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;normal messages should stay low-interruption&lt;/li&gt;
&lt;li&gt;assistant takeover should be explicit&lt;/li&gt;
&lt;li&gt;the assistant should feel collaborative, not invasive&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The implementation detail that mattered most
&lt;/h2&gt;

&lt;p&gt;The mode switch is intentionally small.&lt;/p&gt;

&lt;p&gt;Inside &lt;code&gt;assistant-core/mode-service.js&lt;/code&gt;, we only enter the assistant flow when the conversation is already in assistant mode or the user explicitly triggers it with &lt;code&gt;/cligate&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;parsed&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;assistantModeActive&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That &lt;code&gt;return null&lt;/code&gt; is doing a lot of work.&lt;/p&gt;

&lt;p&gt;It means the assistant does not get a chance to reinterpret every ordinary message. It only runs when the user has actually asked for it.&lt;/p&gt;

&lt;p&gt;There is also a matching escape hatch:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/runtime
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That sends the conversation back to direct runtime mode.&lt;/p&gt;

&lt;p&gt;This ended up feeling much more respectful than trying to infer intent from every sentence.&lt;/p&gt;
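&lt;p&gt;The two commands form a tiny state machine on the conversation. Here is a sketch of how that toggle could be parsed. Only &lt;code&gt;/cligate&lt;/code&gt; and &lt;code&gt;/runtime&lt;/code&gt; come from the post. &lt;code&gt;parseModeCommand&lt;/code&gt;, &lt;code&gt;applyModeCommand&lt;/code&gt;, and the idea that any text after &lt;code&gt;/cligate&lt;/code&gt; becomes the first assistant request are assumptions:&lt;/p&gt;

```javascript
// Hypothetical parser for the two explicit mode commands.
function parseModeCommand(text) {
  const match = /^\/(cligate|runtime)(?:\s+([\s\S]*))?$/.exec(text.trim());
  if (!match) return null; // ordinary message: no mode change
  return { command: match[1], rest: match[2] || "" };
}

// Apply a command to the conversation and report where the message should go.
function applyModeCommand(conversation, text) {
  const parsed = parseModeCommand(text);
  if (!parsed) {
    // No command: stay in whatever mode the conversation is already in.
    return { mode: conversation.assistantMode ? "assistant" : "runtime", text };
  }
  conversation.assistantMode = parsed.command === "cligate";
  return { mode: conversation.assistantMode ? "assistant" : "runtime", text: parsed.rest };
}
```

&lt;p&gt;Because only an exact command changes the mode, an ordinary sentence that happens to mention the assistant never flips the conversation by accident.&lt;/p&gt;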

&lt;h2&gt;
  
  
  What the assistant is actually responsible for
&lt;/h2&gt;

&lt;p&gt;We also had to get stricter about role boundaries in the codebase.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;CliGate Assistant&lt;/code&gt; is responsible for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;orchestration&lt;/li&gt;
&lt;li&gt;observation&lt;/li&gt;
&lt;li&gt;approvals and blockers&lt;/li&gt;
&lt;li&gt;task tracking&lt;/li&gt;
&lt;li&gt;result composition&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Codex and Claude Code are still responsible for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;editing files&lt;/li&gt;
&lt;li&gt;running commands&lt;/li&gt;
&lt;li&gt;browser work&lt;/li&gt;
&lt;li&gt;concrete task execution&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That sounds obvious, but systems get messy when the assistant starts pretending it is also the executor.&lt;/p&gt;
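&lt;p&gt;One way to keep that boundary honest in code is to let the supervisor consume only executor events and never touch files or shells itself. Here is a sketch of the tracking and result-composition side. The event vocabulary (&lt;code&gt;approval_requested&lt;/code&gt;, &lt;code&gt;question&lt;/code&gt;, &lt;code&gt;failed&lt;/code&gt;, &lt;code&gt;completed&lt;/code&gt;) is hypothetical and not taken from CliGate:&lt;/p&gt;

```javascript
// Hypothetical tracker: the supervisor records what executors report and
// composes one reply; it never edits files or runs commands itself.
function createTaskTracker() {
  const tasks = new Map(); // taskId -> { executor, status, blockers, summary }

  function record(event) {
    const task = tasks.get(event.taskId) || { executor: event.executor, status: "running", blockers: [], summary: "" };
    switch (event.type) {
      case "approval_requested":
      case "question":
        task.status = "blocked";
        task.blockers.push(`${event.type}: ${event.detail}`);
        break;
      case "failed":
      case "completed":
        task.status = event.type;
        task.summary = event.detail;
        break;
      default:
        break; // unknown events are ignored rather than guessed at
    }
    tasks.set(event.taskId, task);
  }

  // Compose a single reply for the user from every tracked task.
  function composeReply() {
    return [...tasks.entries()]
      .map(([id, t]) => {
        const waiting = t.status === "blocked" ? ` (waiting on ${t.blockers.join(", ")})` : "";
        const summary = t.summary ? ": " + t.summary : "";
        return `${id} [${t.executor}] ${t.status}${waiting}${summary}`;
      })
      .join("\n");
  }

  return { record, composeReply };
}
```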

&lt;p&gt;Once we treated the assistant as a supervisor instead of a universal chat brain, the architecture became easier to reason about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;assistant-core&lt;/code&gt; owns assistant semantics and state&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;assistant-agent&lt;/code&gt; owns the LLM supervisor loop&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;agent-*&lt;/code&gt; modules remain the execution and runtime substrate&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The user-facing result
&lt;/h2&gt;

&lt;p&gt;The product now behaves more like a real teammate and less like a clever router.&lt;/p&gt;

&lt;p&gt;If you want to continue the active runtime session, you just continue it.&lt;/p&gt;

&lt;p&gt;If you want the system to step back, look at the broader situation, and coordinate work across sessions, you invoke the assistant deliberately.&lt;/p&gt;

&lt;p&gt;That separation improved three things immediately:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;session continuity became easier to trust&lt;/li&gt;
&lt;li&gt;task delegation became easier to explain&lt;/li&gt;
&lt;li&gt;mobile and channel workflows made more sense because the assistant could supervise without hijacking every turn&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  I think more AI tools need this split
&lt;/h2&gt;

&lt;p&gt;A lot of AI products blur "assistant" and "executor" into one conversation because it feels simpler.&lt;/p&gt;

&lt;p&gt;I think that simplicity is fake.&lt;/p&gt;

&lt;p&gt;As soon as the product has long-running sessions, approvals, retries, resumable work, or multiple executors, you need two modes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;one for staying inside the current runtime&lt;/li&gt;
&lt;li&gt;one for asking a supervisor to coordinate work around that runtime&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without that split, the system keeps guessing when it should just listen.&lt;/p&gt;

&lt;p&gt;How are you handling this in your own tools?&lt;/p&gt;

&lt;p&gt;Repo: &lt;a href="https://github.com/codeking-ai/cligate" rel="noopener noreferrer"&gt;github.com/codeking-ai/cligate&lt;/a&gt;&lt;/p&gt;

</description>
      <category>discuss</category>
      <category>webdev</category>
      <category>ai</category>
      <category>javascript</category>
    </item>
    <item>
      <title>"My README Kept Trying to Be the Whole Product Manual. So I Split It Into 3 Layers"</title>
      <dc:creator>CodeKing</dc:creator>
      <pubDate>Wed, 13 May 2026 09:14:53 +0000</pubDate>
      <link>https://dev.to/codekingai/my-readme-kept-trying-to-be-the-whole-product-manual-so-i-split-it-into-3-layers-383h</link>
      <guid>https://dev.to/codekingai/my-readme-kept-trying-to-be-the-whole-product-manual-so-i-split-it-into-3-layers-383h</guid>
      <description>&lt;p&gt;I kept fixing the same problem in three different places.&lt;/p&gt;

&lt;p&gt;Someone would land on the GitHub repo for my local AI gateway and need a fast answer: what is this thing, what does it support, and how do I start it?&lt;/p&gt;

&lt;p&gt;Instead, they got the same thing a lot of open-source projects accidentally grow into: a README that wanted to be a landing page, onboarding guide, operator manual, architecture index, and release checklist at the same time.&lt;/p&gt;

&lt;p&gt;That works for a while. Then every edit makes it worse.&lt;/p&gt;

&lt;h2&gt;
  
  
  The failure mode was boring but expensive
&lt;/h2&gt;

&lt;p&gt;The project is &lt;a href="https://github.com/codeking-ai/cligate" rel="noopener noreferrer"&gt;CliGate&lt;/a&gt;, a local AI gateway that sits between tools like Claude Code, Codex CLI, Gemini CLI, OpenClaw, dashboard chat, channel workflows, and upstream model providers.&lt;/p&gt;

&lt;p&gt;As the product surface expanded, the docs expanded with it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;protocol translation details&lt;/li&gt;
&lt;li&gt;account pools and API keys&lt;/li&gt;
&lt;li&gt;app routing and model mapping&lt;/li&gt;
&lt;li&gt;dashboard pages&lt;/li&gt;
&lt;li&gt;runtime sessions&lt;/li&gt;
&lt;li&gt;Telegram and Feishu channels&lt;/li&gt;
&lt;li&gt;local manuals inside the product&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The result was predictable:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the GitHub README kept getting longer&lt;/li&gt;
&lt;li&gt;first-time users still needed a cleaner path&lt;/li&gt;
&lt;li&gt;the in-product assistant needed stable source material&lt;/li&gt;
&lt;li&gt;maintainers needed room for operational docs that should never be the first thing a new user reads&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So the real problem was not "write more docs."&lt;/p&gt;

&lt;p&gt;It was "stop making one document do five jobs."&lt;/p&gt;

&lt;h2&gt;
  
  
  I ended up with a three-layer docs model
&lt;/h2&gt;

&lt;p&gt;The fix in this repo was to split the content by reader intent instead of by file history.&lt;/p&gt;

&lt;p&gt;Now the project has three distinct layers:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;a repo-facing &lt;code&gt;README.md&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;a docs hub under &lt;code&gt;docs/README.md&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;a lightweight in-product manual served from &lt;code&gt;/manual/&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Each one answers a different question.&lt;/p&gt;

&lt;h2&gt;
  
  
  Layer 1: the README is for orientation, not for owning every detail
&lt;/h2&gt;

&lt;p&gt;The current README now does the things a repo landing page is actually good at:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;explain what CliGate is&lt;/li&gt;
&lt;li&gt;show the supported surfaces&lt;/li&gt;
&lt;li&gt;give the shortest quick start&lt;/li&gt;
&lt;li&gt;point to the right deeper documents&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That keeps the first screen useful instead of turning it into a scroll tax.&lt;/p&gt;

&lt;p&gt;The structure is intentionally compact:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Quick Start&lt;/span&gt;

&lt;span class="gu"&gt;### 1. Start CliGate&lt;/span&gt;
&lt;span class="gu"&gt;### 2. Add at least one working credential&lt;/span&gt;
&lt;span class="gu"&gt;### 3. Point your tool to CliGate&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And it still gives concrete configuration examples, like Codex pointing at localhost:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="py"&gt;chatgpt_base_url&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"http://localhost:8081/backend-api/"&lt;/span&gt;
&lt;span class="py"&gt;openai_base_url&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"http://localhost:8081"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is enough for a reader who wants to know whether the project is relevant before they commit to the rest.&lt;/p&gt;

&lt;h2&gt;
  
  
  Layer 2: the docs hub is the router for human readers
&lt;/h2&gt;

&lt;p&gt;Once the README stops pretending to be everything, you need a clean next step.&lt;/p&gt;

&lt;p&gt;That is what &lt;code&gt;docs/README.md&lt;/code&gt; became.&lt;/p&gt;

&lt;p&gt;Instead of a random directory listing, it routes by audience:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## By Audience&lt;/span&gt;

&lt;span class="gu"&gt;### New users&lt;/span&gt;
&lt;span class="gu"&gt;### Integrators and operators&lt;/span&gt;
&lt;span class="gu"&gt;### Contributors&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This seems obvious, but it fixed a real repo problem for me.&lt;/p&gt;

&lt;p&gt;When documentation grows organically, file names make sense to maintainers and almost nobody else. A docs hub changes the question from:&lt;/p&gt;

&lt;p&gt;"Which markdown file sounds closest to my problem?"&lt;/p&gt;

&lt;p&gt;to:&lt;/p&gt;

&lt;p&gt;"What kind of reader am I, and where should I start?"&lt;/p&gt;

&lt;p&gt;That is a much better first branch.&lt;/p&gt;

&lt;h2&gt;
  
  
  Layer 3: the product needed its own short manual
&lt;/h2&gt;

&lt;p&gt;The part I did not want to keep faking was the in-product help path.&lt;/p&gt;

&lt;p&gt;When users are already inside the dashboard, they usually do not want the full repository story. They want a short operational guide:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;what does this product do&lt;/li&gt;
&lt;li&gt;what is the default address&lt;/li&gt;
&lt;li&gt;what do I configure first&lt;/li&gt;
&lt;li&gt;where in the dashboard should I go next&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So CliGate now serves a lightweight manual at &lt;code&gt;/manual/&lt;/code&gt;, separate from both the repo README and the longer Markdown manuals.&lt;/p&gt;

&lt;p&gt;The HTML is deliberately focused on quick orientation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;h2&lt;/span&gt; &lt;span class="na"&gt;id=&lt;/span&gt;&lt;span class="s"&gt;"page-title"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;Understand, configure, and verify CliGate quickly&lt;span class="nt"&gt;&amp;lt;/h2&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;p&lt;/span&gt; &lt;span class="na"&gt;id=&lt;/span&gt;&lt;span class="s"&gt;"page-subtle"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;This is the in-product quick manual. For full reference, use the complete product manuals.&lt;span class="nt"&gt;&amp;lt;/p&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And the route layer exposes the source documents explicitly instead of scraping whatever happens to be on disk:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;DOC_FILE_MAP&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Object&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;freeze&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;README.md&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cwd&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;docs&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;README.md&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;API.md&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cwd&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;docs&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;API.md&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ARCHITECTURE.md&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cwd&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;docs&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ARCHITECTURE.md&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;product-manual.en.md&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cwd&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;docs&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;product-manual.en.md&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;product-manual.zh-CN.md&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cwd&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;docs&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;product-manual.zh-CN.md&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That mattered for two reasons:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;the UI got a stable set of documents&lt;/li&gt;
&lt;li&gt;the product assistant got a cleaner source of truth&lt;/li&gt;
&lt;/ol&gt;
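&lt;p&gt;The useful property of a frozen map is that the route can only ever serve names it already knows. Here is a minimal sketch of the lookup side, assuming a CommonJS module and a hypothetical &lt;code&gt;resolveDoc&lt;/code&gt; helper. The post does not show how CliGate's route consumes &lt;code&gt;DOC_FILE_MAP&lt;/code&gt;:&lt;/p&gt;

```javascript
const { join } = require("node:path");

// Same idea as DOC_FILE_MAP: an explicit allow-list of served documents.
const DOCS_DIR = join(process.cwd(), "docs");
const DOC_FILE_MAP = Object.freeze({
  "README.md": join(DOCS_DIR, "README.md"),
  "product-manual.en.md": join(DOCS_DIR, "product-manual.en.md"),
  "product-manual.zh-CN.md": join(DOCS_DIR, "product-manual.zh-CN.md")
});

// Hypothetical lookup: unknown names (including "../" tricks) resolve to null
// instead of being joined onto a path and read from disk. The own-property
// check also stops inherited keys such as "constructor" from matching.
function resolveDoc(name) {
  return Object.prototype.hasOwnProperty.call(DOC_FILE_MAP, name) ? DOC_FILE_MAP[name] : null;
}
```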

&lt;h2&gt;
  
  
  The manual files are now doing real product work
&lt;/h2&gt;

&lt;p&gt;This was the architectural shift that made the cleanup worth it.&lt;/p&gt;

&lt;p&gt;The docs are not only for GitHub readers anymore. They are also part of the product behavior.&lt;/p&gt;

&lt;p&gt;The product manual now carries the user-facing explanation of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;dashboard navigation&lt;/li&gt;
&lt;li&gt;routing concepts&lt;/li&gt;
&lt;li&gt;CLI configuration&lt;/li&gt;
&lt;li&gt;channel workflows&lt;/li&gt;
&lt;li&gt;troubleshooting paths&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That means the manual has to be shaped for actual usage, not just for repository completeness.&lt;/p&gt;

&lt;p&gt;One note in the docs hub says it pretty plainly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`product-manual.en.md`&lt;/span&gt; and &lt;span class="sb"&gt;`product-manual.zh-CN.md`&lt;/span&gt; are the primary user-facing manuals.
&lt;span class="p"&gt;-&lt;/span&gt; The product assistant reads from those manual files directly.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once that became true, letting the README keep absorbing everything stopped making sense.&lt;/p&gt;
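&lt;p&gt;With manuals in two languages, the assistant still needs a rule for which file to read. Here is a small sketch of one plausible rule: prefer the user's locale and fall back to English. The &lt;code&gt;pickManual&lt;/code&gt; name and the fallback policy are assumptions; only the two file names come from the docs hub:&lt;/p&gt;

```javascript
// The two manual files named in docs/README.md.
const MANUALS = Object.freeze({
  en: "product-manual.en.md",
  "zh-CN": "product-manual.zh-CN.md"
});

// Hypothetical selection rule: exact locale, then any other Chinese variant,
// then English as the default.
function pickManual(locale = "en") {
  if (Object.prototype.hasOwnProperty.call(MANUALS, locale)) return MANUALS[locale];
  const language = locale.split("-")[0].toLowerCase();
  if (language === "zh") return MANUALS["zh-CN"];
  return MANUALS.en;
}
```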

&lt;h2&gt;
  
  
  I also had to accept that maintainers and users need different entry points
&lt;/h2&gt;

&lt;p&gt;This is the trap I keep seeing in open-source docs.&lt;/p&gt;

&lt;p&gt;Maintainers are comfortable with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;roadmap files&lt;/li&gt;
&lt;li&gt;architecture notes&lt;/li&gt;
&lt;li&gt;release checklists&lt;/li&gt;
&lt;li&gt;migration plans&lt;/li&gt;
&lt;li&gt;incident writeups&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;New users are not.&lt;/p&gt;

&lt;p&gt;If those documents sit beside the real onboarding path without any structure, the repo starts feeling harder than the product.&lt;/p&gt;

&lt;p&gt;So the current split lets the project keep maintainers' documents in &lt;code&gt;docs/&lt;/code&gt; without making them the front door. The docs hub explicitly calls that out:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;Planning, incident, and roadmap documents remain in this directory for maintainers, but they are not the best entry point for first-time users.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That one sentence removed a lot of ambiguity.&lt;/p&gt;

&lt;h2&gt;
  
  
  What changed for the actual product
&lt;/h2&gt;

&lt;p&gt;The cleanup was not cosmetic. It changed how the project presents itself in three different contexts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GitHub readers now get a faster landing path&lt;/li&gt;
&lt;li&gt;dashboard users now get a short manual without leaving the product&lt;/li&gt;
&lt;li&gt;the product assistant now has clearer manual context to answer from&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That last one is easy to underestimate.&lt;/p&gt;

&lt;p&gt;If you build an assistant into the product, your documentation stops being passive. It becomes runtime input. Structure starts to matter much more than volume.&lt;/p&gt;
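&lt;p&gt;One concrete reason structure beats volume is that headings give the assistant natural units to retrieve. Here is a sketch that splits a manual into heading-scoped sections, assuming level-2 Markdown headings mark topic boundaries. The &lt;code&gt;splitSections&lt;/code&gt; and &lt;code&gt;findSection&lt;/code&gt; helpers are illustrative; the post does not say how, or whether, CliGate chunks its manuals:&lt;/p&gt;

```javascript
// Split a Markdown manual into { title, body } sections at each "## " heading.
// Text before the first heading becomes an untitled intro section.
function splitSections(markdown) {
  const sections = [];
  let current = { title: "", lines: [] };
  const flush = () => {
    if (current.title || current.lines.join("").trim()) sections.push(current);
  };
  for (const line of markdown.split("\n")) {
    const heading = /^##\s+(.+)$/.exec(line); // "###" does not match: needs whitespace after "##"
    if (heading) {
      flush();
      current = { title: heading[1].trim(), lines: [] };
    } else {
      current.lines.push(line);
    }
  }
  flush();
  return sections.map((s) => ({ title: s.title, body: s.lines.join("\n").trim() }));
}

// Naive lookup: pick the section whose title contains the most query words.
function findSection(sections, query) {
  const words = query.toLowerCase().split(/\s+/).filter(Boolean);
  let best = null;
  let bestScore = 0;
  for (const section of sections) {
    const title = section.title.toLowerCase();
    const score = words.filter((w) => title.includes(w)).length;
    if (score > bestScore) {
      best = section;
      bestScore = score;
    }
  }
  return best;
}
```

&lt;p&gt;A manual with clear headings gives even this naive lookup something to match; one long undifferentiated page gives it nothing.&lt;/p&gt;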

&lt;h2&gt;
  
  
  The pattern I would reuse
&lt;/h2&gt;

&lt;p&gt;If your open-source project is growing past a single README, I think this split is a better default than endlessly reorganizing one giant file:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;README.md&lt;/code&gt; for orientation and quick start&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;docs/README.md&lt;/code&gt; as a docs router by audience&lt;/li&gt;
&lt;li&gt;an in-product quick manual for operational tasks&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Not every project needs all three.&lt;/p&gt;

&lt;p&gt;But the moment your docs are serving both repository readers and product users, pretending those are the same audience usually creates a worse experience for both.&lt;/p&gt;

&lt;p&gt;If you want to inspect the implementation, the project is here:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/codeking-ai/cligate" rel="noopener noreferrer"&gt;CliGate on GitHub&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I'm curious how other people are handling this boundary. When did your README stop being a README and start trying to become the whole product manual?&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>javascript</category>
      <category>opensource</category>
      <category>ai</category>
    </item>
    <item>
      <title>"You Don't Need Matching Model Names to Run AI Coding Tools"</title>
      <dc:creator>CodeKing</dc:creator>
      <pubDate>Mon, 11 May 2026 09:53:48 +0000</pubDate>
      <link>https://dev.to/codekingai/you-dont-need-matching-model-names-to-run-ai-coding-tools-5cfa</link>
      <guid>https://dev.to/codekingai/you-dont-need-matching-model-names-to-run-ai-coding-tools-5cfa</guid>
      <description>&lt;p&gt;I ran into a boring problem that kept wasting real time:&lt;/p&gt;

&lt;p&gt;my coding tool said &lt;code&gt;gpt-5.5&lt;/code&gt;, my provider said the deployment was called something else, and suddenly I was debugging configuration instead of code.&lt;/p&gt;

&lt;p&gt;Not model quality. Not prompts. Not token limits.&lt;/p&gt;

&lt;p&gt;Just names.&lt;/p&gt;

&lt;h2&gt;
  
  
  The mismatch that keeps showing up
&lt;/h2&gt;

&lt;p&gt;A lot of AI tooling quietly assumes this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;tool model name == provider model name == provider deployment name
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is a nice fantasy.&lt;/p&gt;

&lt;p&gt;It falls apart the moment you use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Azure OpenAI deployments&lt;/li&gt;
&lt;li&gt;provider-specific aliases&lt;/li&gt;
&lt;li&gt;internal model mapping&lt;/li&gt;
&lt;li&gt;multiple CLI tools that all expect different protocol surfaces&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One tool wants &lt;code&gt;gpt-5.5&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Your provider may expose:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;model: gpt-5.5
deployment: team-codex-prod
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;or:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;model: claude-sonnet-4-6
upstream target: vertex publisher endpoint
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;or:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;requested model: claude-sonnet-4-6
actual target: gpt-5.4-mini
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The names are not the same, and they do not need to be.&lt;/p&gt;

&lt;h2&gt;
  
  
  The part that annoyed me most
&lt;/h2&gt;

&lt;p&gt;The worst failures were the confusing ones.&lt;/p&gt;

&lt;p&gt;The request did not always hard-fail.&lt;/p&gt;

&lt;p&gt;Sometimes the tool UI said one thing, the proxy logs said another thing, and the actual upstream target depended on one more layer hidden in provider config.&lt;/p&gt;

&lt;p&gt;So you end up asking questions like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Is this model being mapped by tier?&lt;/li&gt;
&lt;li&gt;Is it passed through because it already looks native?&lt;/li&gt;
&lt;li&gt;Is the provider overriding it with a deployment name anyway?&lt;/li&gt;
&lt;li&gt;Is the UI showing a discovered model that the mapping page cannot even configure yet?&lt;/li&gt;
&lt;/ul&gt;
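&lt;p&gt;Each of those questions really asks which rule decided the final name. Below is a minimal sketch of a resolver that records its own decisions. The config shape, the function name &lt;code&gt;explain&lt;/code&gt;, and the rule order are assumptions for illustration, not CliGate's actual precedence. The assumed order is native passthrough, then tier mapping, then a deployment override applied last.&lt;/p&gt;

```javascript
// Hypothetical provider config; names and rule order are illustrative only.
const config = {
  nativePrefixes: ["gpt-"],          // names that pass through unchanged
  tiers: { sonnet: "gpt-5.4-mini" }, // tier keyword -> provider model
  deploymentName: "",                // Azure-style override; empty means none
};

// Resolve a requested name and record which rule made each decision.
function explain(requested, cfg) {
  const steps = [];
  let model = requested;
  if (cfg.nativePrefixes.some((p) => requested.startsWith(p))) {
    steps.push(`passthrough: ${requested} already looks native`);
  } else {
    const tier = Object.keys(cfg.tiers).find((t) => requested.includes(t));
    if (tier) {
      model = cfg.tiers[tier];
      steps.push(`tier mapping (${tier}): ${requested} -> ${model}`);
    } else {
      steps.push(`no rule matched: ${requested} sent unchanged`);
    }
  }
  // The override runs last, so it wins over every rule above.
  const target = cfg.deploymentName || model;
  if (cfg.deploymentName) {
    steps.push(`deployment override: ${model} -> ${target}`);
  }
  return { requested, model, target, steps };
}

console.log(explain("claude-sonnet-4-6", config).target); // gpt-5.4-mini
console.log(explain("gpt-5.5", { ...config, deploymentName: "team-codex-prod" }).steps);
```

&lt;p&gt;The useful part is &lt;code&gt;steps&lt;/code&gt;. Instead of guessing which layer renamed the model, you can read it from the resolver.&lt;/p&gt;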

&lt;p&gt;That is too much ceremony for "send this prompt to the model I meant."&lt;/p&gt;

&lt;h2&gt;
  
  
  What I changed
&lt;/h2&gt;

&lt;p&gt;I stopped letting the tools own the final name resolution.&lt;/p&gt;

&lt;p&gt;I put the decision inside a local gateway.&lt;/p&gt;

&lt;p&gt;In &lt;a href="https://github.com/codeking-ai/cligate" rel="noopener noreferrer"&gt;CliGate&lt;/a&gt;, the flow looks more like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Claude Code / Codex CLI / Gemini CLI
        |
        v
     localhost
        |
        +-&amp;gt; routing
        +-&amp;gt; model mapping
        +-&amp;gt; provider translation
        +-&amp;gt; deployment override if needed
        |
        v
   actual upstream target
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That means the tool can keep asking for the model name it understands, while the gateway decides what the provider should really receive.&lt;/p&gt;
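&lt;p&gt;The diagram maps naturally onto a chain of small stages that all operate on one request context. This sketch rests on several assumptions. The stage order follows the diagram, but the provider table, the &lt;code&gt;ctx&lt;/code&gt; fields, and the stage function names are hypothetical. Model mapping is an identity step here. The translation stage only records the target protocol; it does not rewrite the request body.&lt;/p&gt;

```javascript
// Hypothetical provider table: which upstream serves which logical model.
const providers = {
  azure: { protocol: "openai-chat", deploymentName: "team-codex-prod", models: ["gpt-5.5"] },
  vertex: { protocol: "anthropic-messages", deploymentName: "", models: ["claude-sonnet-4-6"] },
};

// Each stage takes the request context and returns an updated copy.
const route = (ctx) => {
  const provider = Object.keys(providers).find((p) => providers[p].models.includes(ctx.model));
  if (!provider) throw new Error(`no provider serves ${ctx.model}`);
  return { ...ctx, provider };
};
const mapModel = (ctx) => ({ ...ctx, mappedModel: ctx.model }); // identity in this sketch
// Placeholder: a real translation stage would rewrite the body for the target protocol.
const translate = (ctx) => ({ ...ctx, protocol: providers[ctx.provider].protocol });
const overrideDeployment = (ctx) => ({
  ...ctx,
  upstreamTarget: providers[ctx.provider].deploymentName || ctx.mappedModel,
});

// Run the stages in the order shown in the diagram.
const stages = [route, mapModel, translate, overrideDeployment];
const gateway = (request) => stages.reduce((ctx, stage) => stage(ctx), request);

console.log(gateway({ model: "gpt-5.5" }).upstreamTarget);           // team-codex-prod
console.log(gateway({ model: "claude-sonnet-4-6" }).upstreamTarget); // claude-sonnet-4-6
```

&lt;p&gt;The CLI never sees &lt;code&gt;upstreamTarget&lt;/code&gt;. It keeps sending the name it understands. Adding a provider means editing the table, not every client.&lt;/p&gt;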

&lt;h2&gt;
  
  
  Why this matters more on Azure
&lt;/h2&gt;

&lt;p&gt;Azure OpenAI is where this gets obvious fast.&lt;/p&gt;

&lt;p&gt;With Azure, there are usually at least two identities in play:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;the model name the client thinks it wants&lt;/li&gt;
&lt;li&gt;the deployment name Azure actually expects&lt;/li&gt;
&lt;/ol&gt;
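&lt;p&gt;Azure OpenAI's deployment-scoped endpoints put the deployment name in the URL path, so the second identity cannot be skipped. The small sketch below builds such a URL. &lt;code&gt;azureChatUrl&lt;/code&gt; is a hypothetical helper. &lt;code&gt;my-resource&lt;/code&gt; and &lt;code&gt;API_VERSION&lt;/code&gt; are placeholders; check Azure's documentation for a currently supported &lt;code&gt;api-version&lt;/code&gt;.&lt;/p&gt;

```javascript
// Build a deployment-scoped Azure OpenAI chat completions URL.
// resource, deployment, and apiVersion are placeholders, not real values.
function azureChatUrl({ resource, deployment, apiVersion }) {
  const base = `https://${resource}.openai.azure.com`;
  return `${base}/openai/deployments/${encodeURIComponent(deployment)}/chat/completions` +
    `?api-version=${encodeURIComponent(apiVersion)}`;
}

// The client asked for "gpt-5.5"; the route only carries the deployment name.
const clientModel = "gpt-5.5";
const url = azureChatUrl({ resource: "my-resource", deployment: "team-codex-prod", apiVersion: "API_VERSION" });
console.log(clientModel, "->", url);
```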

&lt;p&gt;If your bridge code forwards the requested model directly, but the provider later replaces it with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;deploymentName&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;model&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;then the real runtime behavior depends on the deployment config, not the string the CLI showed you.&lt;/p&gt;

&lt;p&gt;That is not wrong.&lt;/p&gt;

&lt;p&gt;It just means you need better visibility and a better configuration surface.&lt;/p&gt;
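&lt;p&gt;You can keep that one-liner and still make its effect visible. Below is a minimal sketch. A hypothetical &lt;code&gt;AzureBridge&lt;/code&gt; class applies the same &lt;code&gt;this.deploymentName || body.model&lt;/code&gt; rule and then reports whether the name changed, so a request log can show both the requested model and the one that was sent.&lt;/p&gt;

```javascript
// Hypothetical provider adapter applying the same override rule as above.
class AzureBridge {
  constructor(deploymentName) {
    this.deploymentName = deploymentName;
  }

  // Returns the outgoing body plus a record of the name change, if any.
  prepare(body) {
    const outgoing = { ...body, model: this.deploymentName || body.model };
    const overridden = outgoing.model !== body.model;
    const note = overridden
      ? `model override: requested=${body.model} sent=${outgoing.model} (deploymentName)`
      : `model passthrough: ${body.model}`;
    return { outgoing, overridden, note };
  }
}

const { note } = new AzureBridge("team-codex-prod").prepare({ model: "gpt-5.5", messages: [] });
console.log(note); // model override: requested=gpt-5.5 sent=team-codex-prod (deploymentName)
```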

&lt;h2&gt;
  
  
  The fix is not "make every name identical"
&lt;/h2&gt;

&lt;p&gt;I do not think the right answer is forcing every tool, provider, and deployment to share the same label.&lt;/p&gt;

&lt;p&gt;That breaks down as soon as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;one provider requires deployment indirection&lt;/li&gt;
&lt;li&gt;another provider wants a publisher-specific route&lt;/li&gt;
&lt;li&gt;a third provider can only support a tier-mapped equivalent&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The better rule is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;let the tool ask for a logical model, and let the gateway resolve the physical target&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That is what model mapping should be doing.&lt;/p&gt;
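&lt;p&gt;In code, that rule can be a table of per-provider resolvers keyed by the logical model. The sketch below uses hypothetical config. Azure resolves to a deployment, Vertex AI to a publisher-scoped route, and a third provider to a tier-mapped equivalent. The Vertex URL follows the shape Google documents for partner models such as Claude (&lt;code&gt;publishers/anthropic/models/{model}:rawPredict&lt;/code&gt;). The project, region, and &lt;code&gt;VERTEX_MODEL_ID&lt;/code&gt; values are placeholders.&lt;/p&gt;

```javascript
// Hypothetical config; every value here is a placeholder.
const cfg = {
  azure: { deployments: { "gpt-5.5": "team-codex-prod" } },
  vertex: { project: "my-project", location: "us-east5", modelIds: { "claude-sonnet-4-6": "VERTEX_MODEL_ID" } },
  other: { tierFallback: { "claude-sonnet-4-6": "gpt-5.4-mini" } },
};

// One resolver per provider: the logical model goes in, a physical target comes out.
const resolvers = {
  // Deployment indirection: the deployment name is the physical identity.
  azure: (logical, c) => ({ kind: "deployment", target: c.deployments[logical] }),
  // Publisher-scoped route for a partner model on Vertex AI.
  vertex: (logical, c) => {
    const id = c.modelIds[logical];
    const base = `https://${c.location}-aiplatform.googleapis.com/v1`;
    const path = `/projects/${c.project}/locations/${c.location}/publishers/anthropic/models/${id}:rawPredict`;
    return { kind: "publisher-route", target: id ? base + path : undefined };
  },
  // No native support: fall back to a tier-mapped equivalent.
  other: (logical, c) => ({ kind: "tier-equivalent", target: c.tierFallback[logical] }),
};

function resolvePhysical(provider, logical) {
  const resolved = resolvers[provider](logical, cfg[provider]);
  if (!resolved.target) throw new Error(`${provider} has no target for ${logical}`);
  return resolved;
}

console.log(resolvePhysical("azure", "gpt-5.5").target);           // team-codex-prod
console.log(resolvePhysical("other", "claude-sonnet-4-6").target); // gpt-5.4-mini
console.log(resolvePhysical("vertex", "claude-sonnet-4-6").target);
```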

&lt;h2&gt;
  
  
  The two things I actually needed
&lt;/h2&gt;

&lt;p&gt;After working through this, I realized the UI had to support &lt;strong&gt;both&lt;/strong&gt;:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. discovered models
&lt;/h3&gt;

&lt;p&gt;If the provider can list available models, show them.&lt;/p&gt;

&lt;p&gt;That makes it easy to pick things like &lt;code&gt;gpt-5.5&lt;/code&gt; when the upstream already advertises them.&lt;/p&gt;
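&lt;p&gt;For OpenAI-compatible providers, discovery usually means calling &lt;code&gt;GET /v1/models&lt;/code&gt;. That endpoint returns a list object whose &lt;code&gt;data&lt;/code&gt; array holds entries with an &lt;code&gt;id&lt;/code&gt;. The minimal sketch below turns such a response into a clean, de-duplicated list. It assumes that response shape; providers with other discovery APIs would need their own adapter.&lt;/p&gt;

```javascript
// Extract model IDs from an OpenAI-compatible list response:
// { "object": "list", "data": [ { "id": "...", "object": "model" }, ... ] }
function discoveredIds(listResponse) {
  const entries = Array.isArray(listResponse?.data) ? listResponse.data : [];
  const ids = entries
    .map((m) => (typeof m?.id === "string" ? m.id.trim() : ""))
    .filter((id) => id.length > 0); // skip entries without a usable id
  return [...new Set(ids)].sort();
}

// Example payload shaped like a /v1/models response (values are illustrative).
const sample = {
  object: "list",
  data: [{ id: "gpt-5.5", object: "model" }, { id: "gpt-5.4-mini", object: "model" }, { object: "model" }],
};
console.log(discoveredIds(sample)); // [ 'gpt-5.4-mini', 'gpt-5.5' ]
```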

&lt;h3&gt;
  
  
  2. manual model or deployment entry
&lt;/h3&gt;

&lt;p&gt;If the provider uses a deployment name that discovery does not return in the form the UI expects, I still need to be able to type it in manually.&lt;/p&gt;

&lt;p&gt;This matters a lot for provider bridges where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the useful identifier is a deployment, not a catalog model ID&lt;/li&gt;
&lt;li&gt;discovery can lag behind reality&lt;/li&gt;
&lt;li&gt;the operational name is local to one account or tenant&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If the UI only gives me a fixed dropdown, it is pretending the ecosystem is cleaner than it is.&lt;/p&gt;
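&lt;p&gt;The UI-side fix can be small: accept free text, and record where the value came from. The sketch below uses a hypothetical &lt;code&gt;acceptSelection&lt;/code&gt; helper. It trims the input, rejects blanks instead of falling back to a default, and tags each value as &lt;code&gt;discovered&lt;/code&gt; or &lt;code&gt;manual&lt;/code&gt;.&lt;/p&gt;

```javascript
// Accept either a discovered model or a manually typed deployment/model name.
// Returns null for blank input instead of silently picking a default.
function acceptSelection(input, discovered) {
  const value = String(input ?? "").trim();
  if (value === "") return null;
  return { value, source: discovered.includes(value) ? "discovered" : "manual" };
}

const discovered = ["gpt-5.5", "gpt-5.4-mini"];
console.log(acceptSelection("gpt-5.5", discovered));           // { value: 'gpt-5.5', source: 'discovered' }
console.log(acceptSelection(" team-codex-prod ", discovered)); // { value: 'team-codex-prod', source: 'manual' }
```

&lt;p&gt;Keeping the tag lets the UI show a manually typed deployment name differently from one the provider advertised.&lt;/p&gt;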

&lt;h2&gt;
  
  
  What I changed in my own setup
&lt;/h2&gt;

&lt;p&gt;I updated the model mapping flow so it now:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;returns discovered provider models through the model-mapping API&lt;/li&gt;
&lt;li&gt;merges discovered models with static mapping candidates&lt;/li&gt;
&lt;li&gt;allows manual entry instead of forcing a dropdown-only selection&lt;/li&gt;
&lt;/ul&gt;
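&lt;p&gt;The merge step fits in a few lines. This sketch assumes two plain string lists and a hypothetical &lt;code&gt;mergeCandidates&lt;/code&gt; helper. It also assumes a precedence that may differ from CliGate's behavior: a name that discovery returns is marked &lt;code&gt;discovered&lt;/code&gt; even if it was also a static candidate.&lt;/p&gt;

```javascript
// Merge static mapping candidates with discovered provider models.
// Each name appears once; a name the provider advertised is tagged "discovered"
// even if it was also a static candidate (an assumption for this sketch).
function mergeCandidates(staticNames, discoveredNames) {
  const merged = new Map();
  for (const name of staticNames) merged.set(name, "static");
  for (const name of discoveredNames) merged.set(name, "discovered");
  return [...merged].map(([name, source]) => ({ name, source }));
}

const candidates = mergeCandidates(["claude-sonnet-4-6", "gpt-5.5"], ["gpt-5.5", "gpt-5.4-mini"]);
console.log(candidates.map((c) => `${c.name} (${c.source})`).join(", "));
// claude-sonnet-4-6 (static), gpt-5.5 (discovered), gpt-5.4-mini (discovered)
```

&lt;p&gt;A &lt;code&gt;Map&lt;/code&gt; keeps first-insertion order. Static candidates therefore stay in their configured order, and newly discovered names are appended after them.&lt;/p&gt;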

&lt;p&gt;So the configuration step is no longer:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"pick from whatever the UI happened to preload"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"pick a discovered model or type the real deployment/model name you actually use"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is a much more honest interface.&lt;/p&gt;

&lt;h2&gt;
  
  
  The boring but important engineering lesson
&lt;/h2&gt;

&lt;p&gt;I think a lot of AI infra bugs come from mixing up three different concepts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;display name&lt;/strong&gt; in the tool&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;logical model ID&lt;/strong&gt; used for routing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;physical upstream target&lt;/strong&gt; used at the provider edge&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When those three collapse into one string, everything feels simple.&lt;/p&gt;

&lt;p&gt;When they do not, you need explicit mapping and explicit logs.&lt;/p&gt;
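&lt;p&gt;Keeping the three concepts apart can be as simple as resolving them together at the entry point and passing the resulting record around instead of a bare string. The sketch below uses hypothetical tables and field names (&lt;code&gt;displayName&lt;/code&gt;, &lt;code&gt;logicalId&lt;/code&gt;, &lt;code&gt;physicalTarget&lt;/code&gt;).&lt;/p&gt;

```javascript
// Hypothetical mapping tables: display name -> logical ID -> physical target.
const displayToLogical = { "Sonnet (fast)": "claude-sonnet-4-6", "GPT 5.5": "gpt-5.5" };
const logicalToPhysical = { "claude-sonnet-4-6": "gpt-5.4-mini", "gpt-5.5": "team-codex-prod" };

// Resolve all three names up front and keep them together in one frozen record.
function resolveNames(displayName) {
  const logicalId = displayToLogical[displayName];
  if (!logicalId) throw new Error(`unknown display name: ${displayName}`);
  const physicalTarget = logicalToPhysical[logicalId] ?? logicalId; // default: pass through
  return Object.freeze({ displayName, logicalId, physicalTarget });
}

console.log(resolveNames("GPT 5.5").physicalTarget); // team-codex-prod
```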

&lt;p&gt;Otherwise you get the classic failure mode:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;UI says one thing
logs say another
provider actually runs something else
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is not really a model problem.&lt;/p&gt;

&lt;p&gt;It is an observability problem.&lt;/p&gt;
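&lt;p&gt;In practice, the observability fix is a single log record that carries every name at once. Below is a minimal sketch. The field names and the &lt;code&gt;renamed&lt;/code&gt; flag are hypothetical and do not reflect CliGate's log format.&lt;/p&gt;

```javascript
// Emit one structured log line per request with every name involved,
// so "UI says X, provider ran Y" shows up in a single record.
function logResolution({ displayName, requestedModel, logicalId, physicalTarget, rule }) {
  const entry = {
    time: new Date().toISOString(),
    displayName,
    requestedModel,
    logicalId,
    physicalTarget,
    rule,                                       // which mapping rule decided the target
    renamed: requestedModel !== physicalTarget, // true whenever the provider runs a different name
  };
  console.log(JSON.stringify(entry));
  return entry;
}

logResolution({
  displayName: "Sonnet",
  requestedModel: "claude-sonnet-4-6",
  logicalId: "claude-sonnet-4-6",
  physicalTarget: "gpt-5.4-mini",
  rule: "tier mapping",
});
```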

&lt;h2&gt;
  
  
  If you're building local AI tooling, I would strongly recommend this split
&lt;/h2&gt;

&lt;p&gt;Do not let every CLI client carry your provider-specific naming quirks.&lt;/p&gt;

&lt;p&gt;Let them speak in the model vocabulary they already know.&lt;/p&gt;

&lt;p&gt;Then keep the messy parts in one place:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;routing&lt;/li&gt;
&lt;li&gt;provider adaptation&lt;/li&gt;
&lt;li&gt;deployment overrides&lt;/li&gt;
&lt;li&gt;request logs&lt;/li&gt;
&lt;li&gt;model mapping UI&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That design has held up much better for me than trying to force the whole stack to agree on one shared name.&lt;/p&gt;

&lt;p&gt;Repo:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/codeking-ai/cligate" rel="noopener noreferrer"&gt;CliGate on GitHub&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;How are you handling this right now?&lt;/p&gt;

&lt;p&gt;Are you keeping model names and deployment names identical on purpose, or are you hiding the mismatch behind a gateway too?&lt;/p&gt;

</description>
      <category>discuss</category>
      <category>webdev</category>
      <category>ai</category>
      <category>javascript</category>
    </item>
  </channel>
</rss>
