<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: CodeKing</title>
    <description>The latest articles on DEV Community by CodeKing (@codekingai).</description>
    <link>https://dev.to/codekingai</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3843914%2Fedc4fbb1-edd3-4c7d-9c94-e2b13dbc1af0.jpg</url>
      <title>DEV Community: CodeKing</title>
      <link>https://dev.to/codekingai</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/codekingai"/>
    <language>en</language>
    <item>
      <title>"I Pointed Claude Code at Google's Antigravity — Here's the 5-Minute OAuth Setup"</title>
      <dc:creator>CodeKing</dc:creator>
      <pubDate>Thu, 30 Apr 2026 07:14:07 +0000</pubDate>
      <link>https://dev.to/codekingai/i-pointed-claude-code-at-googles-antigravity-heres-the-5-minute-oauth-setup-45bp</link>
      <guid>https://dev.to/codekingai/i-pointed-claude-code-at-googles-antigravity-heres-the-5-minute-oauth-setup-45bp</guid>
      <description>&lt;p&gt;The thing that finally pushed me over the edge was a 401 at 1am.&lt;/p&gt;

&lt;p&gt;Anthropic key rotated. Claude Code dead. Three terminals open, a half-finished refactor, and now I'm digging through a password manager looking for the right &lt;code&gt;ANTHROPIC_API_KEY&lt;/code&gt; to paste into a shell config so my CLI tool can keep talking to a model.&lt;/p&gt;

&lt;p&gt;Then it hit me: I was already logged into Google's &lt;strong&gt;Antigravity&lt;/strong&gt; in another tab. And Antigravity hands out &lt;code&gt;claude-sonnet-4-6&lt;/code&gt; and &lt;code&gt;claude-opus-4-6&lt;/code&gt; like it's nothing.&lt;/p&gt;

&lt;p&gt;So why was I paying Anthropic again?&lt;/p&gt;

&lt;h2&gt;
  
  
  What Antigravity Actually Is
&lt;/h2&gt;

&lt;p&gt;If you haven't bumped into it yet — Antigravity is Google's enterprise developer platform, built on top of Google Cloud Code Assist. The interesting part isn't that it gives you Gemini. The interesting part is that it gives you &lt;strong&gt;Claude&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Sign in with a Google account, accept the Code Assist terms, and the model list that comes back includes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;claude-sonnet-4-6
claude-sonnet-4-6-thinking
claude-opus-4-6
claude-opus-4-6-thinking
gemini-2.5-pro
gemini-2.5-flash
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Those are Anthropic's flagship models, served through Google's infrastructure, authorized by a Google OAuth token. No Anthropic key, no Anthropic billing surface, no key rotation.&lt;/p&gt;

&lt;p&gt;The catch — and this is the whole reason I'm writing this — is that the protocol Antigravity speaks is not the protocol Claude Code speaks. Antigravity expects requests at &lt;code&gt;cloudcode-pa.googleapis.com/v1internal:generateContent&lt;/code&gt; with a Google bearer token. Claude Code wants to talk to &lt;code&gt;api.anthropic.com/v1/messages&lt;/code&gt; with an Anthropic key.&lt;/p&gt;

&lt;p&gt;You can't just point Claude Code at Antigravity. The shapes don't match.&lt;/p&gt;
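&lt;p&gt;To make the mismatch concrete, here is a minimal sketch of the kind of translation a bridge has to do. The &lt;code&gt;generateContent&lt;/code&gt; field names below are my assumptions about a Gemini-style payload, not the documented &lt;code&gt;v1internal&lt;/code&gt; schema:&lt;/p&gt;

```javascript
// Illustrative sketch only: the generateContent field names are assumptions
// (Gemini-style), not the documented v1internal schema.
function anthropicToGenerateContent(anthropicBody) {
  // Anthropic Messages shape: { model, max_tokens, messages: [{ role, content }] }
  // Gemini-style shape:       { contents: [{ role, parts: [{ text }] }] }
  const contents = anthropicBody.messages.map(function (msg) {
    const role = msg.role === 'assistant' ? 'model' : 'user';
    const text = typeof msg.content === 'string'
      ? msg.content
      : msg.content.map(function (block) { return block.text; }).join('');
    return { role: role, parts: [{ text: text }] };
  });
  return { contents: contents };
}

const out = anthropicToGenerateContent({
  model: 'claude-sonnet-4-6',
  max_tokens: 1024,
  messages: [{ role: 'user', content: 'hello' }]
});
```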

&lt;h2&gt;
  
  
  The Bridge
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/codeking-ai/cligate" rel="noopener noreferrer"&gt;CliGate&lt;/a&gt; is a local proxy I've been working on that already handles routing Claude Code, Codex CLI, and Gemini CLI to multiple backends. The Antigravity integration is the newest piece: it adds a Google-OAuth-backed account pool that exposes Antigravity's Claude and Gemini models as a routing target.&lt;/p&gt;

&lt;p&gt;When Claude Code sends an Anthropic-formatted streaming request to the proxy, CliGate:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Picks an Antigravity account from the pool&lt;/li&gt;
&lt;li&gt;Refreshes the Google access token if it's near expiry&lt;/li&gt;
&lt;li&gt;Translates the Anthropic Messages payload into Antigravity's &lt;code&gt;generateContent&lt;/code&gt; shape&lt;/li&gt;
&lt;li&gt;Streams the response back as Anthropic SSE&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Claude Code never knows it's talking to Google. It thinks it just got a normal response from Anthropic.&lt;/p&gt;
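&lt;p&gt;Step 1, the pool pick, is the part people ask about most, so here is a toy version. This is a sketch with made-up field names, not CliGate's actual account pool:&lt;/p&gt;

```javascript
// Toy round-robin account pool; field names are made up for illustration.
function pickAccount(pool) {
  // Skip accounts that have exhausted their quota.
  const usable = pool.accounts.filter(function (a) { return a.quotaRemaining > 0; });
  if (usable.length === 0) throw new Error('no Antigravity accounts available');
  const account = usable[pool.cursor % usable.length];
  pool.cursor += 1; // round-robin across the usable accounts
  return account;
}

const pool = {
  cursor: 0,
  accounts: [
    { email: 'a@example.com', quotaRemaining: 10 },
    { email: 'b@example.com', quotaRemaining: 0 },
  ],
};
```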

&lt;h2&gt;
  
  
  The 5-Minute Setup
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Step 1 — Start CliGate&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx cligate@latest start
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The dashboard opens at &lt;code&gt;http://localhost:8081&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2 — Add your Antigravity account&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Go to &lt;strong&gt;Accounts → Antigravity&lt;/strong&gt;. Click &lt;strong&gt;Sign in with Google&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;✓ Browser opens to accounts.google.com
✓ You authorize Cloud Platform + Code Assist scopes
✓ Callback hits localhost:36545
✓ CliGate stores the refresh token
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The OAuth flow hands back a refresh token that CliGate keeps in encrypted local storage. Tokens get auto-refreshed when they're under 5 minutes from expiry — you log in once.&lt;/p&gt;
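&lt;p&gt;If you're curious what "auto-refreshed when under 5 minutes from expiry" looks like, here is a minimal sketch. The token shape and the refresh callback are assumptions, not CliGate's real storage layer:&lt;/p&gt;

```javascript
// Sketch of the refresh rule described above; the token shape and refreshFn
// (which would hit Google's OAuth token endpoint) are assumptions.
const REFRESH_MARGIN_MS = 5 * 60 * 1000; // refresh when under 5 minutes remain

async function ensureFreshToken(account, refreshFn, nowMs) {
  const remainingMs = account.expiresAt - nowMs;
  if (remainingMs > REFRESH_MARGIN_MS) return account.accessToken; // still fresh
  const next = await refreshFn(account.refreshToken);
  account.accessToken = next.accessToken;
  account.expiresAt = next.expiresAt;
  return account.accessToken;
}
```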

&lt;p&gt;&lt;strong&gt;Step 3 — Let CliGate discover your models&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;After the account is added, CliGate calls &lt;code&gt;loadCodeAssist&lt;/code&gt; to fetch your project ID, then &lt;code&gt;fetchAvailableModels&lt;/code&gt; to discover what your account can use. The model list populates automatically:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;claude-sonnet-4-6           ✓ recommended
claude-sonnet-4-6-thinking  ✓ thinking variant
claude-opus-4-6             ✓ recommended
claude-opus-4-6-thinking    ✓ thinking variant
gemini-2.5-pro
gemini-2.5-flash
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No manual entry. The list updates if your account's quotas change.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 4 — Route Claude Code to Antigravity&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Open &lt;strong&gt;App Routing&lt;/strong&gt;. For Claude Code, set the credential to your Antigravity account.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Claude Code → Antigravity (your-google-email)
Codex CLI   → ChatGPT Plus account
Gemini CLI  → cloud
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. Claude Code is already pointed at CliGate from the one-click setup, so this routing change takes effect on the next request.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 5 — Test it&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;claude &lt;span class="s2"&gt;"explain what this regex matches: ^(&lt;/span&gt;&lt;span class="se"&gt;\d&lt;/span&gt;&lt;span class="s2"&gt;{4})-(&lt;/span&gt;&lt;span class="se"&gt;\d&lt;/span&gt;&lt;span class="s2"&gt;{2})-(&lt;/span&gt;&lt;span class="se"&gt;\d&lt;/span&gt;&lt;span class="s2"&gt;{2})$"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The request goes to CliGate. CliGate sees the routing rule, picks your Antigravity account, talks to &lt;code&gt;cloudcode-pa.googleapis.com&lt;/code&gt;, and streams the response back as Anthropic SSE. You see normal Claude Code output.&lt;/p&gt;

&lt;p&gt;If you want to confirm it's actually hitting Antigravity, the &lt;strong&gt;Request Logs&lt;/strong&gt; tab shows the upstream provider for every call:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;14:23:01  POST /v1/messages   →  antigravity / claude-sonnet-4-6   ✓ 200
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The Part That Surprised Me
&lt;/h2&gt;

&lt;p&gt;The model name juggling.&lt;/p&gt;

&lt;p&gt;Anthropic's public model IDs and Antigravity's internal model IDs don't quite line up. Claude Code might send &lt;code&gt;claude-sonnet-4-5-20250929&lt;/code&gt; (an old alias). Antigravity expects &lt;code&gt;claude-sonnet-4-6&lt;/code&gt;. The proxy normalizes between them on the way out and on the way back.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Sketch of the normalization&lt;/span&gt;
&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;normalizeForAntigravity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;modelId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;modelId&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startsWith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;claude-sonnet-4-5&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;claude-sonnet-4-6&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;modelId&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;claude-opus-4-6&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;claude-opus-4-6-thinking&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="c1"&gt;// ...&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;modelId&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you're routing your own tools through Antigravity, this is the gotcha you'll hit first. The actual Antigravity API doesn't return a friendly error for an unknown model — it 404s with no JSON body. I spent an hour on that one.&lt;/p&gt;

&lt;h2&gt;
  
  
  When This Is Actually Useful
&lt;/h2&gt;

&lt;p&gt;I'm not pretending this is a trick that works for everyone. But there are three real cases where it matters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Your company already has Google Workspace.&lt;/strong&gt; Adding an Antigravity account is zero new billing. Anthropic keys are a separate procurement conversation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You don't want Anthropic keys floating around in dev shells.&lt;/strong&gt; OAuth tokens with refresh rotation are easier to revoke than keys.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You want one tool (Claude Code) but two backends.&lt;/strong&gt; Toggle between Antigravity and a direct Anthropic key from the dashboard, depending on quota or model availability.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The same proxy handles ChatGPT account pools, Anthropic OAuth, raw API keys, Azure OpenAI, and local Ollama. Antigravity is just one more lane.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Your Setup?
&lt;/h2&gt;

&lt;p&gt;Are you on Antigravity yet? Or still on a direct Anthropic key? I'm curious whether the enterprise OAuth path is something people are actually using day-to-day, or whether the API-key-in-env-var workflow is too entrenched to displace.&lt;/p&gt;




&lt;p&gt;GitHub: &lt;a href="https://github.com/codeking-ai/cligate" rel="noopener noreferrer"&gt;github.com/codeking-ai/cligate&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx cligate@latest start
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>tutorial</category>
      <category>webdev</category>
      <category>ai</category>
      <category>javascript</category>
    </item>
    <item>
      <title>"I Wired DeepSeek V4 Into Claude Code and Codex CLI Without Touching the Tools"</title>
      <dc:creator>CodeKing</dc:creator>
      <pubDate>Mon, 27 Apr 2026 08:09:54 +0000</pubDate>
      <link>https://dev.to/codekingai/i-wired-deepseek-v4-into-claude-code-and-codex-cli-without-touching-the-tools-64d</link>
      <guid>https://dev.to/codekingai/i-wired-deepseek-v4-into-claude-code-and-codex-cli-without-touching-the-tools-64d</guid>
      <description>&lt;p&gt;DeepSeek V4 dropped, the benchmarks looked aggressive, and the price-per-million-tokens looked even more aggressive. The first thing I wanted to know wasn't "is it as good as Claude Opus 4.6 or GPT-5.4?" — it was "can my actual coding agents use it without me rewriting half my workflow?"&lt;/p&gt;

&lt;p&gt;Because that's the part nobody benchmarks. A model can be the cheapest reasoner on the leaderboard and still be useless to me if Claude Code, Codex CLI, and Gemini CLI can't talk to it the way they expect to.&lt;/p&gt;

&lt;p&gt;Here's what I learned getting DeepSeek V4 into a real AI coding workflow without forking any of the CLIs.&lt;/p&gt;

&lt;h2&gt;
  
  
  The protocol problem nobody warns you about
&lt;/h2&gt;

&lt;p&gt;Every AI coding tool has hard-coded assumptions about which API protocol it speaks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude Code&lt;/strong&gt; speaks Anthropic's Messages API. It expects &lt;code&gt;x-api-key&lt;/code&gt;, &lt;code&gt;anthropic-version&lt;/code&gt;, content blocks, &lt;code&gt;cache_control&lt;/code&gt;, the whole shape.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Codex CLI&lt;/strong&gt; speaks OpenAI's Responses API. It expects &lt;code&gt;Authorization: Bearer&lt;/code&gt;, &lt;code&gt;tools&lt;/code&gt;, &lt;code&gt;tool_choice&lt;/code&gt;, streaming SSE in OpenAI's specific format.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gemini CLI&lt;/strong&gt; speaks Google's GenerativeLanguage API.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;DeepSeek's solution is genuinely thoughtful — they expose &lt;strong&gt;both&lt;/strong&gt; an OpenAI-compatible endpoint at &lt;code&gt;https://api.deepseek.com&lt;/code&gt; and an Anthropic-compatible endpoint at &lt;code&gt;https://api.deepseek.com/anthropic&lt;/code&gt;. Same model, two protocols.&lt;/p&gt;

&lt;p&gt;Great. So in theory you can point Claude Code at the Anthropic endpoint and Codex at the OpenAI endpoint and you're done.&lt;/p&gt;

&lt;p&gt;In practice you can't, because Claude Code reads exactly one &lt;code&gt;ANTHROPIC_BASE_URL&lt;/code&gt; and Codex reads exactly one &lt;code&gt;OPENAI_BASE_URL&lt;/code&gt;. The moment you want to use &lt;strong&gt;both&lt;/strong&gt; Claude (via Anthropic-direct) &lt;strong&gt;and&lt;/strong&gt; DeepSeek (via DeepSeek-Anthropic-compat) for the same agent, you have to pick. Switching means restarting the tool with new env vars. Every. Single. Time.&lt;/p&gt;

&lt;p&gt;That's the moment a local gateway stops being optional.&lt;/p&gt;

&lt;h2&gt;
  
  
  The setup: one localhost, three tools, four providers
&lt;/h2&gt;

&lt;p&gt;The thing I built — &lt;a href="https://github.com/codeking-ai/cligate" rel="noopener noreferrer"&gt;CliGate&lt;/a&gt; — runs as a local proxy on &lt;code&gt;localhost:7860&lt;/code&gt;. Each AI coding tool points at it once and never thinks about provider URLs again.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Claude Code&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ANTHROPIC_BASE_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;http://localhost:7860
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ANTHROPIC_AUTH_TOKEN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;any-string

&lt;span class="c"&gt;# Codex CLI&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENAI_BASE_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;http://localhost:7860/v1
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;any-string

&lt;span class="c"&gt;# Gemini CLI&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;GOOGLE_GEMINI_BASE_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;http://localhost:7860
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The gateway then routes each request to whichever provider you configured for that tier — Anthropic, OpenAI, Gemini, Azure OpenAI, GLM, &lt;strong&gt;or DeepSeek&lt;/strong&gt; — and translates the protocol on the way out and back.&lt;/p&gt;

&lt;h2&gt;
  
  
  Adding DeepSeek as a first-class provider
&lt;/h2&gt;

&lt;p&gt;Here's the actual provider implementation that landed last week. The interesting part is what I &lt;em&gt;didn't&lt;/em&gt; have to write:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;OpenAIProvider&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;./openai.js&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;DEFAULT_BASE_URL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://api.deepseek.com&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;ANTHROPIC_API_VERSION&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;2023-06-01&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;DeepSeekProvider&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nc"&gt;OpenAIProvider&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nf"&gt;constructor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;super&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;baseUrl&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;baseUrl&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;DEFAULT_BASE_URL&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;deepseek&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="c1"&gt;// Ride chat-completions for Codex/Responses; not a native Responses provider.&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;sendResponsesRequest&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;undefined&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="nf"&gt;_buildAnthropicBaseUrl&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;baseUrl&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/anthropic`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;sendAnthropicRequest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_buildAnthropicBaseUrl&lt;/span&gt;&lt;span class="p"&gt;()}&lt;/span&gt;&lt;span class="s2"&gt;/v1/messages`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;x-api-key&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;anthropic-version&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ANTHROPIC_API_VERSION&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Content-Type&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;application/json&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's basically it. Forty lines.&lt;/p&gt;

&lt;p&gt;The reason it's forty lines and not four hundred: DeepSeek's Anthropic-compatible endpoint is genuinely Anthropic-compatible. I don't have to translate Claude Code's messages into something else and back — I can just forward them. For OpenAI-compatible traffic from Codex, the existing &lt;code&gt;OpenAIProvider&lt;/code&gt; already handles chat-completions; I extend it and override the base URL.&lt;/p&gt;

&lt;p&gt;The one subtle thing — &lt;code&gt;this.sendResponsesRequest = undefined&lt;/code&gt; — matters because DeepSeek does &lt;strong&gt;not&lt;/strong&gt; implement OpenAI's newer Responses API. If I left that inherited, the gateway would try to call &lt;code&gt;/v1/responses&lt;/code&gt; and get 404s. With it unset, the gateway falls back to chat-completions, which DeepSeek does support. That single line is the kind of detail that separates "it works in my demo" from "it works for a coding agent that does 50 tool calls in a session."&lt;/p&gt;
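&lt;p&gt;The dispatch side of that fallback is simple to sketch. Method names here are illustrative, not CliGate's exact internals:&lt;/p&gt;

```javascript
// If a provider has no native Responses handler, fall back to
// chat-completions. Method names are illustrative, not CliGate's internals.
async function dispatchOpenAIStyle(provider, request) {
  if (typeof provider.sendResponsesRequest === 'function') {
    return provider.sendResponsesRequest(request); // native /v1/responses path
  }
  // Fallback path (DeepSeek): reshape the request for chat-completions.
  return provider.sendChatCompletions({
    model: request.model,
    messages: request.input,
    stream: request.stream,
  });
}
```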

&lt;h2&gt;
  
  
  The model tier mapping
&lt;/h2&gt;

&lt;p&gt;CliGate maps incoming model names to &lt;strong&gt;tiers&lt;/strong&gt; — &lt;code&gt;flagship&lt;/code&gt;, &lt;code&gt;standard&lt;/code&gt;, &lt;code&gt;fast&lt;/code&gt;, &lt;code&gt;reasoning&lt;/code&gt; — and each provider declares which of its models fills each tier:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;deepseek&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nl"&gt;flagship&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;deepseek-v4-pro&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;standard&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;deepseek-v4-flash&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;fast&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;      &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;deepseek-v4-flash&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;reasoning&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;deepseek-v4-pro&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So when Claude Code asks for &lt;code&gt;claude-sonnet-4-6&lt;/code&gt; and you've routed &lt;code&gt;standard&lt;/code&gt; traffic to DeepSeek, the gateway translates that into a &lt;code&gt;deepseek-v4-flash&lt;/code&gt; call — without Claude Code knowing anything changed. The agent thinks it's still talking to Anthropic.&lt;/p&gt;
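&lt;p&gt;A stripped-down resolver for that translation might look like this. The &lt;code&gt;deepseek&lt;/code&gt; table mirrors the mapping above; the incoming-model table and the resolver itself are illustrative:&lt;/p&gt;

```javascript
// Illustrative tier resolver; the deepseek table mirrors the config above,
// the rest is a sketch.
const tierForIncoming = {
  'claude-opus-4-6': 'flagship',
  'claude-sonnet-4-6': 'standard',
};

const providerModels = {
  deepseek: { flagship: 'deepseek-v4-pro', standard: 'deepseek-v4-flash' },
};

function resolveModel(incomingModel, providerName) {
  const tier = tierForIncoming[incomingModel];
  if (tier === undefined) return incomingModel; // unknown model: pass through
  return providerModels[providerName][tier];
}
```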

&lt;p&gt;This is the part that actually matters for cost. DeepSeek V4 Flash is priced at:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Input: &lt;strong&gt;$0.27 per million tokens&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Cache hit input: &lt;strong&gt;$0.07 per million tokens&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Output: &lt;strong&gt;$1.10 per million tokens&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With flagship models priced in the $3–$15-per-million range, you can move the bulk of your boilerplate-tier coding traffic — file edits, simple refactors, test scaffolding — to DeepSeek Flash and only escalate to Claude or GPT-5 for the gnarly reasoning tasks. Same agent, three different brains, picked per request.&lt;/p&gt;
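&lt;p&gt;To put numbers on that, here is the arithmetic for a single agent turn at the Flash prices above. The token counts are made up for illustration:&lt;/p&gt;

```javascript
// Per-request cost at the DeepSeek V4 Flash prices listed above
// (USD per million tokens); the token counts are illustrative.
const PRICE = { input: 0.27, cacheHit: 0.07, output: 1.10 };

function requestCostUSD(tokens) {
  return (
    tokens.freshInput * PRICE.input +
    tokens.cachedInput * PRICE.cacheHit +
    tokens.output * PRICE.output
  ) / 1e6;
}

// A typical agent turn: 30k fresh prompt tokens, 2k generated tokens.
const cost = requestCostUSD({ freshInput: 30000, cachedInput: 0, output: 2000 });
```

&lt;p&gt;That works out to about a cent per turn at those rates, before caching kicks in.&lt;/p&gt;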

&lt;h2&gt;
  
  
  What actually surprised me
&lt;/h2&gt;

&lt;p&gt;Two things.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;First, tool calling worked on the Anthropic-compat endpoint without modification.&lt;/strong&gt; I expected to have to sanitize tool schemas the way I do for Azure OpenAI (which rejects &lt;code&gt;$schema&lt;/code&gt;, &lt;code&gt;const&lt;/code&gt;, etc.). DeepSeek's Anthropic-compat layer accepts what Claude Code emits. That's a meaningful effort on their end.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Second, the cache pricing changes my mental model.&lt;/strong&gt; DeepSeek bills cache reads at roughly a quarter of the normal input rate. For a coding agent doing repeated tool calls within the same session — where most of the prompt is the same system + history + repo context — caching turns into the dominant economic factor. It's no longer "which model is cheapest per token", it's "which model's cache hit rate × cache price is lowest."&lt;/p&gt;

&lt;p&gt;That's a different optimization problem than the one the leaderboards are solving.&lt;/p&gt;
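&lt;p&gt;That optimization has a one-line formula. The hit rate below is a number I made up; the prices are the Flash rates from earlier:&lt;/p&gt;

```javascript
// Effective input price = hitRate * cacheHitPrice + (1 - hitRate) * freshPrice.
// The 80% hit rate is illustrative; the prices are the Flash rates above.
function effectiveInputPrice(hitRate, cacheHitPrice, freshPrice) {
  return hitRate * cacheHitPrice + (1 - hitRate) * freshPrice;
}

const effective = effectiveInputPrice(0.8, 0.07, 0.27);
```

&lt;p&gt;At an 80% hit rate the effective input price drops to about $0.11 per million tokens, well under half the headline $0.27.&lt;/p&gt;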

&lt;h2&gt;
  
  
  Why I'm not abandoning Claude or GPT-5
&lt;/h2&gt;

&lt;p&gt;To be clear: this isn't a "DeepSeek replaces everything" post. After a week of routing traffic, my mental split is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Hard reasoning, ambiguous specs, large refactors&lt;/strong&gt; → Claude Opus 4.6 or GPT-5.4. They still win when the problem isn't well-formed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Boilerplate code generation, formatting, test scaffolding, doc writing&lt;/strong&gt; → DeepSeek V4 Flash. The quality is fine and the cost is roughly an order of magnitude lower.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dirt-cheap classification, intent routing, log triage&lt;/strong&gt; → DeepSeek Flash with aggressive caching.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The gateway is what makes this split &lt;em&gt;practical&lt;/em&gt; instead of theoretical. Without it, every "use DeepSeek for this task" decision means restarting Claude Code with new env vars. With it, the routing happens server-side and the tools never know.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;

&lt;p&gt;Source: &lt;a href="https://github.com/codeking-ai/cligate" rel="noopener noreferrer"&gt;github.com/codeking-ai/cligate&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Add a DeepSeek API key in the dashboard, route the &lt;code&gt;standard&lt;/code&gt; and &lt;code&gt;fast&lt;/code&gt; tiers to DeepSeek, leave &lt;code&gt;flagship&lt;/code&gt; and &lt;code&gt;reasoning&lt;/code&gt; on Claude or GPT-5. Run Claude Code or Codex CLI as normal.&lt;/p&gt;

&lt;p&gt;I'd genuinely like to hear how others are splitting workloads across model providers right now. Are you doing it at the agent level, the request level, or just paying for one flagship and calling it done? The answer changed for me twice this quarter and I don't think I've found the final shape yet.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>javascript</category>
      <category>ai</category>
      <category>opensource</category>
    </item>
    <item>
      <title>"I Stopped Building a Coding Agent and Built a Supervisor for Codex and Claude Code Instead"</title>
      <dc:creator>CodeKing</dc:creator>
      <pubDate>Thu, 23 Apr 2026 07:14:00 +0000</pubDate>
      <link>https://dev.to/codekingai/i-stopped-building-a-coding-agent-and-built-a-supervisor-for-codex-and-claude-code-instead-2d06</link>
      <guid>https://dev.to/codekingai/i-stopped-building-a-coding-agent-and-built-a-supervisor-for-codex-and-claude-code-instead-2d06</guid>
      <description>&lt;p&gt;A couple of weeks ago I was about to do what everyone on my timeline was doing: build another coding agent. Read files, run commands, plan steps, loop until done.&lt;/p&gt;

&lt;p&gt;Then I asked myself the uncomfortable question.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why am I building a worse version of Claude Code and Codex, when both of them are already installed on my machine and work better than anything I can ship this month?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;So I stopped. And I built the opposite of a coding agent instead.&lt;/p&gt;

&lt;h2&gt;
  
  
  The part I was getting wrong
&lt;/h2&gt;

&lt;p&gt;I kept describing the problem as "I want an agent." But when I wrote down what I actually needed it to do, almost none of it was coding:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;pick whether this request should go to Codex or Claude Code&lt;/li&gt;
&lt;li&gt;decide whether it belongs in the current runtime session or a new one&lt;/li&gt;
&lt;li&gt;remember what task the user was iterating on&lt;/li&gt;
&lt;li&gt;surface approval prompts that are hiding in logs&lt;/li&gt;
&lt;li&gt;summarize when a run finishes&lt;/li&gt;
&lt;li&gt;handle "retry that last one" without a human translating&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of those are coding tasks. They are &lt;strong&gt;dispatch, supervision, and memory&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The executors (Codex, Claude Code) are the muscle. What I was missing wasn't more muscle. It was a nervous system.&lt;/p&gt;

&lt;h2&gt;
  
  
  Control plane vs execution plane
&lt;/h2&gt;

&lt;p&gt;Once I framed it that way, the architecture fell out naturally. I now split the system into two planes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Execution plane&lt;/strong&gt; — Codex, Claude Code, and any future runtime that can actually write files and run commands. These are providers. They are &lt;em&gt;not&lt;/em&gt; the agent.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Control plane&lt;/strong&gt; — the supervisor agent. It reasons about what to do, chooses an executor, dispatches, observes, and reports back.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The rule I gave myself: &lt;strong&gt;the control plane never writes code.&lt;/strong&gt; If it ever finds itself wanting to, that's a signal that I'm collapsing the two planes and I need to stop and route the work to an executor instead.&lt;/p&gt;

&lt;p&gt;This is the opposite of the current trend, where everyone is trying to pack more executor capability into a single agent loop. I went the other way on purpose.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the supervisor actually does
&lt;/h2&gt;

&lt;p&gt;The supervisor runs its own ReAct loop — but the tools aren't &lt;code&gt;read_file&lt;/code&gt; and &lt;code&gt;run_command&lt;/code&gt;. They're dispatch and observation tools:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;start_runtime_task(provider, prompt, working_dir)&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;continue_runtime_task(session_id, message)&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;get_runtime_status(session_id)&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;list_active_sessions(conversation_id)&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;approve_pending_question(session_id, answer)&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;recall_memory(scope, key)&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;write_memory(scope, key, value)&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;summarize_task(session_id)&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's it. That's the tool catalog for the agent itself. The coding tools live inside Codex and Claude Code, where they already work.&lt;/p&gt;
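&lt;p&gt;A minimal sketch of what a dispatch-only catalog can look like, assuming an in-memory session store. The tool names mirror the list above; the signatures and return shapes are my own illustration, not CliGate's actual API.&lt;/p&gt;

```javascript
// Hypothetical dispatch-only tool catalog: note there is no read_file and
// no run_command, so the supervisor cannot drift into being an executor.
const sessions = new Map();
let nextId = 0;

const supervisorTools = {
  // hand a task to an executor (Codex or Claude Code) and get a handle back
  start_runtime_task(provider, prompt, workingDir) {
    nextId += 1;
    const sessionId = `sess_${nextId}`;
    sessions.set(sessionId, { provider, prompt, workingDir, status: "running" });
    return sessionId;
  },
  // follow up on an existing run instead of starting a new one
  continue_runtime_task(sessionId, message) {
    const s = sessions.get(sessionId);
    if (!s) throw new Error(`unknown session ${sessionId}`);
    s.lastMessage = message;
    return s.status;
  },
  get_runtime_status(sessionId) {
    const s = sessions.get(sessionId);
    return s ? s.status : "expired";
  },
};

const id = supervisorTools.start_runtime_task("codex", "fix the failing test", "/repo");
console.log(supervisorTools.get_runtime_status(id)); // "running"
```

&lt;p&gt;What matters is what's missing: with no file or shell tools in the catalog, the control plane can't quietly collapse into the execution plane.&lt;/p&gt;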

&lt;h2&gt;
  
  
  Observation First — the rule that saved me
&lt;/h2&gt;

&lt;p&gt;The biggest failure mode I expected was the supervisor getting poisoned by the raw text streams from the executors. Dozens of megabytes of stdout, tool output, and chain-of-thought per session. If I pump that into the supervisor's context, it becomes a bloated, expensive, unreliable mess in about fifteen minutes.&lt;/p&gt;

&lt;p&gt;So I adopted one principle and protected it fiercely:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The supervisor consumes &lt;strong&gt;structured observations&lt;/strong&gt;, not raw logs.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;When Codex emits an event — a turn starts, a tool is invoked, a question is asked, a task completes, a failure occurs — that event gets normalized into a small structured observation. The supervisor sees things like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"kind"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"awaiting_approval"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"session_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sess_83"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"tool"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"shell"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"summary"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Wants to run: npm install"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"risk"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"medium"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Not:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;[2026-04-22T14:03:18Z][codex][turn=4][tool_call] shell {...2300 more chars...}
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The full log is still archived for audit. The supervisor just doesn't read it by default. This is the single architectural decision with the biggest impact on latency, cost, and correctness.&lt;/p&gt;
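&lt;p&gt;A sketch of that normalization step. The raw event shape (&lt;code&gt;type&lt;/code&gt;, &lt;code&gt;sessionId&lt;/code&gt;, &lt;code&gt;tool&lt;/code&gt;, &lt;code&gt;payload&lt;/code&gt;) is a hypothetical stand-in, not CliGate's real event schema:&lt;/p&gt;

```javascript
// Collapse a raw executor event into the small structured observation the
// supervisor is allowed to see; the full payload goes to the archive only.
function normalizeEvent(rawEvent) {
  const SUMMARY_LIMIT = 120;
  const text = String(rawEvent.payload || "");
  return {
    kind: rawEvent.type, // e.g. "awaiting_approval"
    session_id: rawEvent.sessionId,
    tool: rawEvent.tool,
    summary: text.length > SUMMARY_LIMIT ? text.slice(0, SUMMARY_LIMIT) + "..." : text,
  };
}

const obs = normalizeEvent({
  type: "awaiting_approval",
  sessionId: "sess_83",
  tool: "shell",
  payload: "Wants to run: npm install",
});
console.log(obs.summary); // "Wants to run: npm install"
```

&lt;p&gt;Whatever the executor emits, the supervisor's context grows by a bounded amount per event, which is what keeps latency and cost flat over a long session.&lt;/p&gt;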

&lt;h2&gt;
  
  
  Memory needs scope, not just storage
&lt;/h2&gt;

&lt;p&gt;The other thing I got wrong in my first draft was memory. I had two levels — "session" and "global" — and within a week they were both the wrong size for every real use case.&lt;/p&gt;

&lt;p&gt;What I have now is four scopes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;global user&lt;/code&gt; — preferences that cross every project ("I prefer TypeScript over JavaScript")&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;workspace / project&lt;/code&gt; — conventions for this codebase ("tests live under &lt;code&gt;tests/unit/&lt;/code&gt;")&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;conversation&lt;/code&gt; — the current chat thread ("we're iterating on the auth middleware")&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;runtime session&lt;/code&gt; — the specific Codex or Claude Code run ("already approved npm install in this session")&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Each memory write has to declare its scope. Each read filters by scope. A preference written at &lt;code&gt;conversation&lt;/code&gt; scope in a Telegram chat doesn't leak into a totally unrelated Feishu conversation, even though they share the same user.&lt;/p&gt;

&lt;p&gt;This sounds obvious written down. It was not obvious when I started.&lt;/p&gt;
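&lt;p&gt;The scope rule is easy to sketch: every write declares a scope, every read filters by it. The four scope names come from the list above; the array-backed store is a stand-in for illustration, not CliGate's implementation.&lt;/p&gt;

```javascript
// Scoped memory: writes must name a scope, reads never cross scope boundaries.
const SCOPES = ["global_user", "workspace", "conversation", "runtime_session"];
const store = [];

function writeMemory(scope, key, value) {
  if (!SCOPES.includes(scope)) throw new Error(`unknown scope: ${scope}`);
  store.push({ scope, key, value });
}

function recallMemory(scope, key) {
  // filter by scope first, so a conversation-scoped preference in one chat
  // can never surface in an unrelated conversation
  const hit = store.filter((m) => m.scope === scope).find((m) => m.key === key);
  return hit ? hit.value : undefined;
}

writeMemory("conversation", "current_task", "auth middleware");
console.log(recallMemory("conversation", "current_task")); // "auth middleware"
console.log(recallMemory("workspace", "current_task"));    // undefined
```

&lt;p&gt;Because reads never cross scope boundaries, the Telegram-vs-Feishu leak described above becomes structurally impossible rather than something you have to remember to check.&lt;/p&gt;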

&lt;h2&gt;
  
  
  Direct runtime vs assistant — don't hijack the default
&lt;/h2&gt;

&lt;p&gt;The other thing I was careful about: &lt;strong&gt;not making every message go through the supervisor.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If the user is mid-flow with Codex, they don't want a chatty middleman interrupting every turn with observations and summaries. So the default behavior for plain messages is still the &lt;em&gt;direct runtime path&lt;/em&gt; — the message goes straight to the current session, and the supervisor does not intervene.&lt;/p&gt;

&lt;p&gt;The supervisor only takes over when the user explicitly invokes it: &lt;code&gt;/cligate do X&lt;/code&gt; or a dedicated assistant chat tab. Low latency, low noise, predictable.&lt;/p&gt;

&lt;p&gt;The result is that you get two modes in one product:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Direct Runtime&lt;/strong&gt; — fast, predictable, feels like talking to Codex or Claude Code&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Assistant Collaboration&lt;/strong&gt; — explicit, structured, feels like talking to a supervisor who then delegates&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Users can tell the difference instantly, because one is immediate and the other shows a planning step.&lt;/p&gt;
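&lt;p&gt;The routing rule itself is tiny, which is part of why it stays predictable. A sketch, assuming only the &lt;code&gt;/cligate&lt;/code&gt; prefix from above; everything else is illustrative:&lt;/p&gt;

```javascript
// Default path is direct runtime; only an explicit invocation wakes the supervisor.
function routeMessage(text, currentSessionId) {
  if (text.startsWith("/cligate ")) {
    return { mode: "assistant", request: text.slice("/cligate ".length) };
  }
  // direct runtime path: no middleman, no planning step
  return { mode: "direct", sessionId: currentSessionId, message: text };
}

console.log(routeMessage("add a retry here", "sess_83").mode);      // "direct"
console.log(routeMessage("/cligate do summarize", "sess_83").mode); // "assistant"
```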

&lt;h2&gt;
  
  
  What this freed me from
&lt;/h2&gt;

&lt;p&gt;The moment I committed to this split, a long list of problems disappeared:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;I no longer needed to reinvent tool-use primitives for file editing and shell commands&lt;/li&gt;
&lt;li&gt;I no longer had to ship security sandboxing for the agent itself — the executors already have it&lt;/li&gt;
&lt;li&gt;I no longer had to match Claude Code or Codex on coding quality&lt;/li&gt;
&lt;li&gt;I could ship a useful supervisor in a week, not a quarter&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The supervisor's job is narrow enough to be &lt;em&gt;finishable&lt;/em&gt;. The coding agent's job is not.&lt;/p&gt;

&lt;h2&gt;
  
  
  The local-first part matters here
&lt;/h2&gt;

&lt;p&gt;All of this runs on &lt;code&gt;localhost&lt;/code&gt;. The supervisor, the executors, the memory store, the channel providers — none of it phones home. That's important to me because a supervisor that manages my credentials, remembers my preferences, and dispatches to my coding tools is &lt;em&gt;exactly&lt;/em&gt; the kind of component I do not want living on someone else's server.&lt;/p&gt;

&lt;p&gt;Local-first also means the supervisor can observe the executors directly, without routing through anyone's cloud. No round trips, no rate limits on the control plane itself.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick start
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx cligate@latest start
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then open &lt;code&gt;http://localhost:8081&lt;/code&gt;. Normal messages still go to Codex / Claude Code directly. Invoke the supervisor explicitly when you want dispatch and memory behavior.&lt;/p&gt;

&lt;p&gt;Repo: &lt;code&gt;https://github.com/codeking-ai/cligate&lt;/code&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The question I keep asking myself
&lt;/h2&gt;

&lt;p&gt;Everyone is building agents that can do more. I spent the last two weeks building one that does less — on purpose — because the thing it does less of is already done better by two other tools I have open in the next terminal tab.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Is "supervisor over existing executors" a more honest shape for an agent than "re-implement everything inside a single loop"?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I genuinely don't know the answer across the industry. But for my setup, it's already a clear yes. I'd like to hear how you draw the line — are you putting everything inside one agent, or are you also splitting control plane from execution plane? And if you're splitting, where does your line fall?&lt;/p&gt;

</description>
      <category>discuss</category>
      <category>webdev</category>
      <category>ai</category>
      <category>javascript</category>
    </item>
    <item>
      <title>"I Only Trusted My Channel Abstraction After Plugging In the Third Provider"</title>
      <dc:creator>CodeKing</dc:creator>
      <pubDate>Wed, 22 Apr 2026 01:56:10 +0000</pubDate>
      <link>https://dev.to/codekingai/i-only-trusted-my-channel-abstraction-after-plugging-in-the-third-provider-ned</link>
      <guid>https://dev.to/codekingai/i-only-trusted-my-channel-abstraction-after-plugging-in-the-third-provider-ned</guid>
      <description>&lt;p&gt;There is a quiet rule a lot of us follow: &lt;strong&gt;don't abstract until the third use case&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;One integration is a script. Two integrations is copy-paste with a shared helper. By the third, you find out whether you actually built an abstraction — or whether your first two just agreed on the same shape by accident.&lt;/p&gt;

&lt;p&gt;I hit that moment last weekend.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem
&lt;/h2&gt;

&lt;p&gt;My open-source project runs as a local gateway for AI coding tools — Claude Code, Codex CLI, Gemini CLI — and it also accepts mobile input from messaging channels. Telegram was the first channel. Feishu followed a few weeks later. Both went fine.&lt;/p&gt;

&lt;p&gt;Then someone asked for DingTalk.&lt;/p&gt;

&lt;p&gt;That is the specific moment that tests you. I had two options:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Copy the Feishu provider, rename everything, and hope&lt;/li&gt;
&lt;li&gt;Look at what the first two shared, decide whether it was actually a pattern, and either harden it or tear it out&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Option 1 always looks cheaper on a Saturday morning. It almost always isn't.&lt;/p&gt;

&lt;h2&gt;
  
  
  The part I was worried about
&lt;/h2&gt;

&lt;p&gt;When I looked closely at the existing code, I found two issues that a third provider would inherit by copy-paste — and I did not want to spread them further:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. A safety flag that looked enforced, but wasn't.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The channel settings already had a &lt;code&gt;requirePairing&lt;/code&gt; toggle. The dashboard showed it. The API stored it. But the inbound router was reading a static constructor flag, not the active per-channel setting.&lt;/p&gt;

&lt;p&gt;So it &lt;em&gt;looked&lt;/em&gt; like a security boundary. In practice, if you flipped the setting after start, nothing happened. Adding DingTalk as-is would have shipped this same gap into a new surface.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Runtime sessions dying without a memory.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Each inbound channel message starts or continues a &lt;em&gt;runtime session&lt;/em&gt; — basically a live bridge to a Codex or Claude Code run. These sessions expire. Messages don't.&lt;/p&gt;

&lt;p&gt;If the user had a conversation going ("now add rate limiting", "no, wrap it in try/except instead"), and the runtime session timed out in between, the next message on the same thread would silently fall back to the channel default provider. No memory of which task they had been iterating on. From the user's perspective, the bot just got dumber for no reason.&lt;/p&gt;

&lt;p&gt;Two channels could mask this. Three would turn it into a pattern users would start noticing across the product.&lt;/p&gt;

&lt;h2&gt;
  
  
  Fixing the abstraction before adding the third integration
&lt;/h2&gt;

&lt;p&gt;I ended up splitting the work in three phases, and doing them in order:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Phase 1 — safety and registry groundwork.&lt;/strong&gt; Move &lt;code&gt;requirePairing&lt;/code&gt; out of the provider constructor and into the active-settings path on every inbound request. Each provider passes its own live settings into &lt;code&gt;routeInboundMessage(message, options)&lt;/code&gt;. This is boring plumbing, but it is the kind of boring that prevents a future incident.&lt;/p&gt;
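&lt;p&gt;A sketch of that Phase 1 fix, assuming a hypothetical pairing store. Only &lt;code&gt;routeInboundMessage(message, options)&lt;/code&gt; comes from the text above; the rest is illustration:&lt;/p&gt;

```javascript
// The router checks the live per-channel setting on every inbound message,
// not a flag frozen at construction time.
const channelSettings = new Map([["telegram", { requirePairing: true }]]);
const pairedSenders = new Set(["user_42"]);

function routeInboundMessage(message, options) {
  // options.requirePairing is the active setting, passed in by the provider
  if (options.requirePairing) {
    if (!pairedSenders.has(message.senderId)) {
      return { status: "rejected", reason: "pairing required" };
    }
  }
  return { status: "accepted" };
}

const live = channelSettings.get("telegram");
console.log(routeInboundMessage({ senderId: "stranger" }, live).status); // "rejected"
```

&lt;p&gt;Because the provider reads its settings on every message, flipping &lt;code&gt;requirePairing&lt;/code&gt; in the dashboard takes effect on the next inbound message instead of the next restart.&lt;/p&gt;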

&lt;p&gt;&lt;strong&gt;Phase 2 — DingTalk provider.&lt;/strong&gt; Text-in, text-out. No interactive cards. No button callbacks. Just enough to validate that the router, orchestrator, and outbound dispatcher pipelines are really channel-agnostic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Phase 3 — dashboard evolution.&lt;/strong&gt; The current dashboard has hard-coded cards for Telegram and Feishu. Rather than add a third hard-coded card, expose provider metadata (&lt;code&gt;id&lt;/code&gt;, &lt;code&gt;label&lt;/code&gt;, &lt;code&gt;capabilities&lt;/code&gt;, &lt;code&gt;configFields&lt;/code&gt;) from the backend and plan to render the cards from that. This is the part I did &lt;em&gt;not&lt;/em&gt; finish in one sitting — it's the kind of change that's easier to do once you already have three providers pulling on the abstraction from different angles.&lt;/p&gt;

&lt;p&gt;The rule I gave myself: &lt;strong&gt;no new provider may duplicate a shape the first two had already imperfectly shared.&lt;/strong&gt; If I caught myself writing the same code a third time, that was the signal to extract.&lt;/p&gt;

&lt;h2&gt;
  
  
  The detail I'm most proud of: the supervisor brief
&lt;/h2&gt;

&lt;p&gt;This is the part I care about more than the channel count.&lt;/p&gt;

&lt;p&gt;I didn't want channel conversations to act like stateless webhook bots. So the orchestrator keeps a small structured record per channel conversation — I call it the &lt;em&gt;supervisor brief&lt;/em&gt;. It holds:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the last task the user started&lt;/li&gt;
&lt;li&gt;whether it's waiting for approval or user input&lt;/li&gt;
&lt;li&gt;the runtime provider that owned it (Codex or Claude Code)&lt;/li&gt;
&lt;li&gt;remembered permissions at session or conversation scope&lt;/li&gt;
&lt;li&gt;the origin relationship when a task was spun off from a previous one&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then, when a message comes in, I don't immediately forward it as a new runtime prompt. I match it against intent patterns first:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;进展如何&lt;/code&gt; / &lt;code&gt;status&lt;/code&gt; / &lt;code&gt;done?&lt;/code&gt; → answer from the brief, don't forward&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;总结一下&lt;/code&gt; / &lt;code&gt;summarize&lt;/code&gt; / &lt;code&gt;recap&lt;/code&gt; → wrap-up from the brief&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;再加一个&lt;/code&gt; / &lt;code&gt;把…改成…&lt;/code&gt; ("add one more" / "change … into …") → keep the same session, treat as an update&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;基于刚才那个再做一个&lt;/code&gt; ("make another one based on that last one") → sibling task, keep the provider&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;开始新任务：…&lt;/code&gt; / &lt;code&gt;start a new task&lt;/code&gt; → fresh task, new runtime session&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;重试刚才那个&lt;/code&gt; / &lt;code&gt;retry that&lt;/code&gt; → recover the failed task if the brief makes the target explicit&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The important piece is what happens when the runtime session is already gone but the brief is still there. High-confidence follow-up phrases can &lt;em&gt;revive&lt;/em&gt; the remembered provider, so the user keeps talking to the same tool instead of silently falling through to the channel default. When that happens, CliGate also writes the origin relationship back into the current task memory, so later status queries and wrap-ups can explain which earlier task this run came from.&lt;/p&gt;

&lt;p&gt;Once that existed, wrap-up replies, next-step suggestions, and busy-state explanations all pulled from the same structured brief instead of ad-hoc string logic. One place to reason about. One place to fix bugs.&lt;/p&gt;
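&lt;p&gt;A compressed sketch of that matching step. The intents and phrases mirror the list above; the pattern table and brief shape are illustrative, not the project's real code:&lt;/p&gt;

```javascript
// Match an inbound message against intent patterns before forwarding it
// as a fresh runtime prompt.
const INTENTS = [
  { intent: "status",    patterns: [/^status\b/i, /done\?/i, /进展如何/] },
  { intent: "summarize", patterns: [/summarize/i, /recap/i, /总结一下/] },
  { intent: "retry",     patterns: [/retry that/i, /重试刚才那个/] },
];

function classify(text, brief) {
  for (const row of INTENTS) {
    if (row.patterns.some((p) => p.test(text))) {
      // status and summary questions are answered from the brief directly;
      // nothing is forwarded to the runtime
      return { intent: row.intent, provider: brief.provider, forward: false };
    }
  }
  // default: treat as an update to the task the brief remembers
  return { intent: "update", provider: brief.provider, forward: true };
}

const brief = { task: "auth middleware", provider: "codex" };
console.log(classify("status", brief).forward);            // false
console.log(classify("add rate limiting", brief).forward); // true
```

&lt;p&gt;The brief, not the message text, is what carries the provider forward, which is exactly what makes the "revive the remembered provider" behavior possible after a session expires.&lt;/p&gt;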

&lt;h2&gt;
  
  
  What I learned from the third provider
&lt;/h2&gt;

&lt;p&gt;A few things crystallized that I'd been half-believing for months:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Thin provider metadata beats thick provider classes.&lt;/strong&gt; &lt;code&gt;{ id, label, capabilities, configFields }&lt;/code&gt; is a surprisingly useful contract. Anything richer tends to calcify.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security flags that live in the wrong layer are worse than missing flags.&lt;/strong&gt; A flag the user trusts but the code ignores is a deception, not a feature.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A runtime session and a conversation are not the same lifetime.&lt;/strong&gt; Treating them as the same was the single biggest source of "the bot got dumb" bug reports.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The third integration is where your abstraction either holds or falls apart.&lt;/strong&gt; If the third one hurts more than the second one, your first two were just twins, not a pattern.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The DingTalk provider itself ended up being one of the smaller PRs in the project. The work that made it small happened &lt;em&gt;before&lt;/em&gt; the file was created.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick start
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx cligate@latest start
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then open &lt;code&gt;http://localhost:8081&lt;/code&gt;, go to the &lt;strong&gt;Channels&lt;/strong&gt; tab, and plug in Telegram, Feishu, or DingTalk. The same runtime session behavior applies across all three.&lt;/p&gt;

&lt;p&gt;Repo: &lt;code&gt;https://github.com/codeking-ai/cligate&lt;/code&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Over to you
&lt;/h2&gt;

&lt;p&gt;I'm curious how other people decide when to abstract. Do you wait for the third use case like me? Do you go earlier and accept the rework risk? Or do you just never abstract until someone files a bug that forces your hand?&lt;/p&gt;

&lt;p&gt;I'd genuinely like to hear how your team handles this — especially for features that &lt;em&gt;look&lt;/em&gt; similar but have quietly different lifetimes, like runtime sessions versus channel conversations.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>javascript</category>
      <category>ai</category>
      <category>opensource</category>
    </item>
    <item>
      <title>I Wanted One Local Gateway for Claude Code, Codex, Gemini, Telegram, Feishu, and DingTalk. So I Built CliGate</title>
      <dc:creator>CodeKing</dc:creator>
      <pubDate>Tue, 21 Apr 2026 02:42:27 +0000</pubDate>
      <link>https://dev.to/codekingai/i-wanted-one-local-gateway-for-claude-code-codex-gemini-telegram-feishu-and-dingtalk-so-i-i83</link>
      <guid>https://dev.to/codekingai/i-wanted-one-local-gateway-for-claude-code-codex-gemini-telegram-feishu-and-dingtalk-so-i-i83</guid>
      <description>&lt;p&gt;Most AI dev setups break down in exactly the same place: the layer between your tools and your providers.&lt;/p&gt;

&lt;p&gt;You may have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Claude Code on one account&lt;/li&gt;
&lt;li&gt;Codex using a different auth path&lt;/li&gt;
&lt;li&gt;Gemini CLI speaking another protocol&lt;/li&gt;
&lt;li&gt;a few API keys across multiple vendors&lt;/li&gt;
&lt;li&gt;mobile messages coming from Telegram, Feishu, or DingTalk&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At that point, the problem is no longer "which model should I use?"&lt;/p&gt;

&lt;p&gt;The problem is that your workflow has no control plane.&lt;/p&gt;

&lt;p&gt;So I built &lt;strong&gt;CliGate&lt;/strong&gt;: a &lt;strong&gt;local multi-protocol AI gateway&lt;/strong&gt; that runs on &lt;code&gt;localhost&lt;/code&gt; and gives all of those clients one entry point.&lt;/p&gt;

&lt;h2&gt;
  
  
  The idea
&lt;/h2&gt;

&lt;p&gt;I did not want:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;separate configs for every CLI&lt;/li&gt;
&lt;li&gt;separate auth handling for every provider&lt;/li&gt;
&lt;li&gt;separate debugging surfaces for web chat and mobile channels&lt;/li&gt;
&lt;li&gt;separate session logic for "real work" versus "messages from my phone"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I wanted one local layer that could do all of this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;accept requests from different AI coding tools&lt;/li&gt;
&lt;li&gt;route them to different upstream providers or account pools&lt;/li&gt;
&lt;li&gt;keep visibility into usage, logs, pricing, and failures&lt;/li&gt;
&lt;li&gt;let mobile channels continue the same runtime flow instead of becoming a dead-end notification pipe&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That is what CliGate does.&lt;/p&gt;

&lt;h2&gt;
  
  
  What CliGate supports
&lt;/h2&gt;

&lt;p&gt;On the client side, CliGate already exposes compatible paths for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude Code&lt;/strong&gt; through Anthropic Messages API&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Codex CLI&lt;/strong&gt; through OpenAI Responses API, Chat Completions, and Codex internal endpoints&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gemini CLI&lt;/strong&gt; through Gemini-compatible routes&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;OpenClaw&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;On the channel side, it now supports:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Telegram&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Feishu&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;DingTalk&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;On the upstream side, it can route through combinations of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ChatGPT account pools&lt;/li&gt;
&lt;li&gt;Claude account pools&lt;/li&gt;
&lt;li&gt;Antigravity accounts&lt;/li&gt;
&lt;li&gt;provider API keys&lt;/li&gt;
&lt;li&gt;free-model routes&lt;/li&gt;
&lt;li&gt;local runtimes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That means the same local service can sit between your tools, your chat channels, and multiple upstream model providers.&lt;/p&gt;

&lt;h2&gt;
  
  
  The part I care about most: channels are not bolted on
&lt;/h2&gt;

&lt;p&gt;This is the distinction that made the project worth building.&lt;/p&gt;

&lt;p&gt;I did not want Telegram, Feishu, or DingTalk to behave like dumb message forwarders.&lt;/p&gt;

&lt;p&gt;In CliGate, channel conversations plug into the same runtime orchestration layer used by the dashboard. That gives you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;sticky runtime sessions&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;conversation records&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;pairing and approval flows&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;provider-specific follow-up handling&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;one place to inspect what happened&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So when a conversation starts from a mobile channel, it can stay attached to the same runtime session until you explicitly reset it.&lt;/p&gt;

&lt;p&gt;That is a very different model from the usual "webhook in, text out" bot architecture.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why the local-first approach matters
&lt;/h2&gt;

&lt;p&gt;CliGate runs locally.&lt;/p&gt;

&lt;p&gt;That means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;no hosted relay layer&lt;/li&gt;
&lt;li&gt;no forced external control plane&lt;/li&gt;
&lt;li&gt;direct connections to official upstream APIs&lt;/li&gt;
&lt;li&gt;your routing, credentials, sessions, and logs stay under your control&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For developer tooling, this matters a lot more than people admit.&lt;/p&gt;

&lt;p&gt;If the gateway layer itself becomes another cloud dependency, you have just moved the fragility somewhere else.&lt;/p&gt;

&lt;h2&gt;
  
  
  Routing is where the mess gets cleaned up
&lt;/h2&gt;

&lt;p&gt;CliGate separates the &lt;strong&gt;client protocol&lt;/strong&gt; from the &lt;strong&gt;upstream provider&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Your tool sends the shape it already expects. CliGate decides where it should actually go.&lt;/p&gt;

&lt;p&gt;That includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;routing priority between account pools and API keys&lt;/li&gt;
&lt;li&gt;per-app assignments&lt;/li&gt;
&lt;li&gt;model mapping&lt;/li&gt;
&lt;li&gt;free-model fallback&lt;/li&gt;
&lt;li&gt;local model routing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So Claude Code, Codex CLI, Gemini CLI, and OpenClaw do not need to share the same credentials, and they do not need to know anything about each other's protocol requirements.&lt;/p&gt;

&lt;p&gt;You can also bind apps to specific targets instead of manually swapping environment variables every time your usage pattern changes.&lt;/p&gt;
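&lt;p&gt;As a rough illustration of what that separation buys, here is a hypothetical routing table with per-app targets and fallback chains. The config shape and target names are assumptions for illustration, not CliGate's actual file format:&lt;/p&gt;

```javascript
// Each client app binds to a primary target plus a fallback chain; the
// client protocol never changes, only where the gateway sends it.
const routing = {
  apps: {
    "claude-code": { target: "claude-pool",    fallback: ["api-key-anthropic"] },
    "codex-cli":   { target: "chatgpt-pool",   fallback: ["free-models"] },
    "gemini-cli":  { target: "api-key-gemini", fallback: [] },
  },
};

function pickTarget(app, unavailable) {
  const rule = routing.apps[app];
  if (!rule) throw new Error(`no routing rule for ${app}`);
  const chain = [rule.target, ...rule.fallback];
  return chain.find((t) => !unavailable.has(t));
}

// if the account pool is rate-limited, the same client request falls through
// to the next target without the CLI noticing
console.log(pickTarget("codex-cli", new Set(["chatgpt-pool"]))); // "free-models"
```

&lt;p&gt;The client keeps sending the protocol it already speaks; only the gateway knows that "where this goes" is a chain, not a single credential.&lt;/p&gt;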

&lt;h2&gt;
  
  
  The dashboard is part of the product, not an afterthought
&lt;/h2&gt;

&lt;p&gt;Most proxy tools feel fine until something breaks.&lt;/p&gt;

&lt;p&gt;Then you realize there is no real visibility into:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;which credential was selected&lt;/li&gt;
&lt;li&gt;why routing chose that path&lt;/li&gt;
&lt;li&gt;whether a token expired&lt;/li&gt;
&lt;li&gt;which conversation owns a runtime session&lt;/li&gt;
&lt;li&gt;where a mobile follow-up got attached&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;CliGate ships with a web dashboard to manage:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;accounts&lt;/li&gt;
&lt;li&gt;API keys&lt;/li&gt;
&lt;li&gt;app routing&lt;/li&gt;
&lt;li&gt;channel settings&lt;/li&gt;
&lt;li&gt;runtime providers&lt;/li&gt;
&lt;li&gt;conversation records&lt;/li&gt;
&lt;li&gt;request logs&lt;/li&gt;
&lt;li&gt;usage and cost stats&lt;/li&gt;
&lt;li&gt;pricing overrides&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That matters because a gateway without observability eventually becomes guesswork.&lt;/p&gt;

&lt;h2&gt;
  
  
  A concrete example
&lt;/h2&gt;

&lt;p&gt;This is the workflow I wanted to make normal:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Run CliGate once on my machine.&lt;/li&gt;
&lt;li&gt;Point Claude Code, Codex CLI, and Gemini CLI at the same local gateway.&lt;/li&gt;
&lt;li&gt;Configure Telegram, Feishu, or DingTalk as channel entry points.&lt;/li&gt;
&lt;li&gt;Start a task from the dashboard or from a mobile message.&lt;/li&gt;
&lt;li&gt;Keep that conversation attached to the same runtime context while I continue from another surface.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In other words: not just "multiple clients can call one proxy", but "multiple surfaces can participate in the same local orchestration model."&lt;/p&gt;

&lt;p&gt;That is the real product.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick start
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx cligate@latest start
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; cligate
cligate start
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then open:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;http://localhost:8081
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;From there you can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;add accounts or API keys&lt;/li&gt;
&lt;li&gt;configure app routing&lt;/li&gt;
&lt;li&gt;enable Telegram / Feishu / DingTalk channels&lt;/li&gt;
&lt;li&gt;inspect runtime sessions and conversation records&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Who this is for
&lt;/h2&gt;

&lt;p&gt;CliGate is useful if you are already feeling pain from any of these:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;you use more than one AI coding CLI&lt;/li&gt;
&lt;li&gt;you switch across OpenAI, Anthropic, Gemini, and other providers&lt;/li&gt;
&lt;li&gt;you want one local place to manage auth and routing&lt;/li&gt;
&lt;li&gt;you want mobile channel access without giving up runtime continuity&lt;/li&gt;
&lt;li&gt;you want debugging and observability instead of shell-script chaos&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Repo
&lt;/h2&gt;

&lt;p&gt;GitHub: &lt;code&gt;https://github.com/codeking-ai/cligate&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;If your current AI setup looks like a pile of disconnected clients, credentials, and chat surfaces, CliGate is meant to turn that into one local piece of infrastructure.&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>ai</category>
      <category>devtools</category>
      <category>programming</category>
    </item>
    <item>
      <title>"How I Control Codex and Claude Code From Telegram — a 5-Minute Setup"</title>
      <dc:creator>CodeKing</dc:creator>
      <pubDate>Mon, 20 Apr 2026 05:45:15 +0000</pubDate>
      <link>https://dev.to/codekingai/how-i-control-codex-and-claude-code-from-telegram-a-5-minute-setup-520c</link>
      <guid>https://dev.to/codekingai/how-i-control-codex-and-claude-code-from-telegram-a-5-minute-setup-520c</guid>
      <description>&lt;p&gt;I was at dinner when a colleague pinged me: "the staging deploy is failing, can you check the test suite?"&lt;/p&gt;

&lt;p&gt;I didn't have my laptop. I had my phone and a Telegram bot connected to my dev machine.&lt;/p&gt;

&lt;p&gt;I typed: &lt;code&gt;/cx fix the failing test in tests/auth.test.js&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Codex started running on my desktop. Two minutes later, my phone buzzed: "Task completed. Fixed assertion in auth.test.js line 42 — expected token format was outdated."&lt;/p&gt;

&lt;p&gt;I went back to dinner.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Here's exactly how to set this up in 5 minutes.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What You Need
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/codeking-ai/cligate" rel="noopener noreferrer"&gt;CliGate&lt;/a&gt; running on your machine (&lt;code&gt;npx cligate@latest start&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Codex CLI or Claude Code installed (CliGate's Tool Installer tab can do this for you)&lt;/li&gt;
&lt;li&gt;A Telegram account&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's it. No cloud server. No public IP. No ngrok.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1: Create a Telegram Bot (1 minute)
&lt;/h2&gt;

&lt;p&gt;Open Telegram, search for &lt;strong&gt;@BotFather&lt;/strong&gt;, and send:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/newbot
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Give it a name and username. BotFather gives you a token like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;7123456789:AAH1234abcdefghijklmnopqrstuvwxyz
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Copy that token.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 2: Configure CliGate Channels (1 minute)
&lt;/h2&gt;

&lt;p&gt;Open &lt;code&gt;http://localhost:8081&lt;/code&gt; and go to the &lt;strong&gt;Channels&lt;/strong&gt; tab.&lt;/p&gt;

&lt;p&gt;Under &lt;strong&gt;Telegram&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Paste your bot token&lt;/li&gt;
&lt;li&gt;Set &lt;strong&gt;Default Runtime Provider&lt;/strong&gt; to &lt;code&gt;codex&lt;/code&gt; (or &lt;code&gt;claude-code&lt;/code&gt; — your preference)&lt;/li&gt;
&lt;li&gt;Set &lt;strong&gt;Working Directory&lt;/strong&gt; to your project path, e.g. &lt;code&gt;/home/you/projects/my-app&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Toggle &lt;strong&gt;Enabled&lt;/strong&gt; on&lt;/li&gt;
&lt;li&gt;Click Save&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;CliGate starts polling Telegram immediately. No webhook URL needed — it uses long-polling mode.&lt;/p&gt;
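
&lt;p&gt;A minimal sketch of what long-polling means in practice, assuming the standard Telegram Bot API &lt;code&gt;getUpdates&lt;/code&gt; call (CliGate's real loop adds error handling and backoff). The client dials out to Telegram, so no inbound port ever opens:&lt;/p&gt;

```javascript
// Hypothetical long-polling loop. getUpdates blocks server-side for up to
// `timeout` seconds, so the client simply re-calls it forever.
function nextOffset(updates, current) {
  // Acknowledge everything received so far; Telegram resends any update
  // whose id is below the offset you pass on the next call.
  if (updates.length === 0) return current;
  return updates[updates.length - 1].update_id + 1;
}

async function pollLoop(token, onMessage) {
  let offset = 0;
  for (;;) {
    const res = await fetch(
      `https://api.telegram.org/bot${token}/getUpdates?timeout=50&offset=${offset}`
    );
    const { result: updates = [] } = await res.json();
    for (const u of updates) {
      if (u.message) onMessage(u.message); // hand off to the supervisor
    }
    offset = nextOffset(updates, offset);
  }
}
```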

&lt;h2&gt;
  
  
  Step 3: Pair Your Phone (30 seconds)
&lt;/h2&gt;

&lt;p&gt;Open your Telegram bot and send any message, like "hello".&lt;/p&gt;

&lt;p&gt;The bot responds with a pairing code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Pairing required. Code: 847291
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Go back to the CliGate dashboard. Enter the pairing code in the Channels tab. Done — your Telegram account is now authorized.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 4: Send Your First Task (30 seconds)
&lt;/h2&gt;

&lt;p&gt;Now the fun part. Send a message to your bot:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/cx analyze the error handling in src/server.js and suggest improvements
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. Here's what happens:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;CliGate receives the message&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/cx&lt;/code&gt; tells the supervisor to use &lt;strong&gt;Codex&lt;/strong&gt; as the runtime&lt;/li&gt;
&lt;li&gt;Codex spawns on your desktop in headless mode&lt;/li&gt;
&lt;li&gt;Events stream back to Telegram: progress, commands, file changes&lt;/li&gt;
&lt;li&gt;When Codex finishes, you get a summary in Telegram&lt;/li&gt;
&lt;/ol&gt;
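
&lt;p&gt;The dispatch rules behind those steps can be sketched as a small router (a hypothetical shape; the real supervisor also consults its task memory and the configured default runtime):&lt;/p&gt;

```javascript
// Map an incoming chat message to an action. Plain text continues the
// current session if one exists; slash commands start, detach, or cancel.
function routeMessage(text, activeSession) {
  const m = text.match(/^\/(cx|cc|new|cancel)\b\s*(.*)$/s);
  if (!m) {
    return activeSession
      ? { action: 'follow_up', session: activeSession, prompt: text }
      : { action: 'start', runtime: 'codex', prompt: text }; // assumed default
  }
  const [, cmd, rest] = m;
  if (cmd === 'cancel') return { action: 'cancel' };
  if (cmd === 'new') {
    const sub = rest.match(/^(cx|cc)\s+(.*)$/s);
    if (sub) {
      return {
        action: 'start',
        runtime: sub[1] === 'cx' ? 'codex' : 'claude-code',
        prompt: sub[2],
      };
    }
    return { action: 'detach' }; // bare /new: next message starts fresh
  }
  return {
    action: 'start',
    runtime: cmd === 'cx' ? 'codex' : 'claude-code',
    prompt: rest,
  };
}
```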

&lt;p&gt;Want Claude Code instead? Use &lt;code&gt;/cc&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/cc refactor the database connection pool in src/db.js
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The Commands You Actually Need
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Command&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/cx &amp;lt;task&amp;gt;&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Start a Codex session&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/cc &amp;lt;task&amp;gt;&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Start a Claude Code session&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/new&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Detach current session, next message starts fresh&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/new cx &amp;lt;task&amp;gt;&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Start a new Codex session immediately&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/new cc &amp;lt;task&amp;gt;&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Start a new Claude Code session immediately&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/cancel&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Stop the running task&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;(plain text)&lt;/td&gt;
&lt;td&gt;Continue the current session — follow-up messages stay attached&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;You don't need to prefix every message with &lt;code&gt;/cx&lt;/code&gt;. After starting a session, plain follow-up messages go to the same agent:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You:  /cx fix the failing tests
Bot:  Task accepted. Session abc123 started with Codex.
Bot:  [... progress events ...]
Bot:  Task completed. Fixed 3 assertions.

You:  also update the test snapshots
Bot:  Sent follow-up to session abc123.
Bot:  [... continues in the same session ...]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The Part I Didn't Expect: It Understands What You Mean
&lt;/h2&gt;

&lt;p&gt;This isn't a dumb message relay. CliGate has a supervisor layer that understands intent.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Status queries:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You:  status?
Bot:  Current task "fix auth tests" running with Codex. 
      2 files changed, 1 command executed.
      You can wait for completion or send /cancel.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It recognizes "status", "progress", "done?", and their Chinese equivalents "进展如何" ("how is it going?") and "做到哪了" ("how far along?"), and answers from the remembered task state instead of forwarding your question to Codex as a new prompt.&lt;/p&gt;
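
&lt;p&gt;A toy version of that status check, assuming simple pattern matching (the real supervisor likely uses a model for intent detection rather than a fixed list):&lt;/p&gt;

```javascript
// Messages matching a status pattern are answered from stored task state
// instead of being forwarded to the runtime as a new prompt.
const STATUS_PATTERNS = [
  /^status\??$/i,
  /^progress\??$/i,
  /^done\??$/i,
  /进展如何/, // "how is it going?"
  /做到哪了/, // "how far along are you?"
];

function isStatusQuery(text) {
  const t = text.trim();
  return STATUS_PATTERNS.some((p) => p.test(t));
}
```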

&lt;p&gt;&lt;strong&gt;Wrap-up summaries:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You:  summarize
Bot:  Task "fix auth tests" completed with Codex.
      Changed: tests/auth.test.js (updated 3 assertions)
      You can ask for a revision or start a related task.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Task switching:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You:  start a new task: add input validation to the signup endpoint
Bot:  Task accepted. Session def456 started with Codex.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It knows "start a new task" means detach the current session and begin fresh — without you needing &lt;code&gt;/new&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Approval Bridging — the Killer Feature
&lt;/h2&gt;

&lt;p&gt;When Claude Code or Codex needs permission to edit a file or run a command, the request shows up in Telegram:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Bot:  🔒 Codex wants to run: npm test
      [Approve]  [Deny]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Tap &lt;strong&gt;Approve&lt;/strong&gt;. The agent continues.&lt;/p&gt;

&lt;p&gt;But here's the clever part: &lt;strong&gt;CliGate remembers your approval.&lt;/strong&gt; If you approve editing files in &lt;code&gt;/src/&lt;/code&gt;, future requests for files in that same directory get auto-approved within the same session. No more tapping "Approve" twenty times for twenty files in the same folder.&lt;/p&gt;

&lt;h2&gt;
  
  
  Also Works With Feishu (飞书)
&lt;/h2&gt;

&lt;p&gt;If your team uses Feishu instead of Telegram, CliGate supports it too.&lt;/p&gt;

&lt;p&gt;The difference: Feishu can run in &lt;strong&gt;WebSocket mode&lt;/strong&gt; — meaning it works on your local machine without a public URL. No ngrok, no cloud, no firewall config. Set Feishu Open Platform event subscription to persistent connection mode, and CliGate connects directly.&lt;/p&gt;

&lt;p&gt;Same commands, same supervisor intelligence, same approval bridging.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the Architecture Looks Like
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Your Phone (Telegram / Feishu)
         │
         ▼
  Channel Gateway (long-polling / WebSocket)
         │
         ▼
  Supervisor Agent Layer
    ├── Intent detection (new task / follow-up / status / wrap-up)
    ├── Approval policy engine (remembers scoped permissions)
    └── Task memory (structured brief per conversation)
         │
         ▼
  Agent Runtime (session manager)
    ├── Codex  (headless JSONL events)
    └── Claude Code  (stream-json protocol)
         │
         ▼
  CliGate Proxy Core → Upstream AI APIs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your phone sends text. The supervisor figures out what to do. The runtime executes. Results come back to your phone. The proxy handles all the API routing underneath.&lt;/p&gt;

&lt;h2&gt;
  
  
  Honest Caveats
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Your desktop machine needs to be running for this to work (it's localhost, not cloud)&lt;/li&gt;
&lt;li&gt;Long-running tasks can time out if your machine sleeps&lt;/li&gt;
&lt;li&gt;Feishu WebSocket mode requires a Feishu developer app (free to create, but takes 5 more minutes)&lt;/li&gt;
&lt;li&gt;Multi-step tasks with lots of approval requests work better with the web dashboard than Telegram&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx cligate@latest start
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Open &lt;code&gt;localhost:8081&lt;/code&gt; → Channels tab → add your Telegram bot token → pair your phone → send &lt;code&gt;/cx hello world&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;That's the whole setup.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's your remote development workflow?&lt;/strong&gt; Do you SSH from your phone, use VS Code remote, or just wait until you're back at your desk? I'm curious how others handle the "not at my computer but need to fix something" problem.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/codeking-ai/cligate" rel="noopener noreferrer"&gt;github.com/codeking-ai/cligate&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;CliGate is open-source under AGPL-3.0. Not affiliated with Anthropic, OpenAI, Google, or Telegram.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>javascript</category>
      <category>tutorial</category>
      <category>ai</category>
    </item>
    <item>
      <title>"I Texted My Localhost From the Train — Claude Code Fixed the Bug Before I Got Home"</title>
      <dc:creator>CodeKing</dc:creator>
      <pubDate>Fri, 17 Apr 2026 01:58:00 +0000</pubDate>
      <link>https://dev.to/codekingai/i-texted-my-localhost-from-the-train-claude-code-fixed-the-bug-before-i-got-home-5eo7</link>
      <guid>https://dev.to/codekingai/i-texted-my-localhost-from-the-train-claude-code-fixed-the-bug-before-i-got-home-5eo7</guid>
      <description>&lt;p&gt;Last Tuesday I was on the train home when a Slack message came in: "prod build is broken, can you look?"&lt;/p&gt;

&lt;p&gt;I didn't have my laptop open. I didn't want to SSH from my phone. But I had something else — a Telegram bot connected to my localhost machine at home.&lt;/p&gt;

&lt;p&gt;I typed: "launch claude code in ~/projects/api-server, fix the failing build"&lt;/p&gt;

&lt;p&gt;By the time I walked through my front door, the fix was committed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;That's not how localhost is supposed to work. But here we are.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Idea That Sounded Crazy
&lt;/h2&gt;

&lt;p&gt;For months, &lt;a href="https://github.com/codeking-ai/cligate" rel="noopener noreferrer"&gt;CliGate&lt;/a&gt; was "just" a proxy — it sat between your AI coding tools and their APIs, handling routing, account pooling, and key management.&lt;/p&gt;

&lt;p&gt;But every time I used the built-in chat to test credentials, the same thought kept nagging me:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Why am I testing models in this chat window, then switching to a terminal to actually use Claude Code or Codex?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;What if the chat window could just... launch them?&lt;/p&gt;

&lt;p&gt;And then the scarier thought:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;What if I didn't even need to be at my computer?&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What Changed: Two New Layers
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Layer 1: Agent Runtime — Your Chat Window Becomes a Control Room
&lt;/h3&gt;

&lt;p&gt;CliGate's chat can now spawn Claude Code or Codex as real background processes.&lt;/p&gt;

&lt;p&gt;Not simulated. Not a wrapper around an API call. The actual CLI tools, running headless, streaming structured events back into your browser.&lt;/p&gt;

&lt;p&gt;Here's how it works under the hood:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For Codex:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;codex &lt;span class="nb"&gt;exec&lt;/span&gt; &lt;span class="nt"&gt;--experimental-json&lt;/span&gt; &lt;span class="nt"&gt;--model&lt;/span&gt; gpt-5 &lt;span class="s2"&gt;"fix the failing test"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;CliGate spawns this as a child process, reads the JSONL event stream, and maps every event — &lt;code&gt;agent_message&lt;/code&gt;, &lt;code&gt;command_execution&lt;/code&gt;, &lt;code&gt;file_change&lt;/code&gt;, &lt;code&gt;todo_list&lt;/code&gt;, &lt;code&gt;reasoning&lt;/code&gt; — into the chat UI in real time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For Claude Code:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;claude &lt;span class="nt"&gt;--print&lt;/span&gt; &lt;span class="nt"&gt;--output-format&lt;/span&gt; stream-json &lt;span class="nt"&gt;--input-format&lt;/span&gt; stream-json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same idea. Claude Code's headless mode exposes a structured stdin/stdout protocol. CliGate reads it, bridges it, and surfaces everything in the chat.&lt;/p&gt;

&lt;h3&gt;
  
  
  What You Actually See
&lt;/h3&gt;

&lt;p&gt;When you tell CliGate's chat "use codex to refactor the auth module":&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A session starts — you see &lt;code&gt;session abc123 started with codex&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Codex thinks — reasoning events stream in&lt;/li&gt;
&lt;li&gt;Codex runs commands — you see the actual shell commands and their output&lt;/li&gt;
&lt;li&gt;Codex changes files — you see diffs&lt;/li&gt;
&lt;li&gt;Codex finishes — you get a summary&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;The killer feature: permission bridging.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When Claude Code asks "Can I edit &lt;code&gt;server.js&lt;/code&gt;?" — that question doesn't disappear into a terminal you're not watching. It pops up in the chat. You click Approve or Deny. Claude Code continues.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Session status flow:

starting → running → waiting_approval → running → completed
                          ↑
                    You approve here
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This means you don't need a terminal window open at all. The chat window IS your terminal now — but one that actually understands what the agent is doing.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 2: Channel Gateway — Your Phone Becomes the Remote Control
&lt;/h3&gt;

&lt;p&gt;This is where it gets wild.&lt;/p&gt;

&lt;p&gt;CliGate now has a &lt;strong&gt;Channel Gateway&lt;/strong&gt; that connects external messaging platforms to the Agent Runtime. Currently supported:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Telegram&lt;/strong&gt; (polling mode)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Feishu / Lark&lt;/strong&gt; (webhook mode)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The architecture:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Your Phone (Telegram / Feishu)
        ↓
  Channel Gateway
        ↓
  Agent Runtime (Orchestrator)
        ↓
  Codex / Claude Code (child process)
        ↓
  CliGate Proxy Core
        ↓
  Upstream AI Models
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You text your Telegram bot. The Channel Gateway receives the message, routes it to the orchestrator, which decides whether to start a new Codex/Claude Code session or continue an existing one. Results stream back to your phone.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pairing for security:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You don't want random people controlling your localhost. So there's a pairing flow — the first time you message the bot, it gives you a code. Enter that code in the CliGate dashboard. Now your Telegram account is paired and authorized.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Approval buttons:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When Claude Code needs permission, you get an inline button in Telegram:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;🔒 Claude Code wants to edit server.js

[Approve]  [Deny]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Tap Approve. Done. Claude Code continues — on your desktop machine — while you're standing in line at a coffee shop.&lt;/p&gt;

&lt;h2&gt;
  
  
  Real Talk: What This Actually Solves
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Problem 1: Long-running tasks&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You tell Claude Code to analyze a large codebase. It takes 20 minutes. Without this feature, you're staring at a terminal for 20 minutes. With it, you get a notification on your phone when it's done.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Problem 2: Permission fatigue&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Claude Code asks for permission constantly. If you're not watching the terminal, it just... sits there. Now permission requests reach you wherever you are — browser, Telegram, Feishu.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Problem 3: Context switching&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You're in a meeting. A build breaks. You text your bot: "launch codex in ~/projects/backend, fix the test in auth.test.js". You go back to your meeting. Codex handles it.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Didn't Build (On Purpose)
&lt;/h2&gt;

&lt;p&gt;This is NOT a full web clone of Claude Code's TUI. It's NOT a complete Codex terminal emulator.&lt;/p&gt;

&lt;p&gt;CliGate doesn't try to replicate every feature of these tools. It does exactly four things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Start&lt;/strong&gt; a session&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitor&lt;/strong&gt; progress in real time&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bridge&lt;/strong&gt; permission requests and questions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resume&lt;/strong&gt; or continue a conversation&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The actual coding work is still done by Codex and Claude Code. CliGate is the orchestration layer — the thing that lets you interact with them without sitting in front of a terminal.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Setup
&lt;/h2&gt;

&lt;p&gt;If you already have CliGate running:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agent Runtime works out of the box&lt;/strong&gt; — just use the chat window and mention codex or claude code in your message.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For Telegram:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create a bot via &lt;strong&gt;@BotFather&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Add your bot token in CliGate's Channel settings&lt;/li&gt;
&lt;li&gt;Message your bot — it'll ask you to pair&lt;/li&gt;
&lt;li&gt;Enter the pairing code in the dashboard&lt;/li&gt;
&lt;li&gt;Start sending tasks&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;For Feishu:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create a custom app in Feishu's developer console&lt;/li&gt;
&lt;li&gt;Add App ID, App Secret, and Verification Token in Channel settings&lt;/li&gt;
&lt;li&gt;Set the webhook URL to your CliGate instance&lt;/li&gt;
&lt;li&gt;Same pairing flow&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  The Honest "Is This Production Ready?" Answer
&lt;/h2&gt;

&lt;p&gt;No. It's early.&lt;/p&gt;

&lt;p&gt;The Agent Runtime is solid for single-session workflows. The Channel Gateway handles Telegram well. Feishu needs more testing.&lt;/p&gt;

&lt;p&gt;What's missing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multi-turn conversations across long time windows need more state management&lt;/li&gt;
&lt;li&gt;File attachments from channels aren't supported yet&lt;/li&gt;
&lt;li&gt;Error recovery from crashed sessions could be more graceful&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But for the "text your computer to fix a bug" workflow? It works. I use it daily.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Your Remote Development Setup?
&lt;/h2&gt;

&lt;p&gt;I'm curious about how others handle this problem:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Do you SSH from your phone?&lt;/li&gt;
&lt;li&gt;Do you use VS Code's remote features?&lt;/li&gt;
&lt;li&gt;Have you tried controlling AI coding agents remotely?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The idea of "your desktop is a server, your phone is the client" feels like it's going to be a bigger pattern. I'd love to hear how others approach it.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/codeking-ai/cligate" rel="noopener noreferrer"&gt;github.com/codeking-ai/cligate&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;CliGate is open-source under AGPL-3.0. Not affiliated with Anthropic, OpenAI, or Google.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>discuss</category>
      <category>ai</category>
      <category>webdev</category>
      <category>javascript</category>
    </item>
    <item>
      <title>"How Do You Manage 4 AI Coding Tools at Once? Here's My Setup"</title>
      <dc:creator>CodeKing</dc:creator>
      <pubDate>Thu, 16 Apr 2026 02:08:52 +0000</pubDate>
      <link>https://dev.to/codekingai/how-do-you-manage-4-ai-coding-tools-at-once-heres-my-setup-3j1</link>
      <guid>https://dev.to/codekingai/how-do-you-manage-4-ai-coding-tools-at-once-heres-my-setup-3j1</guid>
      <description>&lt;p&gt;I didn't plan to use four AI coding tools.&lt;/p&gt;

&lt;p&gt;It started with Claude Code. Then Codex CLI dropped, and it was good enough that I had to try it. Then Gemini CLI became free. Then a friend told me about OpenClaw and its custom provider injection.&lt;/p&gt;

&lt;p&gt;Before I realized it, I had:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;4 different CLIs&lt;/li&gt;
&lt;li&gt;3 different API key formats&lt;/li&gt;
&lt;li&gt;2 ChatGPT accounts&lt;/li&gt;
&lt;li&gt;1 Claude account&lt;/li&gt;
&lt;li&gt;An Azure OpenAI endpoint from work&lt;/li&gt;
&lt;li&gt;A Gemini API key from a free tier&lt;/li&gt;
&lt;li&gt;And a growing dread every time I opened a new terminal tab&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Does anyone else live like this?&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Config File Graveyard
&lt;/h2&gt;

&lt;p&gt;Here's what my config situation looked like before I snapped:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Claude Code&lt;/strong&gt; wanted &lt;code&gt;ANTHROPIC_BASE_URL&lt;/code&gt; and &lt;code&gt;ANTHROPIC_API_KEY&lt;/code&gt; in my environment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Codex CLI&lt;/strong&gt; wanted a &lt;code&gt;~/.codex/config.toml&lt;/code&gt; with &lt;code&gt;chatgpt_base_url&lt;/code&gt; and &lt;code&gt;openai_base_url&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gemini CLI&lt;/strong&gt; wanted... something patched into its internals.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;OpenClaw&lt;/strong&gt; wanted a &lt;code&gt;~/.openclaw/openclaw.json&lt;/code&gt; with its own provider format.&lt;/p&gt;

&lt;p&gt;Four tools. Four config formats. Four places to update when a key expires. And if I wanted to switch which account goes where? Manual surgery.&lt;/p&gt;

&lt;p&gt;I tried maintaining this by hand for about two weeks before I lost it.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Did Instead
&lt;/h2&gt;

&lt;p&gt;I pointed all four tools at &lt;code&gt;localhost:8081&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;That's it. That's the setup.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/codeking-ai/cligate" rel="noopener noreferrer"&gt;CliGate&lt;/a&gt; is an open-source local gateway that sits between your AI tools and their APIs. Every tool talks to the same address. The gateway figures out who sent the request, what model they need, and which credential to use.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx cligate@latest start
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One command. Dashboard opens. All my accounts and keys live in one place.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Part That Actually Matters: Routing
&lt;/h2&gt;

&lt;p&gt;Here's where it gets interesting.&lt;/p&gt;

&lt;p&gt;I don't want Codex using the same account as Claude Code. Codex hammers the API with rapid-fire completions. Claude Code takes longer, deeper passes. Mixing them on the same account burns through rate limits fast.&lt;/p&gt;

&lt;p&gt;So I set up &lt;strong&gt;App Routing&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude Code&lt;/strong&gt; → My Claude account (PKCE OAuth, auto-refreshing tokens)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Codex CLI&lt;/strong&gt; → Azure OpenAI endpoint (fastest, corporate budget)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gemini CLI&lt;/strong&gt; → Google Gemini API key (free tier — why pay?)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenClaw&lt;/strong&gt; → Pool fallback (whatever's available)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each binding has a fallback chain. If Claude's rate-limited, it drops to the API key pool. If Azure is down, Codex falls back to ChatGPT accounts.&lt;/p&gt;
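
&lt;p&gt;A fallback chain is conceptually simple (the provider objects here are made up, and the calls are shown synchronously for brevity; the real path is async):&lt;/p&gt;

```javascript
// Try each credential in order; a failure (rate limit, outage) falls
// through to the next provider in the chain.
function withFallback(chain, request) {
  let lastError = new Error('empty fallback chain');
  for (const provider of chain) {
    try {
      return provider.send(request); // first success wins
    } catch (err) {
      lastError = err; // e.g. a 429: move on to the next provider
    }
  }
  throw lastError;
}
```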

&lt;p&gt;&lt;strong&gt;Zero manual switching. Zero config file editing.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Free Model Trick
&lt;/h2&gt;

&lt;p&gt;Not every request needs GPT-5 or Claude Opus.&lt;/p&gt;

&lt;p&gt;Quick lookups, small code questions, "what does this error mean" — those can go to free models. CliGate has a toggle that routes fast-tier requests (anything that maps to haiku/mini/lite) to free providers like DeepSeek, Qwen, or MiniMax.&lt;/p&gt;

&lt;p&gt;Flip it on. Watch your API costs drop.&lt;/p&gt;

&lt;p&gt;Flip it off when you need the heavy models for complex reasoning.&lt;/p&gt;
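
&lt;p&gt;The toggle amounts to a model-name rewrite at the gateway (the tier pattern and the free model name below are illustrative assumptions, not CliGate's config):&lt;/p&gt;

```javascript
// Anything that maps to a fast tier (haiku/mini/lite) gets rerouted to a
// free provider when the toggle is on; heavy models pass through untouched.
const FAST_TIER = /(haiku|mini|lite)/i;

function routeModel(requestedModel, freeModeOn, freeModel = 'deepseek-chat') {
  if (freeModeOn && FAST_TIER.test(requestedModel)) {
    return freeModel; // cheap lookup goes to the free provider
  }
  return requestedModel; // complex reasoning stays on the paid model
}
```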

&lt;h2&gt;
  
  
  What My Setup Actually Looks Like
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────┐  ┌───────────┐  ┌────────────┐  ┌──────────┐
│ Claude Code │  │ Codex CLI │  │ Gemini CLI │  │ OpenClaw │
└──────┬──────┘  └─────┬─────┘  └──────┬─────┘  └────┬─────┘
       │               │               │              │
       └───────────────┼───────────────┼──────────────┘
                       ▼
              CliGate (localhost:8081)
                       │
       ┌───────┬───────┼───────┬───────┐
       ▼       ▼       ▼       ▼       ▼
   Anthropic  OpenAI  Azure   Google  Free
     API       API   OpenAI  Gemini  Models
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Everything goes through one gateway. The gateway handles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Protocol translation&lt;/strong&gt; — Anthropic format, OpenAI format, Gemini format — doesn't matter&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Account rotation&lt;/strong&gt; — Multiple ChatGPT/Claude accounts, round-robin or sticky&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Key load balancing&lt;/strong&gt; — Spreads requests across API keys, routes to least-used first&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Token refresh&lt;/strong&gt; — OAuth tokens auto-refresh and sync back to source tools&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Usage tracking&lt;/strong&gt; — Per-account, per-model, per-day cost breakdown&lt;/li&gt;
&lt;/ul&gt;
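
&lt;p&gt;Least-used-first key selection, for instance, is a one-liner over per-key usage counters (a sketch of the policy named above, not CliGate's code):&lt;/p&gt;

```javascript
// keys: [{ id, requestsToday }] -- pick the key with the lowest usage so
// load spreads evenly and no single key burns its quota first.
function pickKey(keys) {
  return keys.reduce((best, k) =>
    k.requestsToday < best.requestsToday ? k : best
  );
}
```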

&lt;h2&gt;
  
  
  The One-Click Part
&lt;/h2&gt;

&lt;p&gt;Each CLI tool has a "Configure" button in the dashboard. Click it. Done.&lt;/p&gt;

&lt;p&gt;No editing &lt;code&gt;.toml&lt;/code&gt; files. No setting environment variables. No patching Gemini's internals manually.&lt;/p&gt;

&lt;p&gt;The dashboard also installs tools you don't have yet. Don't have Codex CLI? Click "Install." It detects your OS and handles the rest.&lt;/p&gt;

&lt;h2&gt;
  
  
  Honest Downsides
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;It's another process running on your machine (Node.js on port 8081)&lt;/li&gt;
&lt;li&gt;Initial setup takes ~5 minutes to add accounts and configure routing&lt;/li&gt;
&lt;li&gt;If you only use one AI tool with one API key, this is overkill&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But if you're juggling 2+ tools or managing multiple accounts? The time savings compound fast.&lt;/p&gt;

&lt;h2&gt;
  
  
  So... What's Your Setup?
&lt;/h2&gt;

&lt;p&gt;I genuinely want to know:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How many AI coding tools are you running right now?&lt;/li&gt;
&lt;li&gt;Are you managing configs manually or have you built some system?&lt;/li&gt;
&lt;li&gt;Has anyone else hit the "too many API keys" wall?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Drop your setup in the comments. I'm curious if I'm the only one who went down this rabbit hole — or if there's a whole community of us doing the same thing.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/codeking-ai/cligate" rel="noopener noreferrer"&gt;github.com/codeking-ai/cligate&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;CliGate is open-source under AGPL-3.0. Not affiliated with Anthropic, OpenAI, or Google.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>discuss</category>
      <category>webdev</category>
      <category>ai</category>
      <category>javascript</category>
    </item>
    <item>
      <title>"My Company Has Azure OpenAI. My AI Coding Tools Had No Idea What to Do With It."</title>
      <dc:creator>CodeKing</dc:creator>
      <pubDate>Wed, 15 Apr 2026 03:03:11 +0000</pubDate>
      <link>https://dev.to/codekingai/my-company-has-azure-openai-my-ai-coding-tools-had-no-idea-what-to-do-with-it-26ik</link>
      <guid>https://dev.to/codekingai/my-company-has-azure-openai-my-ai-coding-tools-had-no-idea-what-to-do-with-it-26ik</guid>
      <description>&lt;p&gt;My company's Azure OpenAI deployment has been running for eight months. Enterprise-grade security controls, compliance logging, the whole setup. Every team that needs AI API access routes through it.&lt;/p&gt;

&lt;p&gt;Every team except the ones using AI coding tools.&lt;/p&gt;

&lt;p&gt;Claude Code speaks Anthropic's protocol. Codex CLI speaks OpenAI's protocol, but to the public endpoint. Azure OpenAI is a different enough target that just pointing the tools at it doesn't work, and when it fails, the errors give you almost nothing to go on.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Makes Azure OpenAI Different
&lt;/h2&gt;

&lt;p&gt;If you've only used the direct OpenAI or Anthropic APIs, Azure OpenAI looks similar at first glance. It's still a REST API, still returns completions. But the differences compound quickly when you're trying to make a proxy work:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Endpoint format is different.&lt;/strong&gt; Instead of &lt;code&gt;api.openai.com&lt;/code&gt;, you have a resource-specific URL:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://your-resource-name.openai.azure.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Models are replaced by deployments.&lt;/strong&gt; You don't call &lt;code&gt;gpt-4o&lt;/code&gt;. You call a deployment — an instance you created in the Azure portal that points to a model. The deployment name is arbitrary (&lt;code&gt;my-gpt4-deployment&lt;/code&gt;, &lt;code&gt;prod-coding-model&lt;/code&gt;). Your code has to know it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;API version is required.&lt;/strong&gt; Every request needs a &lt;code&gt;?api-version=2024-10-21&lt;/code&gt; query parameter (or similar). Miss it and the request fails with a cryptic error.&lt;/p&gt;
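
&lt;p&gt;Put together, the URL a proxy has to assemble looks like this. A minimal sketch in JavaScript (the config shape is illustrative, not CliGate's actual one; the placeholder values are the ones from above):&lt;/p&gt;

```javascript
// Sketch: building an Azure OpenAI request URL from its three moving parts.
// The config object here is an illustrative placeholder, not CliGate's schema.
function azureChatUrl({ baseUrl, deployment, apiVersion }) {
  // Azure routes by deployment name, not model name, and the
  // api-version query parameter is mandatory on every request.
  return `${baseUrl}/openai/deployments/${deployment}` +
         `/chat/completions?api-version=${apiVersion}`;
}

const url = azureChatUrl({
  baseUrl: "https://your-resource-name.openai.azure.com",
  deployment: "gpt4o-prod",
  apiVersion: "2024-10-21",
});
```

&lt;p&gt;Forget the query parameter or use a model name where the deployment name belongs, and the request fails before any of the schema issues below even come into play.&lt;/p&gt;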

&lt;p&gt;&lt;strong&gt;JSON Schema rules are stricter.&lt;/strong&gt; Azure OpenAI's tool definition validation rejects things the direct OpenAI API accepts — &lt;code&gt;$schema&lt;/code&gt;, &lt;code&gt;$id&lt;/code&gt;, &lt;code&gt;definitions&lt;/code&gt; fields, &lt;code&gt;const&lt;/code&gt; values. If your tool definitions contain any of these (and Claude Code's do), requests fail silently.&lt;/p&gt;

&lt;p&gt;That last one took me an embarrassingly long time to figure out.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Translation Problem
&lt;/h2&gt;

&lt;p&gt;Claude Code sends requests in Anthropic's Messages API format. Azure OpenAI accepts OpenAI's Responses API format. Between those two surfaces there's:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A message format translation (Anthropic content blocks → OpenAI messages)&lt;/li&gt;
&lt;li&gt;Tool definition translation (Anthropic tool schema → Azure-safe OpenAI tool schema)&lt;/li&gt;
&lt;li&gt;Response translation back (OpenAI completion → Anthropic-format streaming response)&lt;/li&gt;
&lt;li&gt;Schema sanitization that strips the fields Azure rejects and converts &lt;code&gt;const&lt;/code&gt; to &lt;code&gt;enum&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The sanitization step is the one that actually makes things work. Claude Code includes hosted tool definitions with JSON Schema features that Azure's stricter validator rejects. The proxy strips &lt;code&gt;$schema&lt;/code&gt;, &lt;code&gt;$id&lt;/code&gt;, &lt;code&gt;$defs&lt;/code&gt;, &lt;code&gt;$comment&lt;/code&gt;, &lt;code&gt;definitions&lt;/code&gt;, and &lt;code&gt;examples&lt;/code&gt; fields, and converts &lt;code&gt;const: value&lt;/code&gt; to &lt;code&gt;enum: [value]&lt;/code&gt; before forwarding. Azure accepts the result.&lt;/p&gt;
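
&lt;p&gt;The sanitization pass is small enough to sketch in full. This is an illustrative reimplementation of the idea, not CliGate's actual source:&lt;/p&gt;

```javascript
// Illustrative sketch of the schema sanitization step.
// Fields Azure's stricter validator rejects:
const STRIPPED = ["$schema", "$id", "$defs", "$comment", "definitions", "examples"];

function sanitizeSchema(node) {
  if (Array.isArray(node)) return node.map(sanitizeSchema);
  if (node === null || typeof node !== "object") return node;
  const out = {};
  for (const [key, value] of Object.entries(node)) {
    if (STRIPPED.includes(key)) continue; // drop what Azure rejects
    if (key === "const") {
      out.enum = [value];                 // const: x becomes enum: [x]
      continue;
    }
    out[key] = sanitizeSchema(value);     // recurse into nested schemas
  }
  return out;
}
```

&lt;p&gt;Everything Azure accepts passes through untouched; only the offending fields are dropped or rewritten.&lt;/p&gt;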

&lt;h2&gt;
  
  
  Setting It Up in CliGate
&lt;/h2&gt;

&lt;p&gt;CliGate now supports Azure OpenAI as a native key type. In the API Keys tab, add a new key and select &lt;strong&gt;Azure OpenAI&lt;/strong&gt; as the provider. You'll fill in four fields:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;API Key&lt;/strong&gt; — your Azure OpenAI resource key from the Azure portal&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Base URL&lt;/strong&gt; — &lt;code&gt;https://your-resource-name.openai.azure.com&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deployment Name&lt;/strong&gt; — the name you gave your deployment in Azure (e.g. &lt;code&gt;gpt4o-prod&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API Version&lt;/strong&gt; — e.g. &lt;code&gt;2024-10-21&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once saved, that key appears in your routing options. You can assign it as the backend for Claude Code, Codex CLI, or the chat UI — or let the router pick it based on priority settings.&lt;/p&gt;

&lt;p&gt;From Claude Code's perspective, nothing changes. You're still hitting &lt;code&gt;localhost:8081&lt;/code&gt; with Anthropic credentials. The proxy handles the translation, the schema cleaning, the deployment name injection, and the API version parameter. The response comes back in valid Anthropic streaming format.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Matters for Enterprise Teams
&lt;/h2&gt;

&lt;p&gt;The practical upshot: your AI coding tools now route through your company's Azure deployment.&lt;/p&gt;

&lt;p&gt;That means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Requests flow through your company's network controls and compliance logging&lt;/li&gt;
&lt;li&gt;You're not using personal API keys or personal accounts for work&lt;/li&gt;
&lt;li&gt;Usage appears in your Azure portal dashboards alongside other company AI usage&lt;/li&gt;
&lt;li&gt;The content controls and safety policies your company configured in Azure apply&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For teams where "just use the public API with your personal key" isn't an acceptable answer — because it usually isn't on enterprise projects — this closes a gap that's been annoying for a while.&lt;/p&gt;

&lt;h2&gt;
  
  
  One Thing to Watch
&lt;/h2&gt;

&lt;p&gt;Azure OpenAI deployments have their own rate limits, set at the deployment level in the Azure portal. If you're routing multiple AI coding tools through a single deployment, you can hit those limits quickly during intensive sessions. The proxy handles failover to other keys if you've configured them, but it's worth sizing your deployment quota for the team's expected usage before you roll this out.&lt;/p&gt;




&lt;p&gt;The Azure OpenAI provider in CliGate is part of the open-source release: &lt;a href="https://github.com/codeking-ai/cligate" rel="noopener noreferrer"&gt;github.com/codeking-ai/cligate&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you're in an enterprise setup and have gotten AI coding tools working through your company's infrastructure, I'm curious how you handled it. Azure, on-prem, something else?&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>javascript</category>
      <category>ai</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>"How I Route claude-sonnet-4-6 to GPT-5 Codex — Without Claude Code Knowing the Difference"</title>
      <dc:creator>CodeKing</dc:creator>
      <pubDate>Tue, 14 Apr 2026 02:37:28 +0000</pubDate>
      <link>https://dev.to/codekingai/how-i-route-claude-sonnet-4-6-to-gpt-5-codex-without-claude-code-knowing-the-difference-48n7</link>
      <guid>https://dev.to/codekingai/how-i-route-claude-sonnet-4-6-to-gpt-5-codex-without-claude-code-knowing-the-difference-48n7</guid>
      <description>&lt;p&gt;Claude Code always sends &lt;code&gt;claude-sonnet-4-6&lt;/code&gt; in the request body. That string goes to whatever base URL you've configured.&lt;/p&gt;

&lt;p&gt;Here's what most people don't realize: that string doesn't have to end up at Anthropic.&lt;/p&gt;

&lt;p&gt;It doesn't even have to end up at a Claude model.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Model Name Is a Routing Hint, Not a Destination
&lt;/h2&gt;

&lt;p&gt;When Claude Code makes a request, it sends something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"claude-sonnet-4-6"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"messages"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"stream"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If your &lt;code&gt;ANTHROPIC_BASE_URL&lt;/code&gt; points to a local proxy instead of &lt;code&gt;api.anthropic.com&lt;/code&gt;, that proxy receives the request first. It can read the model field and decide what to do with it.&lt;/p&gt;

&lt;p&gt;That decision is entirely up to you.&lt;/p&gt;

&lt;h2&gt;
  
  
  What CliGate Does With It
&lt;/h2&gt;

&lt;p&gt;CliGate is a local proxy that sits at &lt;code&gt;localhost:8081&lt;/code&gt;. Every AI coding tool I use — Claude Code, Codex CLI, Gemini CLI — routes through it.&lt;/p&gt;

&lt;p&gt;When a request for &lt;code&gt;claude-sonnet-4-6&lt;/code&gt; arrives, CliGate checks its routing table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;claude-sonnet-4-6  →  ChatGPT account pool  →  GPT-5.2 Codex
claude-opus-4-6    →  ChatGPT account pool  →  GPT-5.3 Codex
claude-haiku-4-5   →  Kilo AI (free)        →  DeepSeek R1 / Qwen3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Claude Code asked for &lt;code&gt;claude-sonnet-4-6&lt;/code&gt;. What actually handles the request is GPT-5.2 Codex, via a rotating pool of ChatGPT accounts. The response comes back in Anthropic's response format. Claude Code never knows the difference.&lt;/p&gt;
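
&lt;p&gt;In miniature, that decision is just a lookup keyed on the model string. A toy version (the backend names are illustrative, not CliGate's internals):&lt;/p&gt;

```javascript
// Toy routing table: model string in, backend descriptor out.
// Backend identifiers here are illustrative stand-ins.
const routes = {
  "claude-sonnet-4-6": { backend: "chatgpt-pool", model: "gpt-5.2-codex" },
  "claude-opus-4-6":   { backend: "chatgpt-pool", model: "gpt-5.3-codex" },
  "claude-haiku-4-5":  { backend: "kilo-free",    model: "deepseek-r1" },
};

function route(requestedModel) {
  // Unmapped models fall through to the real upstream provider.
  return routes[requestedModel] || { backend: "anthropic", model: requestedModel };
}
```

&lt;p&gt;The model name is the key; the value is wherever you want that request to actually go.&lt;/p&gt;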

&lt;h2&gt;
  
  
  Why This Works
&lt;/h2&gt;

&lt;p&gt;The magic is in protocol translation. CliGate translates between Anthropic's Messages API format and OpenAI's Chat Completions format at the proxy layer. Claude Code speaks Anthropic protocol. GPT-5.2 Codex speaks OpenAI protocol. The proxy bridges them invisibly.&lt;/p&gt;
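
&lt;p&gt;The message half of that translation can be sketched in a few lines. A simplified version that only handles text blocks (the real thing also has to cover tool calls, images, and system prompts):&lt;/p&gt;

```javascript
// Simplified sketch: Anthropic Messages format in, OpenAI Chat Completions
// messages out. Text blocks only; not CliGate's actual translator.
function toOpenAiMessages(anthropicMessages) {
  return anthropicMessages.map((msg) => {
    // Anthropic content may be a plain string or an array of typed blocks.
    const content = typeof msg.content === "string"
      ? msg.content
      : msg.content
          .filter((block) => block.type === "text")
          .map((block) => block.text)
          .join("\n");
    return { role: msg.role, content };
  });
}
```

&lt;p&gt;The reverse direction does the same flattening in mirror image, wrapping OpenAI deltas back into Anthropic content blocks.&lt;/p&gt;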

&lt;p&gt;From Claude Code's perspective, it sent a request and got back a valid streaming Anthropic response. The model name in the response is echoed back correctly. Everything behaves as expected.&lt;/p&gt;

&lt;p&gt;The same logic applies to the haiku model. When Claude Code sends a quick completion request using &lt;code&gt;claude-haiku-4-5&lt;/code&gt;, that gets routed to DeepSeek R1 or Qwen3 through Kilo AI — completely free, no API key required. Claude Code sees a streaming Anthropic response and moves on.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setting This Up
&lt;/h2&gt;

&lt;p&gt;The routing table lives in CliGate's Settings tab. Each model can be mapped to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A specific ChatGPT account (or the account pool, for automatic rotation)&lt;/li&gt;
&lt;li&gt;A Claude account (direct Anthropic protocol, no translation needed)&lt;/li&gt;
&lt;li&gt;An API key (OpenAI, Anthropic, Azure, Vertex AI, Gemini, etc.)&lt;/li&gt;
&lt;li&gt;The free routing path via Kilo AI&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can also set a &lt;strong&gt;Priority Mode&lt;/strong&gt; for each model: account pool first (free tier), or API key first (more reliable). If the first option fails or is exhausted, the proxy falls back to the next one automatically.&lt;/p&gt;
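
&lt;p&gt;The fallback amounts to trying backends in priority order until one succeeds. A toy async version (the backend objects are stand-ins, not CliGate internals):&lt;/p&gt;

```javascript
// Toy fallback chain: try each backend in priority order until one succeeds.
// Backend objects here are illustrative stand-ins with a send() method.
async function routeWithFallback(backends, request) {
  let lastError;
  for (const backend of backends) {
    try {
      return await backend.send(request);
    } catch (err) {
      lastError = err; // quota exhausted, auth failure, etc.; try the next one
    }
  }
  throw lastError; // every backend failed
}
```

&lt;p&gt;Priority Mode just controls the order of that list: account pool first, or API key first.&lt;/p&gt;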

&lt;p&gt;One practical configuration I've settled on:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;claude-sonnet-4-6&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  &lt;span class="s"&gt;ChatGPT account pool  (4 accounts, round-robin)&lt;/span&gt;
&lt;span class="na"&gt;claude-opus-4-6&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;    &lt;span class="s"&gt;Anthropic API key     (reserved for long context work)&lt;/span&gt;
&lt;span class="na"&gt;claude-haiku-4-5&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;   &lt;span class="s"&gt;Free routing          (DeepSeek R1 via Kilo AI)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This means the vast majority of my coding requests go through the ChatGPT account pool at no API cost. The Anthropic key only gets touched for heavy reasoning tasks. Haiku requests are free.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Part That Surprised Me
&lt;/h2&gt;

&lt;p&gt;I expected some quality degradation when routing sonnet requests to GPT-5.2 Codex. For most coding tasks, I didn't notice any.&lt;/p&gt;

&lt;p&gt;Code generation, test writing, refactoring, explaining stack traces — these all behaved identically from Claude Code's interface. The model was different. The output quality was comparable. The cost was zero (account pool, no API billing).&lt;/p&gt;

&lt;p&gt;The cases where I do notice a difference are long multi-file reasoning tasks, where I've configured the fallback to use the Anthropic API key directly. But those are a small fraction of the total request volume, as a glance at my usage stats confirms.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Matters Beyond Cost
&lt;/h2&gt;

&lt;p&gt;The cost savings are real, but that's not the most interesting part.&lt;/p&gt;

&lt;p&gt;The more interesting implication is that your AI coding tool no longer locks you into a single provider's ecosystem. You chose Claude Code for its UX and agent loop — not necessarily because Anthropic's API is the only place you want your requests going. &lt;/p&gt;

&lt;p&gt;With a proxy routing layer, those are two separate decisions. You can use the tool you like with the backend that makes sense for each request type.&lt;/p&gt;

&lt;p&gt;The model name in your config is just a string. Where it goes is up to the routing layer.&lt;/p&gt;




&lt;p&gt;CliGate is open source: &lt;a href="https://github.com/codeking-ai/cligate" rel="noopener noreferrer"&gt;github.com/codeking-ai/cligate&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Curious what routing setups others have tried — are you using a single provider for everything, or have you experimented with mixing backends?&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>ai</category>
      <category>javascript</category>
      <category>opensource</category>
    </item>
    <item>
      <title>"My AI Coding Tools Were Running Up a Tab I Couldn't See — So I Fixed That"</title>
      <dc:creator>CodeKing</dc:creator>
      <pubDate>Mon, 13 Apr 2026 03:16:31 +0000</pubDate>
      <link>https://dev.to/codekingai/my-ai-coding-tools-were-running-up-a-tab-i-couldnt-see-so-i-fixed-that-1g67</link>
      <guid>https://dev.to/codekingai/my-ai-coding-tools-were-running-up-a-tab-i-couldnt-see-so-i-fixed-that-1g67</guid>
      <description>&lt;p&gt;Three months ago I had four AI coding tools set up: Claude Code, Codex CLI, Gemini CLI, and a chat UI for quick questions. Every month I'd get a bill from Anthropic and a bill from OpenAI and vaguely wonder what I'd actually spent them on.&lt;/p&gt;

&lt;p&gt;I had no idea which model was being called when. I didn't know if Claude Code was routing to Sonnet or Opus. I didn't know how many tokens Gemini was burning in the background. I just paid the bill and moved on.&lt;/p&gt;

&lt;p&gt;Then I looked at one month's invoice line by line.&lt;/p&gt;

&lt;p&gt;The answer was uncomfortable.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem With Opaque AI Billing
&lt;/h2&gt;

&lt;p&gt;When you use AI coding tools directly, the billing is aggregated. You see "claude-sonnet-4-6: 2.4M tokens" but you don't know:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which tasks generated those tokens (code review? refactors? quick completions?)&lt;/li&gt;
&lt;li&gt;Which tool was responsible (Claude Code? your chat UI?)&lt;/li&gt;
&lt;li&gt;Whether any of it could have been handled by a cheaper — or free — model&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You're essentially flying blind. You can only optimize what you can measure, and the billing dashboards the providers give you aren't built for developers trying to understand usage at the tool level.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Did About It
&lt;/h2&gt;

&lt;p&gt;CliGate is a local proxy I built that sits between your AI coding tools and the upstream APIs. All four tools route through it — one &lt;code&gt;localhost:8081&lt;/code&gt;, one place to manage credentials and routing.&lt;/p&gt;

&lt;p&gt;That position in the stack turned out to be the perfect place to add cost tracking.&lt;/p&gt;

&lt;p&gt;Every request passes through the proxy. The proxy knows: which tool sent it, which model was requested, how many tokens were used (from the response stream), and what each model costs per token. The math is simple. The data is suddenly very visible.&lt;/p&gt;
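
&lt;p&gt;Concretely, the per-request math is one multiply-and-add over per-million-token prices. A sketch (the prices are placeholders for illustration, not current rates):&lt;/p&gt;

```javascript
// Per-million-token prices in USD. Placeholder numbers, not a live rate card.
const pricing = {
  "claude-sonnet-4-6": { input: 3.0, output: 15.0 },
};

function requestCost(model, inputTokens, outputTokens) {
  const p = pricing[model];
  if (!p) return 0; // unknown model: treat as untracked
  return (inputTokens * p.input + outputTokens * p.output) / 1e6;
}
```

&lt;p&gt;Sum that per tool, per model, per account, and you get every breakdown the dashboard shows.&lt;/p&gt;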

&lt;p&gt;Here's what the usage dashboard looks like after a week of normal coding work:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Provider breakdown (this week)
──────────────────────────────────────────
Anthropic API          $4.82   68%
ChatGPT Account         $0.00    0%   ← account pool, no API cost
Free (Kilo AI)          $0.00    0%   ← routed to DeepSeek/Qwen
OpenAI API              $2.27   32%
──────────────────────────────────────────
Total                   $7.09
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Model breakdown told an even more interesting story:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;claude-sonnet-4-6       $4.21   59%
claude-haiku-4-5        $0.00    0%   ← free routing active
gpt-4o                  $1.89   27%
codex-mini              $0.38    5%
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The haiku line at zero was the thing that made me stop and think.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bit I Didn't Expect: Some Models Are Just Free
&lt;/h2&gt;

&lt;p&gt;CliGate has a feature called free model routing. When a request comes in for &lt;code&gt;claude-haiku-4-5&lt;/code&gt;, instead of forwarding it to Anthropic, the proxy routes it to a free model — DeepSeek R1, Qwen3, MiniMax, whatever you've configured — via Kilo AI. No API key needed.&lt;/p&gt;

&lt;p&gt;I turned this on almost as an experiment. But looking at the usage stats a week later: every quick question, every short completion, every "what does this function do" — all of that had been handled for free. The expensive Sonnet calls were left for the work that actually needed it.&lt;/p&gt;

&lt;p&gt;That split happened automatically. I didn't have to think about it.&lt;/p&gt;

&lt;p&gt;You can change which free model handles haiku requests from the Settings tab. I've been rotating between DeepSeek R1 and Qwen3 depending on the task type — DeepSeek for reasoning-heavy work, Qwen3 for code generation.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Details That Actually Changed My Behavior
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Per-account tracking.&lt;/strong&gt; I have multiple Claude accounts in the pool. The usage stats break down by account, so I can see if one account is hitting its quota faster than others and rebalance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Daily and monthly views.&lt;/strong&gt; You can toggle between a daily sparkline and a monthly total. The daily view is where you catch the outliers — that one afternoon you had three long Claude Code sessions refactoring a module shows up as a spike and explains why a particular week cost more.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pricing registry.&lt;/strong&gt; Every model's per-token price is configurable. When OpenAI changes pricing (which happens), you can update it in the dashboard without touching any config files. You can also add manual overrides for models that aren't in the default list.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost per request in the logs.&lt;/strong&gt; The request log view shows cost alongside each request. If something seems expensive, you can pull up the exact prompt, response, token count, and cost in one place.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Changed Practically
&lt;/h2&gt;

&lt;p&gt;I now route &lt;code&gt;claude-haiku&lt;/code&gt; tasks through free models by default, and I've set up app-level routing so my quick chat window (the thing I use for "hey what's this error") hits the free path while Claude Code gets the full Sonnet model.&lt;/p&gt;

&lt;p&gt;My monthly AI tool spend dropped roughly 40% without changing how I actually work.&lt;/p&gt;

&lt;p&gt;The bigger change is more subtle: I stopped treating AI API costs as a fixed overhead I couldn't influence. Once you can see the breakdown, you start making different decisions about which model to reach for.&lt;/p&gt;




&lt;p&gt;If you're running multiple AI coding tools and paying per-token for all of them, it's worth spending 10 minutes to actually look at where the spend goes. The answer is probably easier to improve than you'd expect.&lt;/p&gt;

&lt;p&gt;CliGate is free and open source: &lt;a href="https://github.com/codeking-ai/cligate" rel="noopener noreferrer"&gt;github.com/codeking-ai/cligate&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;What does your current AI tool spend look like? Are you tracking it at all, or just paying the bill?&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>javascript</category>
      <category>ai</category>
      <category>opensource</category>
    </item>
    <item>
      <title>"I Pointed Claude Code at My Local Ollama Models — Here's the 3-Minute Setup"</title>
      <dc:creator>CodeKing</dc:creator>
      <pubDate>Fri, 10 Apr 2026 07:35:13 +0000</pubDate>
      <link>https://dev.to/codekingai/i-pointed-claude-code-at-my-local-ollama-models-heres-the-3-minute-setup-4hha</link>
      <guid>https://dev.to/codekingai/i-pointed-claude-code-at-my-local-ollama-models-heres-the-3-minute-setup-4hha</guid>
      <description>&lt;p&gt;My API bill last month had a line I couldn't ignore.&lt;/p&gt;

&lt;p&gt;Not the expensive reasoning tasks — those I expected. It was the small stuff. The "what does this error mean" questions. The quick refactors. The five-line test I asked Claude Code to write at 11pm. A thousand tiny requests, all billed like they mattered.&lt;/p&gt;

&lt;p&gt;Meanwhile, I had Ollama running on my machine with &lt;code&gt;qwen2.5-coder&lt;/code&gt; loaded. Fast. Free. Already sitting there.&lt;/p&gt;

&lt;p&gt;The problem was that my CLI tools had no idea it existed.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Wiring Problem
&lt;/h2&gt;

&lt;p&gt;Claude Code speaks Anthropic's protocol. Codex CLI speaks OpenAI's. Gemini CLI speaks Google's. And Ollama? It speaks its own thing — but it also exposes an OpenAI-compatible endpoint at &lt;code&gt;http://localhost:11434&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;So the question isn't "can Ollama do this" — it clearly can. The question is: &lt;strong&gt;how do you get your tools to talk to it without rewriting your entire config every time you switch between local and cloud?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That's what I spent the last week solving, and I've now shipped it as part of &lt;a href="https://github.com/codeking-ai/cligate" rel="noopener noreferrer"&gt;CliGate&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  How It Works
&lt;/h2&gt;

&lt;p&gt;CliGate is a local proxy that already handles routing Claude Code, Codex CLI, and Gemini CLI to cloud providers. The new local model support adds Ollama as a first-class routing target alongside OpenAI, Anthropic, and Google.&lt;/p&gt;

&lt;p&gt;When local model routing is enabled, CliGate intercepts requests from your CLI tools and — depending on your config — sends them to Ollama instead of the cloud. Protocol translation happens in the proxy layer: Claude Code's Anthropic-formatted request gets adapted to whatever Ollama expects, the response gets adapted back.&lt;/p&gt;

&lt;p&gt;Your tool never knows the difference.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 3-Minute Setup
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Step 1 — Make sure Ollama is running with a model&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama run qwen2.5-coder:7b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or any model you prefer. CliGate auto-discovers whatever's loaded.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Verify Ollama is accessible&lt;/span&gt;
curl http://localhost:11434/api/version
&lt;span class="c"&gt;# {"version":"0.6.x"}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 2 — Start CliGate&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx cligate@latest start
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Dashboard opens at &lt;code&gt;http://localhost:8081&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 3 — Add your Ollama instance&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Go to &lt;strong&gt;Settings → Local Models&lt;/strong&gt;. Add your Ollama URL:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;http://localhost:11434
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;CliGate runs a health check and then fetches your model list via &lt;code&gt;/v1/models&lt;/code&gt;. You'll see your loaded models appear automatically — no manual entry.&lt;/p&gt;
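
&lt;p&gt;The &lt;code&gt;/v1/models&lt;/code&gt; response uses the standard OpenAI list shape, so pulling out the model names is trivial. A sketch with an illustrative payload:&lt;/p&gt;

```javascript
// The OpenAI-compatible /v1/models endpoint returns { data: [{ id, ... }] }.
// Extracting the ids is all the auto-discovery step needs.
function discoverModels(listResponse) {
  return listResponse.data.map((m) => m.id);
}

// Illustrative payload; your own list depends on what Ollama has pulled.
const sample = {
  object: "list",
  data: [{ id: "qwen2.5-coder:7b" }, { id: "llama3.2:3b" }],
};
const models = discoverModels(sample);
```
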

&lt;p&gt;&lt;strong&gt;Step 4 — Enable local routing&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Toggle on &lt;strong&gt;"Local Model Routing"&lt;/strong&gt;. At this point, any request that would normally go to a cloud provider will check local models first.&lt;/p&gt;

&lt;p&gt;You can also configure this per-app. For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude Code&lt;/strong&gt; → &lt;code&gt;qwen2.5-coder:7b&lt;/code&gt; (your local coding model)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Codex CLI&lt;/strong&gt; → cloud (when you need the full thing)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gemini CLI&lt;/strong&gt; → cloud&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's it. No &lt;code&gt;ANTHROPIC_BASE_URL&lt;/code&gt; juggling. No re-exporting env vars. One dashboard toggle.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 5 — Test it&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Go to the &lt;strong&gt;Chat&lt;/strong&gt; tab, pick "Local Model" as the source, and send a message. If it comes back, the routing is working. Then go to your terminal and use Claude Code normally — the proxy handles the rest.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Claude Code is already pointed at CliGate from the one-click setup&lt;/span&gt;
claude &lt;span class="s2"&gt;"explain what this function does"&lt;/span&gt;
&lt;span class="c"&gt;# → routes to your local Ollama model&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The Part That Surprised Me
&lt;/h2&gt;

&lt;p&gt;I expected the basic routing to be the hard part. It wasn't.&lt;/p&gt;

&lt;p&gt;The interesting problem was &lt;strong&gt;streaming&lt;/strong&gt;. Claude Code expects streaming responses in Anthropic's SSE format. Ollama streams in its own format. Getting those two to handshake correctly without garbling the output took longer than everything else combined.&lt;/p&gt;

&lt;p&gt;The solution is a dedicated SSE bridge in the proxy layer that reads Ollama's stream chunk-by-chunk and re-emits it in the format the requesting tool expects. Claude Code sees a normal Anthropic streaming response. It never touches Ollama directly.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Claude Code
  └─→ POST /v1/messages (Anthropic format, streaming)
        └─→ CliGate proxy
              └─→ detects: local routing enabled
              └─→ sends to Ollama /v1/chat/completions
              └─→ re-streams response as Anthropic SSE
        ←─ Claude Code receives: normal streaming response
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same pattern for Codex CLI (OpenAI Responses format) and any other tool you route through the proxy.&lt;/p&gt;
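
&lt;p&gt;The core of that re-emit step can be sketched per chunk. Heavily simplified (the real bridge also emits &lt;code&gt;message_start&lt;/code&gt; and &lt;code&gt;message_stop&lt;/code&gt; events and tracks token usage), and not CliGate's actual source:&lt;/p&gt;

```javascript
// Heavily simplified sketch of the SSE bridge: one OpenAI-style stream chunk
// in, one Anthropic-style content_block_delta event out.
function bridgeChunk(openAiChunk) {
  const delta = openAiChunk.choices[0].delta;
  if (!delta.content) return ""; // role/stop chunks carry no text
  const event = {
    type: "content_block_delta",
    index: 0,
    delta: { type: "text_delta", text: delta.content },
  };
  return "event: content_block_delta\ndata: " + JSON.stringify(event) + "\n\n";
}
```

&lt;p&gt;Chunk in, event out, nothing buffered; that's what keeps the stream feeling live instead of arriving in one lump at the end.&lt;/p&gt;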

&lt;h2&gt;
  
  
  What This Is Actually Good For
&lt;/h2&gt;

&lt;p&gt;I'm not suggesting you replace GPT-4 or Claude Sonnet with a local 7B model. There's a real capability difference.&lt;/p&gt;

&lt;p&gt;But a lot of what I actually use Claude Code for in a normal day doesn't need the best model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"What does this stacktrace mean?"&lt;/li&gt;
&lt;li&gt;"Generate a unit test for this function"&lt;/li&gt;
&lt;li&gt;"Rename these variables to be more descriptive"&lt;/li&gt;
&lt;li&gt;"Does this SQL query look right?"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For tasks like these, &lt;code&gt;qwen2.5-coder:7b&lt;/code&gt; is fast, accurate enough, and free. Saving the cloud calls for the harder problems — complex refactors, architecture questions, multi-file changes — drops my monthly API bill significantly without changing my workflow.&lt;/p&gt;

&lt;p&gt;The toggle in CliGate makes it easy to switch back when you need to.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Your Local Model Setup?
&lt;/h2&gt;

&lt;p&gt;Are you running Ollama (or LM Studio, or anything else) for coding tasks? I'm curious what models people are finding useful for day-to-day dev work — especially anything that runs well on a laptop.&lt;/p&gt;




&lt;p&gt;GitHub: &lt;a href="https://github.com/codeking-ai/cligate" rel="noopener noreferrer"&gt;github.com/codeking-ai/cligate&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx cligate@latest start
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>tutorial</category>
      <category>webdev</category>
      <category>node</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
