DEV Community: CodeKing

"My AI Assistant Could Code, But It Couldn't Operate My Desktop"

CodeKing — Tue, 26 May 2026 09:51:16 +0000

My assistant could already read files, run shell commands, and delegate coding work to Claude Code or Codex.

But the moment a workflow hit a real desktop app, the illusion broke.

A browser needed a click. A page needed a scroll. A field needed real text input. A task could finish the hard part and still get stuck on the last two seconds of UI.

That felt like a fake kind of automation.

The problem wasn't coding

The hard part here wasn't generating code. It was crossing the gap between "I know what should happen next" and "I can actually operate the window in front of me."

In practice, that gap showed up in small but annoying ways:

a browser tab needed Ctrl+L and a URL paste
a page exposed no reliable accessibility selector, so a screenshot was needed first
a long form needed scrolling inside the right pane, not the whole desktop
a final publish step still depended on one visible button

So the assistant didn't need another coding loop. It needed a safe desktop-control layer.

The local control loop I added

I added a small set of desktop tools around a companion agent running on the same machine.

The assistant can now do things like:

list windows
focus a specific app
find accessible controls when UI Automation is available
set input values directly
send hotkeys like Ctrl+L
capture screenshots before pixel-based actions
click, move, and scroll with explicit coordinates only after visual confirmation

The key constraint is simple: observe first, then act.

If selectors are available, use them. If they are not, capture the window, inspect what is actually visible, and only then click. That rule matters more than any single tool because it keeps desktop automation from turning into random coordinate guessing.
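
As a rough sketch, the "observe first, then act" rule can be written as a small decision function. The function name, the observation shape, and the selector string are assumptions made for illustration, not CliGate's actual API; real tool calls would also be asynchronous.

```javascript
// Hypothetical planner: names and shapes are illustrative, not CliGate's API.
// Given what has been observed about a target control, decide the next step.
function planNextStep(observation) {
  const { selector, screenshot, confirmedPoint } = observation;

  // 1. A usable accessibility selector wins: act through it, no pixels involved.
  if (selector) {
    return { kind: 'invoke', selector };
  }

  // 2. No selector and nothing captured yet: observe before doing anything else.
  if (!screenshot) {
    return { kind: 'capture' };
  }

  // 3. A screenshot exists, but no coordinates were confirmed against it:
  //    refuse to guess and ask for visual inspection instead.
  if (!confirmedPoint) {
    return { kind: 'inspect', screenshot };
  }

  // 4. Only now is a coordinate click allowed.
  return { kind: 'click', x: confirmedPoint.x, y: confirmedPoint.y };
}

console.log(planNextStep({ selector: 'button[name="Publish"]' }).kind); // invoke
console.log(planNextStep({}).kind);                                     // capture
```

The point of the ordering is that a coordinate click is unreachable unless a screenshot has been captured and a point has been confirmed against it.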

What changed in the workflow

Before this, the assistant could help me prepare a task but not finish anything that crossed into a real app.

Now the same local loop can cover more of the actual workflow:

inspect window → focus app → locate control or capture screenshot → act → verify

That sounds small, but it changes what "assistant" means in practice.

It is no longer limited to code and terminal state. It can handle the messy last mile where real work often stalls.

Why I kept it local

I did not want this running through a hosted browser service or a remote desktop relay.

Desktop control touches exactly the kind of things that should stay on the machine that owns them: open apps, visible windows, clipboard state, local sessions, and personal accounts.

Keeping it local also makes the loop faster. The assistant can inspect, act, and verify against the current desktop state without shipping screenshots or UI events to another service first.

That local-first constraint fits the rest of CliGate anyway. The gateway, the assistant, the runtimes, and now the desktop-control layer all live on the same box.

What I learned

The interesting lesson was that "assistant capability" is not just about better reasoning or better code generation.

A lot of workflows fail because the assistant cannot cross boundaries between tools.

Terminal-only automation is useful. But if the real workflow ends in a browser, settings window, login dialog, or web app form, then desktop control becomes part of the product surface whether you planned for it or not.

So this update was less about making the assistant smarter and more about making it less incomplete.

If you're building local AI tooling, where does your automation still stop — at the terminal, at the API, or at the desktop?

Repo: https://github.com/codeking-ai/cligate

"My AI Assistant Could Code, But It Couldn't Operate My Desktop"

CodeKing — Tue, 26 May 2026 09:10:44 +0000

Most AI coding agents are good until the task leaves the terminal.

They can edit files. They can run tests. They can explain a diff. Then the work hits a desktop app, an OAuth approval screen, a native settings window, or a web UI that was not designed for API access. Suddenly the agent is not stuck on intelligence. It is stuck on reach.

That was the gap I kept running into while building my local AI setup. I had Claude Code, Codex CLI, Gemini CLI, local models, provider keys, and account pools. The missing piece was not another model.

It was an operator.

The Problem Was The Boundary

My old workflow had two separate worlds.

In one world, coding agents lived inside terminals and repos. They could reason about code, run commands, and keep a session alive.

In the other world, real work still happened through desktop apps, dashboards, browser windows, chat clients, and provider consoles. A human could jump between those worlds without thinking. An agent could not.

That made the assistant feel smaller than it should:

It could fix a bug, but not always finish the setup.
It could tell me where to click, but not click safely.
It could generate a workflow, but not reliably drive the app that owned the workflow.
It could reuse project knowledge, but only if I remembered to paste it in.

So I changed how I think about CliGate.

CliGate is no longer just a local API gateway for AI tools. It is becoming a local control plane for agent work.

What CliGate Does Now

CliGate still starts as one localhost service for AI coding tools.

You can point Claude Code, Codex CLI, Gemini CLI, and OpenClaw at the same local server, then manage provider keys, account pools, routing, usage, logs, and local runtimes from one dashboard.

But the newer assistant layer sits above that.

It has two modes:

Direct runtime: keep talking to the current Codex or Claude Code session.
Assistant collaboration: ask CliGate Assistant to inspect state, choose a runtime, continue a task, handle a blocked run, or summarize what happened.

That split matters. I do not want every normal message to be intercepted by a clever supervisor. Sometimes I just want to continue the current runtime session. Other times I want an assistant that can see the bigger picture.

The assistant is not trying to replace Codex or Claude Code. It coordinates them.

Skills Made It Less Generic

The second piece is skills.

A skill is a local package of instructions, scripts, templates, and references. The assistant does not need every detail in context all the time. It can see a short description first, then read the full SKILL.md only when the task matches.

For example:

skills/
  devto-publisher/
    SKILL.md
    publish.js
    templates/

That turns the assistant from "a general chat box with tools" into something closer to a teammate with reusable procedures.

One skill can know how to publish a Dev.to article. Another can know how to build a spreadsheet. Another can know the conventions of a local repo. The key is that these are local, inspectable, and executable through the same permission system as the rest of the agent.

It is not magic. It is just a better way to keep operational knowledge out of one giant prompt.
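
A minimal sketch of that "description first, full SKILL.md on match" idea follows. The index shape, the `keywords` field, and the keyword matching are assumptions for illustration; the post does not say how CliGate actually matches tasks to skills.

```javascript
// Hypothetical skill index: field names and the matching rule are assumptions.
// Only short descriptions stay in context; full instructions load on demand.
function createSkillIndex(skills, readFile) {
  let loads = 0;
  return {
    // What the assistant sees up front: names and one-line descriptions.
    summaries: () => skills.map(({ name, description }) => ({ name, description })),

    // Naive keyword match, standing in for whatever matching the assistant does.
    match: (task) => {
      const words = task.toLowerCase().split(/\W+/).filter(Boolean);
      return skills.find((s) => s.keywords.some((k) => words.includes(k))) || null;
    },

    // Read the full SKILL.md only for the skill that matched.
    load: (skill) => {
      loads += 1;
      return readFile(`${skill.dir}/SKILL.md`);
    },
    loadCount: () => loads,
  };
}

// In-memory stand-in for the filesystem so the sketch runs anywhere.
const files = { 'skills/devto-publisher/SKILL.md': 'Steps for publishing a Dev.to article' };
const index = createSkillIndex(
  [{
    name: 'devto-publisher',
    description: 'Publish an article to Dev.to',
    keywords: ['publish', 'devto'],
    dir: 'skills/devto-publisher',
  }],
  (path) => files[path],
);

const skill = index.match('publish my draft');
const instructions = skill ? index.load(skill) : null;
```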

The Desktop Part Is The Big Unlock

The part I am most excited about is desktop control.

The first naive version of desktop automation is usually visual: take a screenshot, ask the model where to click, move the mouse, repeat. That works for demos, but it is fragile. Small buttons, focus changes, DPI scaling, popups, and animations can break it.

CliGate's desktop agent takes a different default path on Windows: UI Automation first, screenshots second.

Instead of guessing pixels, the assistant can ask the operating system for the UI tree:

list windows -> focus app -> find input -> set value -> send Enter -> read text

That means it can find a textbox by control type, set its value through the accessibility API, invoke a button, read visible text, and only fall back to screenshots when the app does not expose useful accessibility metadata.
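
To make "find a textbox by control type" concrete, here is a small sketch that searches a UI tree depth-first. The tree is a plain object standing in for what an accessibility API would return, so its shape and the sample window are assumptions; only the control type names (`Window`, `Pane`, `Button`, `Edit`) follow UI Automation's vocabulary.

```javascript
// Hypothetical UI tree search: plain objects stand in for accessibility nodes.
function findControl(node, predicate) {
  if (predicate(node)) return node;
  for (const child of node.children || []) {
    const hit = findControl(child, predicate);
    if (hit) return hit;
  }
  return null; // nothing matched: the signal to fall back to a screenshot
}

const windowTree = {
  controlType: 'Window',
  name: 'Editor',
  children: [
    {
      controlType: 'Pane',
      name: 'Toolbar',
      children: [{ controlType: 'Button', name: 'Publish' }],
    },
    { controlType: 'Edit', name: 'Title', value: '' },
  ],
};

const titleBox = findControl(windowTree, (n) => n.controlType === 'Edit');
const publish = findControl(windowTree, (n) => n.controlType === 'Button' && n.name === 'Publish');
```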

This is the bridge I wanted: a coding assistant that can work in repos, but also operate the desktop applications that surround the repo.

Where This Is Going

The current shape is:

CliGate routes AI coding tools through one local server.
Runtime sessions keep Codex and Claude Code work alive.
The assistant watches, coordinates, and summarizes.
Skills give it reusable procedures.
Desktop control gives it a path into native apps and GUI workflows.

That combination changes the product from "proxy for AI tools" into "local operator for developer workflows."

I think the desktop-control layer deserves its own post, because "AI can operate any app through the OS accessibility tree" is a deeper topic than I can fit here.

The project is open source here: https://github.com/codeking-ai/cligate

How are you handling the boundary between coding agents and the desktop apps they still need to interact with?

DeepSeek's API Price Cut Changed My Claude Code and ChatGPT Math

CodeKing — Mon, 25 May 2026 08:37:24 +0000

The DeepSeek API price cut made me rethink a habit I had quietly accepted: choosing an AI coding tool and then living with whatever model economics came with it.

Claude Code is great when I want a strong terminal-native coding agent. ChatGPT and Codex are great when I want OpenAI's workflow and model stack. But when a provider like DeepSeek suddenly drops API pricing, the obvious question is not just "is this cheap?"

It is: can I actually use the cheaper model from the tools I already use?

The Price Cut Is The Interesting Part

As of May 25, 2026, DeepSeek's pricing page lists V4 Flash at:

$0.14 per 1M input tokens
$0.0028 per 1M cached input tokens
$0.28 per 1M output tokens

It also lists V4 Pro at a 75% promotional discount. A note says that when the promotion ends on May 31, 2026, the official API price will be set to one-quarter of the original, so the rate below stays in effect:

$0.435 per 1M input tokens
$0.003625 per 1M cached input tokens
$0.87 per 1M output tokens

The part that matters for coding agents is cached input. Coding tools resend a lot of repeated context: system prompts, repo summaries, conversation history, tool schemas, and task state. If cache hits are cheap enough, repeated agent loops start looking very different economically.

I checked the current public pricing pages before writing this: DeepSeek API pricing, Claude plans, Claude API models, ChatGPT plans, and OpenAI API pricing.

That is why this cut is more than a nice model announcement. It changes where I want routine coding traffic to go.

The Comparison I Actually Care About

Claude Code pricing is predictable if you use a subscription: Claude Pro is $20/month when billed monthly, and Max starts at $100/month. On the API side, Anthropic lists Claude Opus 4.7 at $5 input and $25 output per 1M tokens, and Sonnet 4.6 at $3 input and $15 output.

ChatGPT has the same split. Plus is the familiar $20/month plan, Pro tiers go much higher, and on the API side OpenAI's flagship GPT models are still priced like premium infrastructure. GPT-5.5 is listed at $5 input, $0.50 cached input, and $30 output per 1M tokens.

Those plans can be worth it. I am not pretending DeepSeek replaces every hard reasoning workload.

But for coding-agent traffic, the uncomfortable truth is that a lot of tokens are not "hard reasoning" tokens. They are:

reading files
rewriting boilerplate
producing test scaffolds
formatting docs
classifying intent
continuing a known task

That is exactly the kind of traffic I want to route to a cheaper model first.
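
To see how much cached input moves the numbers, here is a small worked calculation using the prices quoted in this post. The model labels are just keys for this sketch (not API model IDs), and the token counts for the sample turn are assumptions.

```javascript
// USD per 1M tokens, copied from the figures quoted in this post.
const PRICES = {
  'deepseek-v4-flash': { input: 0.14, cachedInput: 0.0028, output: 0.28 },
  'deepseek-v4-pro': { input: 0.435, cachedInput: 0.003625, output: 0.87 },
  'gpt-5.5': { input: 5, cachedInput: 0.5, output: 30 },
};

// Cost of one request, given how much of the input was served from cache.
function requestCost(model, { inputTokens, cachedTokens, outputTokens }) {
  const p = PRICES[model];
  const freshTokens = inputTokens - cachedTokens;
  const micros =
    freshTokens * p.input + cachedTokens * p.cachedInput + outputTokens * p.output;
  return micros / 1e6;
}

// Hypothetical agent turn: 100k input tokens, 80% cache hits, 2k output tokens.
const turn = { inputTokens: 100_000, cachedTokens: 80_000, outputTokens: 2_000 };
for (const model of Object.keys(PRICES)) {
  console.log(model, requestCost(model, turn).toFixed(5));
}
```

With these assumed token counts, one turn comes out to roughly $0.0036 on V4 Flash, $0.011 on V4 Pro, and $0.20 on GPT-5.5. An agent loop repeats turns like this many times per task, which is where the gap compounds.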

The Annoying Part: Tools Do Not Make This Easy

The problem is that Claude Code, Codex, and ChatGPT-style workflows do not all speak the same protocol.

Claude Code expects Anthropic-shaped requests.

Codex expects OpenAI-shaped requests.

Other tools may expect Gemini-style routes or their own local configuration. So even when DeepSeek exposes low-cost models, the practical setup can still turn into a mess of environment variables, API keys, base URLs, and wrappers.

That is the gap I built CliGate to fill.

What Changed With CliGate

CliGate is a local AI gateway that runs on localhost. Instead of pointing every tool directly at a provider, I point the tools at CliGate once:

# Claude Code
export ANTHROPIC_BASE_URL=http://localhost:8081
export ANTHROPIC_API_KEY=any-key

Codex can also point at the same local gateway through its OpenAI-compatible configuration.

From there, CliGate handles the important layer:

route Claude Code, Codex CLI, Gemini CLI, and web chat through one local control plane
keep account pools and API keys in the same routing layer
map model names and app-level routes
send routine traffic to DeepSeek when cost matters
keep premium models available for the tasks that actually need them
show usage, request logs, and cost views in the dashboard

That means I do not have to decide "Claude Code or DeepSeek" as a tool choice. I can keep Claude Code as the interface and route some of its traffic through DeepSeek. I can keep Codex as the workflow and still move compatible requests to a cheaper upstream.

The Real Advantage Is Not Just Cheap Tokens

Cheap tokens help. But the bigger advantage is optionality.

I want to be able to say:

use DeepSeek V4 Flash for cheap routine work
use DeepSeek V4 Pro when I want stronger low-cost reasoning
keep Claude for difficult multi-file edits
keep GPT for workflows where OpenAI's stack is the right fit
keep local models for private or offline tasks

Without a routing layer, that sounds like a spreadsheet and a pile of config files. With a local gateway, it becomes an operations problem: add keys, set routing, inspect usage, adjust when the bill or quality tells you to.

That is the product advantage I care about. CliGate does not ask me to abandon Claude Code or ChatGPT-style tools. It lets those tools reach low-cost DeepSeek models without changing how I work.
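
The preferences listed above can be sketched as a small routing table. The task kinds, upstream labels, and the fallback rule are all assumptions for illustration; this is not CliGate's configuration format.

```javascript
// Hypothetical policy table: task kinds and upstream labels are illustrative only.
const POLICY = new Map([
  ['routine', 'deepseek-v4-flash'],
  ['low-cost-reasoning', 'deepseek-v4-pro'],
  ['multi-file-edit', 'claude'],
  ['openai-workflow', 'gpt'],
  ['private', 'local'],
]);

// Unknown kinds go to the cheapest tier first, following the
// "route routine traffic to a cheaper model first" preference.
function pickUpstream(taskKind, policy = POLICY) {
  return policy.get(taskKind) ?? policy.get('routine');
}

console.log(pickUpstream('multi-file-edit')); // claude
console.log(pickUpstream('format-docs'));     // deepseek-v4-flash
```

Keeping the policy as data rather than scattered environment variables is what turns "adjust when the bill or quality tells you to" into a one-line change.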

My New Default

After this price cut, my default is no longer "pick one premium coding assistant and pay whatever it costs."

It is:

keep the coding tools I like
route routine traffic to the cheapest good-enough model
reserve expensive models for the tasks that justify them
watch usage and pricing in one place

That feels like the right shape for AI coding in 2026.

The models will keep changing. The prices will definitely keep changing. The part I do not want to keep changing is every CLI config on my machine.

CliGate is here if you want to inspect the implementation: https://github.com/codeking-ai/cligate

How are you handling model cost now: one subscription, direct API usage, or routing per task?

"My Web Chat Wasn't a Real Channel. That Broke My Agent Pipeline"

CodeKing — Fri, 22 May 2026 11:35:09 +0000

I thought my web chat was the simplest surface in the whole product.

Telegram, Feishu, and DingTalk were the complicated ones. Web chat was just the dashboard. Same browser, same server, same app. What could possibly go wrong?

A lot, apparently.

The bug looked random from the UI

A task would start from the web chat UI just fine.

The runtime session existed. The conversation existed. The task existed. The logs looked healthy enough.

And then the delivery pipeline tried to send a follow-up update back into the conversation and got:

conversation_not_found

Which made no sense, because the conversation definitely existed. I had just used it.

This is the kind of bug that wastes time because every individual subsystem looks half-correct.

The real problem: I treated web chat like a page, not a channel

The architecture in CliGate already had a channel model.

Telegram is a channel. Feishu is a channel. DingTalk is a channel.

Those inbound messages go through the same supervision and delivery machinery:

conversation store
scheduler
delivery sender
assistant orchestration
runtime session binding

But web chat had slowly drifted into a special-case path.

That felt harmless at first. Web chat lived inside the same app, so it was easy to give it a little custom state and a few convenience wrappers.

That was the mistake.

What actually broke

The old version of chat-ui/conversation-store.js exported its own store instance.

Meanwhile, the delivery and orchestration path used the shared channel conversation store.

So both sides were reading and writing "conversations," but not the same in-memory array.

That meant:

the chat UI could create a conversation
the route handler could see it
the runtime could bind to it
but the scheduler could still fail to find it later

The comments in the fix say it more plainly than I can:

// chat-ui and agent-channels each held a SEPARATE in-memory
// `conversations` array, even though both wrote to the same JSON file on disk.
// After server start, a chat-ui conversation created at runtime was visible to
// chat-ui-route but NOT to message-service, so scheduler deliveries hit
// `conversation_not_found` and silently dropped notifications.

That is exactly the sort of bug you get when a "small UI shortcut" quietly forks your domain model.
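
The failure mode is easy to reproduce in a few lines. This is a reduced sketch, not the original code: the store factory is hypothetical, and the "disk" is a plain object so the example runs without touching the filesystem.

```javascript
// Hypothetical reduction of the bug: two store instances, one backing file.
const disk = { 'conversations.json': '[]' };

function createConversationStore(file) {
  // Each instance loads the file once and then trusts its own in-memory array.
  const conversations = JSON.parse(disk[file]);
  return {
    create(id) {
      conversations.push({ id });
      disk[file] = JSON.stringify(conversations); // both instances write the same file
    },
    find(id) {
      return conversations.find((c) => c.id === id) || null; // reads never hit the file
    },
  };
}

const chatUiStore = createConversationStore('conversations.json');
const channelStore = createConversationStore('conversations.json');

chatUiStore.create('web-1');
console.log(chatUiStore.find('web-1'));  // { id: 'web-1' }
console.log(channelStore.find('web-1')); // null, the scheduler's conversation_not_found
```

The file on disk is correct the whole time, which is why each subsystem looks healthy when inspected on its own.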

The fix was not complicated

I did not need a new abstraction.

I needed one source of truth.

Instead of exporting a dedicated chat-ui conversation store in production, I attached chat-specific helpers to the shared singleton used by the channel system:

installChatUiHelpers(agentChannelConversationStore);

export const chatUiConversationStore = agentChannelConversationStore;

That one change matters more than it looks.

Now web chat is not pretending to be adjacent to the channel system. It is part of the channel system.
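
A minimal sketch of the pattern, assuming a very simple store: only the idea (one instance, with chat-specific helpers attached to it) mirrors the fix. The store internals and the `createChatUiConversation` helper are hypothetical.

```javascript
// Hypothetical shapes: only the pattern (one instance, extra helpers) mirrors the post.
function createStore() {
  const conversations = [];
  return {
    create(conversation) {
      conversations.push(conversation);
      return conversation;
    },
    find(id) {
      return conversations.find((c) => c.id === id) || null;
    },
  };
}

// Adds chat-specific conveniences to an existing store instead of creating a new one.
function installChatUiHelpers(store) {
  store.createChatUiConversation = (id) => store.create({ id, channel: 'chat-ui' });
  return store;
}

const agentChannelConversationStore = createStore();
installChatUiHelpers(agentChannelConversationStore);
const chatUiConversationStore = agentChannelConversationStore; // an alias, not a copy

chatUiConversationStore.createChatUiConversation('web-1');
console.log(agentChannelConversationStore.find('web-1').channel); // chat-ui
```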

Why this changed more than delivery

Once I stopped treating web chat as a special page, a lot of other decisions became cleaner.

A chat-ui conversation now behaves like a real peer of the other channels:

it has the same conversation identity model
it uses the same assistant delivery state
it flows through the same runtime binding logic
it can receive scheduler-driven updates without weird bridging code

That matters because a multi-surface assistant only stays sane if all entry points agree on what a conversation is.

If one surface has its own special rules, you do not have one product anymore. You have one product plus one exception that keeps leaking.

The other important fix: seed assistant mode from the start

There was a second detail hidden in the same file.

New chat-ui conversations now start with assistant control mode already set:

assistantCore: buildAssistantCoreDeliveryState(
  existingAssistantCore,
  { controlMode: CONVERSATION_ASSISTANT_CONTROL_MODE.ASSISTANT }
)

That matters because web chat should enter the same top-level assistant orchestration path as the messaging channels.

If the first message from the web UI skips that and goes straight to the bound runtime, you get behavioral drift:

web chat behaves one way
Telegram behaves another way
Feishu behaves another way

Then every bug becomes impossible to reason about because the surfaces are no longer comparable.

The test I actually wanted

I have learned to distrust fixes like this unless there is a test that proves the behavioral contract.

The right question was not "does chat-ui store still work?"

The right question was:

does a chat-ui conversation participate in assistant behavior like a real channel?

That is why the surrounding tests focus on assistant-mode behavior and persisted conversation semantics instead of only checking helper methods in isolation.

The implementation detail was a store instance bug.

The product bug was channel inconsistency.

What I learned from this

When you build multi-channel agent systems, the browser UI is seductive.

It feels local. It feels simple. It feels close enough to the app that you can justify giving it custom flow control, custom state, or custom routing.

That instinct is expensive.

If the browser chat can start tasks, receive async updates, carry conversation identity, and interact with the same supervisor as your mobile or messaging surfaces, then it is not "just a page."

It is a channel.

And if you do not model it that way, the architecture will eventually make you pay for the lie.

Are you treating your web chat as a first-class channel, or as a special case that has not failed loudly yet?

Repo: https://github.com/codeking-ai/cligate

"My Coding Agent Remembered Sessions, Not Work. That Was the Bug"

CodeKing — Thu, 21 May 2026 03:46:07 +0000

The first version of my coding agent had a very common bug: it remembered the conversation, but not the work.

That sounds fine until the agent has to do something real.

I would start a task from the web UI, continue it from a mobile channel, approve one command, ask for progress later, and then discover that the system was mostly guessing from the last few messages. It knew there was a session. It did not really know what job that session belonged to.

That is the difference between a chatbot and a working assistant.

The Problem Was The Unit Of Memory

Most agent systems begin with a simple shape:

conversation -> runtime session -> messages

That works for demos because the user does one thing at a time.

It breaks when the user behaves normally:

"continue the routing task"
"use Claude Code to review what Codex just changed"
"what happened with the thing from yesterday?"
"retry that, but keep the same working directory"

None of those are really about a chat session. They are about work.

A runtime session can crash. A user can switch from web to Telegram or Feishu. Two agents can work on the same issue from different roles. If the system treats the runtime session as the main identity, every one of those cases becomes fragile.

The Fix: Split Work From Execution

In CliGate, I started moving the design toward a different model:

Person
  -> Project
    -> Task
      -> Execution
        -> RuntimeSession

The important part is not the diagram. It is the boundary.

A Task is the thing the user thinks they are doing: "fix routing", "review the auth change", "write release notes", "check why the build failed".

An Execution is one concrete attempt to move that task forward. It may be Codex acting as the editor, Claude Code acting as a reviewer, or another provider doing a focused job.

A RuntimeSession is just the current process or provider session underneath that execution.

That means the assistant can say: this is still the same task, even if the runtime process has changed.
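
Here is a minimal in-memory sketch of that boundary. Field names, ID formats, and helper functions are assumptions for illustration, not the CliGate schema.

```javascript
// Hypothetical in-memory model: field names and ID formats are assumptions.
let nextId = 1;
const newId = (prefix) => `${prefix}-${nextId++}`;

function createTask(title) {
  return { id: newId('task'), title, executions: [] };
}

// One concrete attempt to move the task forward, e.g. an editor or a reviewer.
function startExecution(task, { provider, role }) {
  const execution = { id: newId('exec'), provider, role, runtimeSessionId: newId('rt') };
  task.executions.push(execution);
  return execution;
}

// A dead runtime is swapped out underneath the execution; the task is untouched.
function replaceRuntime(execution) {
  execution.runtimeSessionId = newId('rt');
  return execution;
}

const task = createTask('fix routing');
const editor = startExecution(task, { provider: 'codex', role: 'editor' });
const firstSession = editor.runtimeSessionId;

replaceRuntime(editor); // the runtime crashed; same execution, same task
startExecution(task, { provider: 'claude-code', role: 'reviewer' }); // second execution, same task
```

After both calls the task still has the ID it started with, the editor execution points at a new runtime session, and the reviewer sits next to it under the same task.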

Why This Matters In Real Use

The most annoying bugs came from follow-ups.

When I typed:

make the button green

I did not mean "start an unrelated new job." I meant "continue the last task with the same context."

When I typed:

use cc to review it too

I did not mean "replace the current agent." I meant "spawn a second execution under the same task, with a reviewer role."

Those two messages look similar if all you have is chat history. They are very different if the system has a task model.

Once the assistant can distinguish task identity from execution identity, a few things become much easier:

status questions can be answered from task state
provider preference can follow the work instead of the channel
a dead runtime can be replaced without pretending the task is new
multiple agents can collaborate without sharing one messy transcript
web UI and mobile channels can show different levels of detail

That last point surprised me. On mobile, I want a short answer: "Codex is waiting for approval." In the web UI, I may want the full timeline: user message, assistant decision, runtime event, command output, file changes, approval, result.

Same task. Different presentation.

The Rule I Wish I Had Started With

If the user can reasonably ask "what happened with that thing?", that thing deserves an identity outside the chat transcript.

For my project, that identity became Task.

The runtime session is still useful. It preserves provider context and lets the agent resume efficiently. But it should not be the thing the product uses to understand the user's work.

Sessions are implementation details. Work is the product surface.

What Changed

I am still iterating on the architecture, but the direction already cleaned up several design decisions:

follow-ups route to tasks, not just the latest session
retries can keep the same task identity
reviewer agents can attach to the same task as editor agents
approvals can be remembered at task or project scope
channel messages can stay short without losing full traceability in the dashboard

This also made failure handling less awkward. If a runtime dies, the assistant does not need to tell the user "your session is gone, please start over." It can start a new runtime under the same execution or create a fresh execution under the same task, depending on what actually failed.

That is a small implementation detail with a large UX effect.

The Takeaway

I used to think agent memory meant better summaries of previous messages.

Now I think the more important question is: what are you summarizing into?

If everything collapses back into a conversation, the assistant will eventually lose the shape of the work. If the product has explicit projects, tasks, executions, and runtime sessions, the agent has somewhere stable to put its memory.

That has become one of the design principles behind CliGate.

If you are building coding agents, how are you modeling the difference between a conversation, a task, and a runtime session?

"My DingTalk Coding Bot Said It Started the Task. Then It Never Sent the Result"

CodeKing — Thu, 21 May 2026 03:28:09 +0000

The most annoying mobile-agent failure is not a crash.

It is the fake success message.

You send a task from DingTalk. The bot replies:

Task accepted.

Then Claude Code or Codex actually runs for a while, finishes the work, and nothing comes back to the phone.

That is worse than an immediate error. It makes you think the agent is still working, when the real problem is that the result fell out of the delivery path.

The setup

I have been building CliGate, a local AI gateway for Claude Code, Codex CLI, Gemini CLI, dashboard chat, and mobile channels.

The mobile-channel idea is simple:

send a task from DingTalk
route it to Claude Code or Codex on my machine
keep the runtime session attached to that DingTalk conversation
send approvals, questions, progress, and final results back to the same chat

The first part worked.

DingTalk could trigger the runtime.

The broken part was the final callback.

The bug: I was replying to the assistant run, not the runtime

The channel layer used to behave too much like this:

inbound message
  -> assistant run
  -> immediate assistant reply
  -> send message back to DingTalk

That sounds fine until the assistant delegates to a long-running runtime.

In that case, the useful result is not the immediate assistant text. The useful result is the runtime terminal event:

runtime completed
runtime failed
runtime asks a question
runtime asks for approval

My old logic only waited for the final runtime result in a narrow multi-session case. If one assistant run produced multiple runtime sessions, it would fan in and wait. But the common path is just one delegated runtime:

/cligate ask Claude Code to fix this bug

That produced one runtime session, so the channel often got the "started" message and missed the real result.

The fix was small but important:

function shouldDeferBackgroundCallback(result = null) {
  const sessionIds = Array.isArray(result?.assistantRun?.relatedRuntimeSessionIds)
    ? result.assistantRun.relatedRuntimeSessionIds.filter(Boolean)
    : [];
  return result?.assistantRun?.status === ASSISTANT_RUN_STATUS.WAITING_RUNTIME
    && sessionIds.length > 0;
}

The old mental model was:

only wait when there are multiple runtime sessions

The correct model is:

if the assistant delegated to any runtime, wait for that runtime result before treating the channel reply as complete

That change made single-session mobile tasks behave like real tasks instead of fire-and-forget acknowledgements.
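
The difference between the two mental models shows up on exactly one input: a run that delegated to a single runtime. In the sketch below, the constant's value is a stand-in and the "old rule" predicate is a reconstruction from the description above, not the original code.

```javascript
// Stand-in for the real constant; the actual value in CliGate may differ.
const ASSISTANT_RUN_STATUS = { WAITING_RUNTIME: 'waiting_runtime' };

const runtimeIds = (result) =>
  Array.isArray(result?.assistantRun?.relatedRuntimeSessionIds)
    ? result.assistantRun.relatedRuntimeSessionIds.filter(Boolean)
    : [];
const isWaiting = (result) =>
  result?.assistantRun?.status === ASSISTANT_RUN_STATUS.WAITING_RUNTIME;

// Old rule as described: only fan in when there are several runtime sessions.
const oldShouldDefer = (result) => isWaiting(result) && runtimeIds(result).length > 1;
// New rule: any delegated runtime means the channel reply is not complete yet.
const newShouldDefer = (result) => isWaiting(result) && runtimeIds(result).length > 0;

const singleRuntime = {
  assistantRun: { status: 'waiting_runtime', relatedRuntimeSessionIds: ['rt-1'] },
};
console.log(oldShouldDefer(singleRuntime)); // false
console.log(newShouldDefer(singleRuntime)); // true
```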

The second bug: DingTalk's session webhook can lie by omission

DingTalk gives you a sessionWebhook for replying inside the inbound interaction window.

So the obvious implementation is:

if sessionWebhook exists and has not expired:
  send through sessionWebhook
else:
  send through App API

That is what I started with.

The problem is that the timestamp is not the whole truth. A session webhook can still look fresh locally while DingTalk rejects it server-side because the session was consumed or closed.

So this code was too optimistic:

if (sessionWebhook && (!expiredAt || expiredAt > now + 15_000)) {
  for (const chunk of textChunks) {
    result = await this.sendViaSessionWebhook(sessionWebhook, chunk);
  }
  return result;
}

If that send failed, the whole delivery failed.

The fix was to treat session webhook as the cheap first attempt, not the only attempt:

if (sessionWebhook && (!expiredAt || expiredAt > now + 15_000)) {
  try {
    for (const chunk of textChunks) {
      result = await this.sendViaSessionWebhook(sessionWebhook, chunk);
    }
    return result;
  } catch (err) {
    // fall through to App API
  }
}

Then the provider falls back to the DingTalk App API:

for (const chunk of textChunks) {
  result = await this.sendViaAppApi({
    conversationId: conversation?.externalConversationId,
    text: chunk,
    robotCode: channelContext.robotCode || '',
    conversationType: channelContext.conversationType || '',
    senderStaffId: channelContext.senderStaffId || ''
  });
}

That made delivery much more reliable.

The important lesson: a webhook expiry timestamp is not a delivery guarantee.

The third bug was hidden in the registry

This one was more subtle.

CliGate supports channel provider instances. The raw provider template in the registry is not the same thing as a started provider instance.

The started instance has settings:

clientId
clientSecret
robotCode
mode
runtime defaults

The raw template does not.

That matters because DingTalk App API fallback needs credentials:

const clientId = chooseSetting(this.settings, 'clientId', 'appKey');
const clientSecret = chooseSetting(this.settings, 'clientSecret', 'appSecret');

If the outbound delivery sender asks the raw registry for dingtalk, it may get a provider object with no settings. Then the session webhook fails, the App API fallback starts, and the fallback has no credentials.

So the channel manager now injects an instance-aware registry shim into both the dispatcher and the delivery sender:

const instanceAwareRegistry = {
  get: (providerId, instanceId) => this.getInstance(providerId, instanceId)
};

this.outboundDispatcher.registry = instanceAwareRegistry;
this.outboundDispatcher.deliverySender?.setRegistry?.(instanceAwareRegistry);

The last line, the one that reaches the delivery sender, is the one that matters.

It is easy to update the dispatcher and forget that the actual send path lives one object deeper.

Runtime events now drive outbound delivery

The architecture I trust more is event-based:

runtime event
  -> find channel conversations tracking that runtime session
  -> format event for the channel
  -> arbitrate whether to send now or suppress
  -> send through provider instance
  -> record delivery

The dispatcher listens to runtime session events:

this.unsubscribe = this.runtimeSessionManager.eventBus.subscribeAll((event) => {
  this.handleRuntimeEvent(event).catch(() => {});
});

Then it finds conversations tracking that runtime:

const conversations = this.conversationStore.listByTrackedRuntimeSessionId(event.sessionId);

And sends through the delivery sender:

await this.deliverySender.send({
  conversation: latestConversation,
  channel: latestConversation.channel,
  sessionId: event.sessionId,
  eventSeq: event.seq,
  message: {
    text: formatted.fullText || formatted.text || '',
    buttons: formatted.buttons || [],
    session,
    event
  }
});

That is the boundary I wanted.

The assistant may start the work, but the runtime event owns the runtime result.

I added tests for the boring parts

The boring parts are where channel bugs usually hide.

There is a test for DingTalk falling back to the App API when the session webhook is unavailable:

assert.match(String(calls[0].url), /oauth2\/accessToken/);
assert.match(String(calls[1].url), /robot\/oToMessages\/batchSend/);
assert.deepEqual(calls[1].body.userIds, ['staff_123']);
assert.equal(calls[1].body.robotCode, 'robot_123');

There is also coverage for group conversation fallback:

assert.match(String(calls[1].url), /robot\/groupMessages\/send/);

And the delivery sender records sent and suppressed deliveries into the assistant event ledger, so debugging does not depend on guessing whether the provider was called.

That is what I want for mobile agents: not just "send a message", but an auditable delivery path.

The workflow after the fix

The flow I wanted now looks like this:

DingTalk message comes in.
CliGate routes it to the assistant or direct runtime path.
Claude Code or Codex starts a runtime session.
The DingTalk thread tracks that runtime session.
Runtime terminal events trigger outbound delivery.
DingTalk session webhook is tried first when one is available.
If that fails, App API fallback sends the result.

The user sees the thing that matters:

Claude Code: fixed the failing test and updated the route handler.

not just:

Task accepted.

What I learned

Mobile coding agents need stronger delivery semantics than chat demos.

It is not enough to prove that the bot can receive a message. It has to survive the whole lifecycle:

accepted
started
waiting for approval
waiting for user input
completed
failed
delivered
suppressed with a reason

And if the channel has multiple send paths, the code has to treat the first path as an optimization, not the truth.

For DingTalk, that meant:

do not assume a stored sessionWebhook is still fresh
fall back to App API when webhook send fails
make sure the sender uses the started provider instance, not the raw provider template
wait for runtime results even when there is only one runtime session
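Treating the first send path as an optimization can be sketched as an ordered fallback loop. This is a simplified illustration under stated assumptions: the path names, the error text, and the result shape are invented, and the paths are synchronous functions here only to keep the sketch short (real senders would return promises).

```javascript
// Hypothetical sketch: try send paths in order; the first one is only an optimization.
function sendWithFallback(paths, message) {
  const attempts = [];
  for (const path of paths) {
    try {
      const result = path.send(message);
      attempts.push({ path: path.name, ok: true });
      return { delivered: true, via: path.name, result, attempts };
    } catch (error) {
      // Record the failure and fall through to the next path.
      attempts.push({ path: path.name, ok: false, error: String(error.message || error) });
    }
  }
  return { delivered: false, via: null, result: null, attempts };
}

// Example paths; names and error text are made up for illustration.
const examplePaths = [
  { name: 'session-webhook', send: () => { throw new Error('webhook expired'); } },
  { name: 'app-api', send: (message) => ({ accepted: message.text }) }
];

const outcome = sendWithFallback(examplePaths, {
  text: 'Claude Code: fixed the failing test and updated the route handler.'
});
```

Here `outcome.via` is `'app-api'`, and `outcome.attempts` keeps the failed webhook attempt so the fallback is visible afterwards.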

That is not the flashy part of building an AI coding agent.

But it is the part that decides whether you can actually trust it from your phone.

If you want to inspect the implementation, the project is here:

CliGate on GitHub (github.com/codeking-ai/cligate)

I am curious how other people are handling mobile agent delivery. Do you send one "task accepted" message, or do you wire final runtime events back into the original chat thread?

"I Stopped Choosing Between Claude Code and Codex. I Put Both in One Chat Window"

CodeKing — Wed, 20 May 2026 06:19:34 +0000

Every "Claude Code vs Codex" comparison eventually runs into the same boring truth:

I do not want to pick one forever.

Some tasks feel better in Claude Code. Some feel better in Codex. Some days one account is rate-limited, one model is cheaper, or one runtime is already holding the context I need.

The annoying part is not choosing the better agent.

The annoying part is switching surfaces every time I change my mind.

The workflow I wanted

I wanted one local chat window where I could do this:

Use Codex for this task.
Continue that same runtime.
Switch to Claude Code for the next one.
Ask the assistant to plan first.
Go back to direct runtime mode.

That sounds like a UI problem, but it is really a control problem.

There are two different things happening:

direct runtime work, where the next message should go straight to Claude Code or Codex
assistant-mediated work, where a supervisor decides whether to answer, ask a question, or delegate to a runtime

If those two modes are not explicit, the chat window turns into a trap. A short follow-up like:

make it smaller

can either mean:

continue the active Codex runtime
ask the product assistant
start a new Claude Code task
answer a pending approval

Guessing wrong here is exactly how coding agents become frustrating.

So I made direct runtime the default

In CliGate, the chat UI conversation now defaults to direct runtime mode.

That was a deliberate choice.

Most of the time, when I am using a coding agent, I do not want an assistant to intercept every message and "think about what I meant." I want the current runtime to continue until I explicitly ask for something else.

There is a test that pins this behavior:

test('ChatUiConversationStore defaults new chat-ui conversations to direct-runtime control mode', () => {
  const conversation = conversationStore.findOrCreateBySessionId('chat-ui-default-direct-runtime-1');

  assert.equal(conversation.metadata?.assistantCore?.mode, 'direct-runtime');
  assert.equal(conversation.metadata?.assistantCore?.controlMode, 'direct-runtime');
});

That means a normal chat message does not automatically become "assistant work." It stays on the runtime path.

The two commands that made the UI usable

I ended up with a small mode switch instead of another complicated settings panel:

/cligate
/runtime

The mode parser is intentionally tiny:

const cligateMatch = trimmed.match(/^\/cligate(?:\s+(.+))?$/is);
if (cligateMatch) {
  return {
    command: 'cligate',
    args: String(cligateMatch[1] || '').trim()
  };
}

if (/^\/runtime$/i.test(trimmed)) {
  return {
    command: 'runtime',
    args: ''
  };
}

The behavior is:

/cligate enters assistant mode
/cligate <task> runs one assistant-mediated task
/runtime exits assistant mode and returns to direct runtime routing

That one escape hatch matters.

When I am done asking the assistant to plan or coordinate, I want the next message to go back to the active Claude Code or Codex session without ceremony.
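For reference, here is a self-contained version of the parser that can be run on its own. The two regular expressions come from the snippet above; the function name `parseModeCommand`, the input trimming, and the `null` return for ordinary messages are assumptions made to complete the sketch.

```javascript
// Self-contained sketch of the mode parser; name and null return are assumptions.
function parseModeCommand(text) {
  const trimmed = String(text || '').trim();

  // "/cligate" alone or "/cligate <task>"; the s flag lets the task span lines.
  const cligateMatch = trimmed.match(/^\/cligate(?:\s+(.+))?$/is);
  if (cligateMatch) {
    return {
      command: 'cligate',
      args: String(cligateMatch[1] || '').trim()
    };
  }

  // "/runtime" takes no arguments.
  if (/^\/runtime$/i.test(trimmed)) {
    return { command: 'runtime', args: '' };
  }

  // Anything else is an ordinary message, not a mode command.
  return null;
}
```

Under these assumptions, `/cligate` yields empty `args`, `/cligate plan the refactor` yields `args: 'plan the refactor'`, and a plain follow-up such as `make it smaller` yields `null` and stays on the runtime path.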

The route now decides before touching the runtime

The chat route first gives the assistant mode service a chance to handle the message.

If assistant mode is not active and there is no /cligate command, it returns null, and the message goes down the normal runtime path:

const assistantResult = await this.assistantModeService.maybeHandleMessage({
  conversation,
  text,
  defaultRuntimeProvider,
  cwd,
  model,
  executionMode: assistantExecutionMode,
  onBackgroundResult
});

if (assistantResult) {
  return {
    ...assistantResult,
    previousSessionId: conversation.activeRuntimeSessionId || null,
    conversation: assistantResult.conversation || this.conversationStore.get(conversation.id)
  };
}

Only after that does the service route directly to the runtime:

const result = await this.messageService.routeUserMessage({
  message: { text },
  conversation,
  defaultRuntimeProvider,
  cwd,
  model,
  metadata: {
    assistantMode: getAssistantControlMode(conversation),
    source: {
      kind: 'chat-ui',
      sessionId: String(sessionId || ''),
      conversationId: conversation.id
    }
  }
});

That separation is the whole point.

The assistant does not get to hijack direct runtime messages just because it exists.

Why this is better than a "smart" default

I tried to make the assistant helpful.

Then I realized "helpful" is dangerous in a coding workflow.

If a runtime is waiting for input, the least surprising thing is to send input to that runtime. If a task has a pending approval, the least surprising thing is to resolve that approval. If the user explicitly types /cligate, then the assistant can step in.

The result feels less magical, but much easier to trust.

For example:

fix the failing unit test

can start a Codex runtime.

Then:

try the simpler patch

continues that runtime.

Then:

/cligate compare this failure with the last run before continuing

lets the assistant reason over the situation.

Then:

/runtime

puts the conversation back on the direct runtime path.

That is the loop I wanted.
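The "least surprising" precedence in this section can be written down as one small decision function. This is a hypothetical sketch: the field names `pendingApprovalId` and `activeRuntimeSessionId` on the conversation, the `'assistant'` mode value, the returned route labels, and the exact ordering of the checks are assumptions, not CliGate's routing code.

```javascript
// Hypothetical sketch of the routing precedence described in this section.
function decideRoute(conversation, text) {
  const trimmed = String(text || '').trim();

  // Explicit commands always win.
  if (/^\/cligate(\s|$)/i.test(trimmed)) return 'assistant';
  if (/^\/runtime$/i.test(trimmed)) return 'switch-to-runtime';

  // Assistant mode stays active until the user leaves it.
  if (conversation.controlMode === 'assistant') return 'assistant';

  // A pending approval is resolved before other runtime input.
  if (conversation.pendingApprovalId) return 'resolve-approval';

  // An active runtime session keeps receiving follow-ups.
  if (conversation.activeRuntimeSessionId) return 'continue-runtime';

  // Nothing is running yet, so a normal message starts a runtime task.
  return 'start-runtime';
}
```

Walking the example through: `fix the failing unit test` starts a runtime, `try the simpler patch` continues it, the `/cligate ...` message goes to the assistant, and `/runtime` switches back.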

Background runs needed their own guardrail

The other bug showed up after I made assistant runs asynchronous in the chat UI.

If an assistant-mediated task starts a runtime and returns later, the UI needs to persist the background result. But it must not append stale output from an older assistant run after the user has already started a newer one.

So the route records the pending assistant run ID:

if (result?.type === 'assistant_run_accepted' && result?.assistantRun?.id) {
  chatUiConversationStore.patch(conversation.id, {
    metadata: {
      ...(conversation.metadata || {}),
      uiChatPendingAssistantRunId: String(result.assistantRun.id || '').trim()
    }
  });
}

And the background callback refuses stale results:

// Drop the result if this run is no longer the pending one for the conversation.
if (getPendingUiAssistantRunId(conversation) !== backgroundRunId) {
  return;
}

That is not glamorous, but it prevents a very real UI bug:

start an assistant task
start another task before the first one finishes
watch the old answer appear under the new task

No thanks.
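The guard is easy to reproduce in isolation. The sketch below uses an in-memory map instead of the conversation store; `createPendingRunTracker`, `accept`, `complete`, and the `transcript` array are hypothetical names for illustration only.

```javascript
// Hypothetical sketch of the stale-result guard using an in-memory store.
function createPendingRunTracker() {
  const pendingByConversation = new Map();
  const transcript = [];

  return {
    // Called when an assistant run is accepted; a newer run replaces an older one.
    accept(conversationId, runId) {
      pendingByConversation.set(conversationId, String(runId));
    },
    // Called when a background run finishes; stale results are dropped.
    complete(conversationId, runId, text) {
      if (pendingByConversation.get(conversationId) !== String(runId)) {
        return false;
      }
      pendingByConversation.delete(conversationId);
      transcript.push({ conversationId, runId: String(runId), text });
      return true;
    },
    transcript
  };
}
```

If `run-1` and then `run-2` are accepted for the same conversation, a late `complete` for `run-1` returns false and nothing is appended; only the result for `run-2` reaches the transcript.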

The model override bug was another small footgun

There was one more detail that mattered for a mixed Claude Code / Codex chat surface.

The normal chat UI has a local model selector. Runtime routing has its own provider semantics. If I let the local chat model override leak into runtime routing, I could accidentally send something like gpt-5.4 into a Claude Code runtime path where that was not the user's intent.

So for local chat-ui runtime messages, the route deliberately ignores the UI chat model override:

const runtimeModelOverride = isExternalConversation ? String(model || '') : '';

There is a test for that too:

assert.equal(captured[0].model, '');
assert.equal(captured[0].defaultRuntimeProvider, 'claude-code');

That tiny rule saved the UI from pretending that "selected chat model" and "runtime provider" are the same concept.

They are not.

What the setup looks like

Start CliGate:

npx cligate@latest start

Open the dashboard:

http://localhost:8081

Then use the Chat page as the control surface:

choose Codex or Claude Code as the runtime provider
send a normal task to start direct runtime work
keep sending follow-ups to continue that runtime
use /cligate when you want assistant-mediated planning or delegation
use /runtime to return to the direct runtime path

That is the workflow I wanted from the beginning.

Not "which terminal agent wins?"

More like:

"Can I keep both available without rebuilding my workflow around either one?"

The lesson

The current wave of AI coding tools makes comparisons tempting.

Claude Code vs Codex. Codex vs Gemini CLI. Terminal agent vs IDE agent.

Those comparisons are useful, but they miss the day-to-day problem:

developers do not just choose tools. They move between them.

For me, the useful abstraction was not a smarter chatbot.

It was a chat control surface with explicit ownership:

direct runtime by default
assistant mode only when requested
sticky runtime continuation
stale background result protection
no accidental model override leaking into runtime work

That made Claude Code and Codex feel less like competing terminals and more like two workers behind the same local desk.

If you want to inspect the implementation, the project is here:

CliGate on GitHub (github.com/codeking-ai/cligate)

I'm curious how other people are handling this. Are you choosing one coding agent, or are you building a workflow that lets several of them coexist?

"I Got Tired of Rewriting 4 AI CLI Config Files. So I Put Setup Behind One Button"

CodeKing — Mon, 18 May 2026 06:15:12 +0000

I like trying new AI coding tools.

I do not like reconfiguring them.

That was the part that kept getting old: every new CLI came with a different config file, a different base URL setting, and a different way to point it at my local gateway.

After doing this a few too many times, I added a small feature to CliGate: a dashboard page that can install and configure the tools for me.

The annoying part

The tools are similar, but the setup is not:

Claude Code wants env vars
Codex wants ~/.codex/config.toml
Gemini CLI has its own proxy setup path
OpenClaw wants a JSON provider config

That means one simple goal:

"Point all of them at localhost:8081"

turns into four different setup chores.

What I changed

CliGate now has a Tools page that does two jobs:

detect whether Node.js and the CLI tools are installed
write the proxy config for each tool from the same UI

The code behind it is pretty direct.

The installer knows the official npm package for each tool:

codex: {
  name: 'Codex CLI',
  command: 'codex',
  npmPackage: '@openai/codex'
}

and the dashboard exposes actions like:

await this.api('/codex/config/proxy', { method: 'POST' });
await this.api('/api/tools/install/codex', { method: 'POST' });

So instead of editing files manually, I can open the panel, click install if a tool is missing, then click configure.

The part I wanted most

I did not want a generic "tool manager."

I wanted a very specific workflow:

install Claude Code, Codex, Gemini CLI, or OpenClaw
point each one at the same local gateway
launch the tool without leaving the dashboard

That is what the page does now.

For example, the Codex side boils down to these settings:

chatgpt_base_url = "http://localhost:8081/backend-api/"
openai_base_url = "http://localhost:8081"

Claude Code gets its localhost base URL, Gemini CLI gets patched for proxy mode, and OpenClaw gets its provider block written with the same target.
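As a rough illustration of the "same target, different file format" idea, the two Codex settings can be derived from a single gateway URL. The helper name `renderCodexProxySettings` is hypothetical, and this sketch only builds the text; it does not claim to match how CliGate actually writes ~/.codex/config.toml.

```javascript
// Hypothetical helper: derive the two Codex settings from one gateway URL.
function renderCodexProxySettings(gatewayUrl) {
  // Strip trailing slashes so both settings are built from the same base.
  const base = String(gatewayUrl).replace(/\/+$/, '');
  return [
    `chatgpt_base_url = "${base}/backend-api/"`,
    `openai_base_url = "${base}"`
  ].join('\n');
}

const codexSettings = renderCodexProxySettings('http://localhost:8081');
```

The point of the design is that the gateway address is stated once, and each tool's config is generated from it rather than edited by hand.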

Why this is better than another README section

I already had setup docs.

The problem was not missing information. The problem was repetition.

Every time I switched machines, reset a config, or wanted to try another CLI, I was doing the same boring setup work again.

Once I moved that into the product, the project became easier to try and easier to keep using.

Who this is actually for

If you use one tool with one API key, this is probably unnecessary.

If you keep bouncing between Claude Code, Codex, Gemini CLI, or OpenClaw, the friction adds up fast. That is the use case this page fixes.

If you've built a similar setup layer for AI tooling, I'm curious what you automated first: install, auth, config, or routing.

"My Product Assistant Kept Borrowing the Wrong Model. So I Gave It Its Own Routing Chain"

CodeKing — Fri, 15 May 2026 06:25:53 +0000

I do not mind a product assistant being wrong because the docs are unclear.

I do mind it being wrong because it silently used the wrong model source.

That was the real problem I hit in my local AI gateway project, CliGate.

The assistant inside the dashboard had a clear job:

answer product-usage questions
stay grounded in the manual
avoid rewriting settings unless the user explicitly asks

But the runtime path behind that assistant was still too fuzzy. In practice, it could depend on whichever account or API key the broader system happened to resolve first.

That is fine for generic chat.

It is not fine for a product assistant that is supposed to be predictable.

The failure mode was subtle

I already had routing. I already had accounts, API keys, and model mapping. I already had a settings surface.

The annoying part was that the assistant itself still behaved too much like "just another consumer of the default pool."

That created a few bad outcomes:

the assistant could drift across providers without the user realizing it
clearing a binding could get undone by old migration behavior
one flaky credential could make the whole assistant feel unreliable
the UI could not answer a simple question like: what is the assistant actually bound to right now?

The bug was not one broken request.

The bug was that the assistant did not have a first-class routing identity.

I stopped thinking in terms of "credential" and switched to "model source"

This is the design change that made the rest of the work much easier.

I did not actually want to bind the assistant to a vague source type like "OpenAI keys" or "Claude account."

I wanted to bind it to a concrete model source:

{
  "type": "api-key",
  "id": "key_x",
  "model": "gpt-5.4"
}

That is why the new config path in CliGate moved toward boundModelSource instead of treating everything as a loose boundCredential.

The internal runtime config now normalizes around that field:

boundModelSource: stored.boundModelSource || stored.boundCredential || null,
boundCredential: stored.boundModelSource || stored.boundCredential || null,
fallbacks: Array.isArray(stored.fallbacks) ? stored.fallbacks : [],

The compatibility alias still exists, but the meaning changed. The assistant is no longer just "attached to a credential." It is attached to a specific source plus an optional model.

That sounds like a naming cleanup. It was actually a control cleanup.

I also needed a way to say "yes, the user configured this on purpose"

One of the uglier problems was legacy migration.

Older assistant settings had source toggles. Newer settings have explicit bindings. If the user cleared the binding, I did not want old migration logic to recreate it on the next restart just because a legacy flag still existed somewhere.

So I added a small but important flag:

"bindingConfigured": true

That flag means:

the user has explicitly configured assistant binding state
even if the current binding is null
do not auto-migrate old sources back into place

This was one of those changes that looks boring in a diff and saves a lot of operator confusion later.

Without it, "clear binding" is not a real action. It is just a temporary suggestion.
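A minimal sketch of how such a flag can gate migration follows. The `legacySource` shape and the function name `resolveBinding` are assumptions invented for the example; only `bindingConfigured`, `boundModelSource`, and `boundCredential` come from the text above.

```javascript
// Hypothetical sketch: legacy migration that respects an explicitly cleared binding.
function resolveBinding(stored) {
  // The user configured binding state on purpose, even if it is now null.
  if (stored.bindingConfigured === true) {
    return stored.boundModelSource || null;
  }

  // No explicit configuration: keep any existing binding.
  const existing = stored.boundModelSource || stored.boundCredential || null;
  if (existing) return existing;

  // Otherwise migrate from an old source toggle (illustrative shape).
  if (stored.legacySource && stored.legacySource.enabled) {
    return { type: stored.legacySource.type, id: stored.legacySource.id };
  }

  return null;
}
```

With `bindingConfigured: true` and a null binding, the function returns null even when a legacy toggle is still enabled, so "clear binding" survives a restart.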

The assistant needed an ordered chain, not one brittle primary

Once the assistant had a proper primary binding, the next obvious problem showed up:

what happens when that source is deleted, disabled, rate-limited, or just temporarily broken?

I did not want the answer to be:

"assistant is down."

So the assistant runtime now builds a real chain:

if (config.boundModelSource || config.boundCredential) {
  chain.push(config.boundModelSource || config.boundCredential);
}
if (Array.isArray(config.fallbacks)) {
  for (const entry of config.fallbacks) {
    if (entry && typeof entry === 'object' && entry.type && entry.id) {
      chain.push(entry);
    }
  }
}

That is simple on purpose.

The first tier is the assistant's intended home. The later tiers are not magic discovery. They are explicit ordered fallbacks the user can inspect in the UI.

That matters because fallback behavior should be explainable.

If an assistant changes models under pressure, I want to know exactly why.

A circuit breaker made the assistant feel much less random

Fallback chains are not enough if you keep retrying a dead tier over and over.

So the assistant LLM client keeps breaker state per tier and skips sources that are currently in cooldown:

for (const descriptor of chain) {
  const tierKey = tierKeyFor(descriptor);
  // Skip tiers whose breaker is currently in cooldown.
  if (this._breaker.shouldSkip(tierKey)) continue;
  const candidate = await resolveCredential(descriptor, {
    defaultChatGptModel: this.defaultChatGptModel,
    defaultClaudeModel: this.defaultClaudeModel
  });
  // Skip descriptors that do not resolve to a credential.
  if (!candidate) continue;
  candidates.push({ ...candidate, tierKey });
}

And when a call fails, the tier records failure instead of pretending the error was just bad luck:

const breakerState = this._breaker.recordFailure(source.tierKey);
logger.warn(`[Supervisor] tier failed | tier=${source.tierKey} | breaker=${breakerState}`);

That changed the experience more than I expected.

Before, the assistant could feel inconsistent in a way users interpret as "the prompt changed" or "the model got weird."

After this change, the behavior became much more operational:

try the primary source
skip tripped tiers
fall through to explicit backups
expose the health state in the dashboard

That is a better failure story for a product surface.
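For readers who have not built one, a per-tier breaker with a failure threshold and a cooldown can be quite small. The sketch below is a generic illustration, not the breaker CliGate ships: the default threshold and cooldown, the `'open'`/`'closed'` labels, the injectable `now` clock, and the full reset after cooldown (no half-open state) are all assumptions.

```javascript
// Hypothetical per-tier circuit breaker; defaults are illustrative.
function createTierBreaker({ threshold = 3, cooldownMs = 60000, now = () => Date.now() } = {}) {
  const tiers = new Map();

  function stateFor(tierKey) {
    if (!tiers.has(tierKey)) tiers.set(tierKey, { failures: 0, openedAt: null });
    return tiers.get(tierKey);
  }

  return {
    // True while the tier is tripped and still inside its cooldown window.
    shouldSkip(tierKey) {
      const state = stateFor(tierKey);
      if (state.openedAt === null) return false;
      if (now() - state.openedAt >= cooldownMs) {
        // Cooldown elapsed: reset the tier and allow another attempt.
        state.openedAt = null;
        state.failures = 0;
        return false;
      }
      return true;
    },
    // Count a failure and trip the breaker once the threshold is reached.
    recordFailure(tierKey) {
      const state = stateFor(tierKey);
      state.failures += 1;
      if (state.failures >= threshold) state.openedAt = now();
      return state.openedAt === null ? 'closed' : 'open';
    },
    // A successful call resets the tier.
    recordSuccess(tierKey) {
      tiers.set(tierKey, { failures: 0, openedAt: null });
    }
  };
}
```

Passing a fake `now` function makes the cooldown testable without waiting, and because state is kept per tier key, one flaky source does not affect the others.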

The UI finally has something honest to show

This was another reason I wanted the routing chain to be explicit.

Once the backend exposes:

the current primary
ordered fallbacks
resolved source
breaker state
last used tier

the settings page can stop being a dead form and start being an inspection tool.

The assistant page now has controls for:

primary model source
per-tier model selection
up to three fallbacks
breaker threshold and cooldown
test-binding checks
tier health and last-used status

That is exactly the kind of visibility I wanted when debugging "why did the assistant answer from this provider instead of that one?"

I did not want the assistant to silently test with live requests

There is a small route detail here that I like because it keeps the UI honest.

The binding test endpoint validates whether a descriptor resolves, but it does not fire an actual LLM request:

const result = await describeBinding({ type: body.type, id: body.id });
return res.json({ success: result.ok, ...result });

That means the user gets a fast answer to:

"is this binding even real?"

without turning the settings screen into an accidental prompt runner.

It is a small boundary, but product assistants need that kind of boundary.
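The shape of such a check can be sketched without any network call. Everything here is hypothetical: `describeBindingSketch`, the `knownSources` list, the `enabled` field, and the reason strings are invented for illustration, and the real `describeBinding` is asynchronous and may inspect more than this.

```javascript
// Hypothetical sketch: validate that a binding descriptor resolves, without calling an LLM.
function describeBindingSketch(descriptor, knownSources) {
  if (!descriptor || !descriptor.type || !descriptor.id) {
    return { ok: false, reason: 'missing type or id' };
  }

  const source = knownSources.find(
    (s) => s.type === descriptor.type && s.id === descriptor.id
  );
  if (!source) return { ok: false, reason: 'source not found' };
  if (source.enabled === false) return { ok: false, reason: 'source disabled' };

  return { ok: true, type: source.type, id: source.id };
}
```

The check only answers whether the descriptor points at something real and enabled; it says nothing about whether a live request would succeed, which is the boundary the endpoint is meant to keep.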

The part I trust most is the migration and route coverage

I can write all the assistant architecture docs I want, but the thing that makes me trust this change is the route-level test coverage.

For example, there are tests that pin the new primary field:

assert.deepEqual(res._body.assistantAgent.boundModelSource, {
  type: 'api-key',
  id: 'key-primary',
  model: 'gpt-5.4'
});

And tests that make sure clearing bindings is respected:

assert.equal(res._body.assistantAgent.boundModelSource, null);
assert.equal(res._body.assistantAgent.boundCredential, null);

Those are the kinds of tests that prevent a future "helpful migration" from quietly breaking the operator's intent again.

What changed in how I think about product assistants

I used to think the important part was the prompt and the docs grounding.

Those matter.

But once the assistant becomes part of the product, routing discipline matters just as much.

If the assistant is meant to be:

predictable
inspectable
recoverable
configurable without guesswork

then it cannot just borrow whatever account or API key happened to win a broader routing race.

It needs its own routing chain.

The pattern I would reuse

If you are adding a product assistant to an existing app with multiple model sources, I think this is the safer progression:

give the assistant its own explicit primary binding
bind to a concrete source plus model, not just a source type
mark explicit user configuration so legacy migration cannot override it
add ordered fallbacks
add breaker state so failures do not loop forever
expose the whole chain in the UI

That is a lot less glamorous than "ship an assistant."

But it is the difference between a demo assistant and one that operators can actually live with.

If you want to inspect the implementation, the project is here:

CliGate on GitHub (github.com/codeking-ai/cligate)

I am curious how other people are handling this. Does your product assistant have its own routing identity, or is it still borrowing the same model path as ordinary chat?

"I Stopped Letting My AI Assistant Hijack Every Message"

CodeKing — Thu, 14 May 2026 15:33:06 +0000

I kept running into the same problem while building AI tooling: the smarter the assistant looked, the less predictable the product felt.

You send a message because you want to continue the current coding session. The system decides you probably meant "start a new task," rewrites the intent, and suddenly you are no longer talking to the runtime you thought you were using.

That sounds small until you try to use it every day.

The problem was not model quality

The failure mode had very little to do with whether the underlying executor was Codex or Claude Code.

The real problem was control.

In a coding workflow, there are at least two very different intents:

I want to keep talking to the current runtime session.
I want a higher-level assistant to look at the whole situation, choose what to do, and coordinate work for me.

If those two paths share the same default entry point, the product starts guessing too much.

That guess is expensive. It changes session continuity, interrupts the mental model, and makes users wonder whether the system is actually listening or just pattern-matching.

What we changed in CliGate

CliGate is our local AI gateway for Claude Code, Codex CLI, Gemini CLI, OpenClaw, web chat, and channel-based workflows.

Instead of treating "assistant" as the universal default, we split the interaction model into two explicit modes:

Direct Runtime
Assistant Collaboration

That sounds like a UI detail, but it changed the architecture.

Direct Runtime: boring on purpose

In direct runtime mode, the rule is simple:

Your message goes to the current runtime path.

No intent interception. No surprise supervision layer. No "maybe I should help by doing something else first."

That path matters because stable tooling feels boring in the best way. If a user is already inside an active Codex or Claude Code session, the next message should continue that session unless they clearly ask for something different.

In our code, that distinction is enforced before the regular routing path kicks in:

const assistantResult = await this.assistantModeService.maybeHandleMessage({
  conversation,
  text,
  defaultRuntimeProvider,
  cwd,
  model
});

if (assistantResult) {
  return assistantResult;
}

const result = await this.messageService.routeUserMessage({
  message: { text },
  conversation,
  defaultRuntimeProvider,
  cwd,
  model
});

If assistant mode is not active, the message falls through to the runtime path directly. That one decision removed a lot of ambiguity.

Assistant Collaboration: explicit supervision

The assistant path is still useful. It just should not impersonate the runtime path.

When users explicitly invoke CliGate Assistant, they are asking for a different kind of help:

inspect the current state
decide whether to reuse an existing session or start a new one
choose Codex or Claude Code
track approvals, pending questions, failures, and completion
summarize the result back in one reply

That is a supervisor role, not a terminal role.

The mental model we landed on looks like this:

User
  -> CliGate Assistant
    -> delegate to Codex / Claude Code
      -> executor does the concrete work
        -> assistant returns the synthesized result

Once we accepted that boundary, several design decisions became much easier.

Why mixing them felt wrong

Before this split, it was tempting to make the assistant "smart" by default:

detect natural language intent
intercept normal chat
decide whether this looks like a question, a task, or an operation

That approach demos well. It does not age well.

In real usage, developers care less about magic and more about whether the product preserves session continuity. If they are already inside a working runtime, surprise orchestration feels like the system stole the steering wheel.

So we changed the philosophy:

normal messages should stay low-interruption
assistant takeover should be explicit
the assistant should feel collaborative, not invasive

The implementation detail that mattered most

The mode switch is intentionally small.

Inside assistant-core/mode-service.js, we only enter the assistant flow when the conversation is already in assistant mode or the user explicitly triggers it with /cligate.

if (!parsed && !assistantModeActive) {
  return null;
}

That return null is doing a lot of work.

It means the assistant does not get a chance to reinterpret every ordinary message. It only runs when the user has actually asked for it.

There is also a matching escape hatch:

/runtime

That sends the conversation back to direct runtime mode.

This ended up feeling much more respectful than trying to infer intent from every sentence.
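Put together, the gate and the two commands form a very small state machine. The sketch below is an assumption-laden illustration rather than the contents of mode-service.js: the function name `gateMessage`, the `'assistant'` mode value, and the result shapes are hypothetical, and the one-shot behavior of `/cligate <task>` follows the command description from the earlier post.

```javascript
// Hypothetical sketch of the assistant mode gate; returns null to mean "not handled".
function gateMessage(conversation, text) {
  const trimmed = String(text || '').trim();
  const cligate = trimmed.match(/^\/cligate(?:\s+(.+))?$/is);
  const isRuntimeCommand = /^\/runtime$/i.test(trimmed);
  const assistantModeActive = conversation.controlMode === 'assistant';

  // Ordinary message outside assistant mode: fall through to the runtime path.
  if (!cligate && !isRuntimeCommand && !assistantModeActive) {
    return null;
  }

  // "/runtime" is the escape hatch back to direct runtime mode.
  if (isRuntimeCommand) {
    conversation.controlMode = 'direct-runtime';
    return { type: 'mode_changed', mode: 'direct-runtime' };
  }

  // Bare "/cligate" enters assistant mode.
  if (cligate && !cligate[1]) {
    conversation.controlMode = 'assistant';
    return { type: 'mode_changed', mode: 'assistant' };
  }

  // Either a one-shot "/cligate <task>" or a message sent while in assistant mode.
  const task = cligate ? cligate[1].trim() : trimmed;
  return { type: 'assistant_task', task };
}
```

The important property is the first branch: unless the user typed a command or is already in assistant mode, the function returns `null` and the caller routes the message to the runtime untouched.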

What the assistant is actually responsible for

We also had to get stricter about role boundaries in the codebase.

CliGate Assistant is responsible for:

orchestration
observation
approvals and blockers
task tracking
result composition

Codex and Claude Code are still responsible for:

editing files
running commands
browser work
concrete task execution

That sounds obvious, but systems get messy when the assistant starts pretending it is also the executor.

Once we treated the assistant as a supervisor instead of a universal chat brain, the architecture became easier to reason about:

assistant-core owns assistant semantics and state
assistant-agent owns the LLM supervisor loop
agent-* modules remain the execution and runtime substrate

The user-facing result

The product now behaves more like a real teammate and less like a clever router.

If you want to continue the active runtime session, you just continue it.

If you want the system to step back, look at the broader situation, and coordinate work across sessions, you invoke the assistant deliberately.

That separation improved three things immediately:

session continuity became easier to trust
task delegation became easier to explain
mobile and channel workflows made more sense because the assistant could supervise without hijacking every turn

I think more AI tools need this split

A lot of AI products blur "assistant" and "executor" into one conversation because it feels simpler.

I think that simplicity is fake.

As soon as the product has long-running sessions, approvals, retries, resumable work, or multiple executors, you need two modes:

one for staying inside the current runtime
one for asking a supervisor to coordinate work around that runtime

Without that split, the system keeps guessing when it should just listen.

How are you handling this in your own tools?

Repo: github.com/codeking-ai/cligate

"My README Kept Trying to Be the Whole Product Manual. So I Split It Into 3 Layers"

CodeKing — Wed, 13 May 2026 09:14:53 +0000

I kept fixing the same problem in three different places.

Someone would land on the GitHub repo for my local AI gateway and need a fast answer: what is this thing, what does it support, and how do I start it?

Instead, they got the same thing a lot of open-source projects accidentally grow into: a README that wanted to be a landing page, onboarding guide, operator manual, architecture index, and release checklist at the same time.

That works for a while. Then every edit makes it worse.

The failure mode was boring but expensive

The project is CliGate, a local AI gateway that sits between tools like Claude Code, Codex CLI, Gemini CLI, OpenClaw, dashboard chat, channel workflows, and upstream model providers.

As the product surface expanded, the docs expanded with it:

protocol translation details
account pools and API keys
app routing and model mapping
dashboard pages
runtime sessions
Telegram and Feishu channels
local manuals inside the product

The result was predictable:

the GitHub README kept getting longer
first-time users still needed a cleaner path
the in-product assistant needed stable source material
maintainers needed room for operational docs that should never be the first thing a new user reads

So the real problem was not "write more docs."

It was "stop making one document do five jobs."

I ended up with a three-layer docs model

The fix in this repo was to split the content by reader intent instead of by file history.

Now the project has three distinct layers:

a repo-facing README.md
a docs hub under docs/README.md
a lightweight in-product manual served from /manual/

Each one answers a different question.

Layer 1: README is for orientation, not full ownership of every detail

The current README now does the things a repo landing page is actually good at:

explain what CliGate is
show the supported surfaces
give the shortest quick start
point to the right deeper documents

That keeps the first screen useful instead of turning it into a scroll tax.

The structure is intentionally compact:

## Quick Start

### 1. Start CliGate
### 2. Add at least one working credential
### 3. Point your tool to CliGate

And it still gives concrete configuration examples, like Codex pointing at localhost:

chatgpt_base_url = "http://localhost:8081/backend-api/"
openai_base_url = "http://localhost:8081"

That is enough for a reader who wants to know whether the project is relevant before they commit to the rest.

Layer 2: the docs hub is the router for human readers

Once the README stops pretending to be everything, you need a clean next step.

That is what docs/README.md became.

Instead of a random directory listing, it routes by audience:

## By Audience

### New users
### Integrators and operators
### Contributors

This seems obvious, but it fixed a real repo problem for me.

When documentation grows organically, file names make sense to maintainers and almost nobody else. A docs hub changes the question from:

"Which markdown file sounds closest to my problem?"

to:

"What kind of reader am I, and where should I start?"

That is a much better first branch.

Layer 3: the product needed its own short manual

The part I did not want to keep faking was the in-product help path.

When users are already inside the dashboard, they usually do not want the full repository story. They want a short operational guide:

what does this product do?
what is the default address?
what do I configure first?
where in the dashboard should I go next?

So CliGate now serves a lightweight manual at /manual/, separate from the repo README and separate from the longer markdown manuals.

The HTML is deliberately focused on quick orientation:

<h2 id="page-title">Understand, configure, and verify CliGate quickly</h2>
<p id="page-subtle">This is the in-product quick manual. For full reference, use the complete product manuals.</p>

And the route layer exposes the source documents explicitly instead of scraping whatever happens to be on disk:

const DOC_FILE_MAP = Object.freeze({
  'README.md': join(process.cwd(), 'docs', 'README.md'),
  'API.md': join(process.cwd(), 'docs', 'API.md'),
  'ARCHITECTURE.md': join(process.cwd(), 'docs', 'ARCHITECTURE.md'),
  'product-manual.en.md': join(process.cwd(), 'docs', 'product-manual.en.md'),
  'product-manual.zh-CN.md': join(process.cwd(), 'docs', 'product-manual.zh-CN.md')
});

That mattered for two reasons:

the UI got a stable set of documents
the product assistant got a cleaner source of truth
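
To make the allowlist idea concrete, here is a minimal sketch of how a lookup against a frozen map can reject anything that is not explicitly listed. The `DOCS` table and the `resolveDoc` helper are hypothetical names for illustration, and the relative paths stand in for the `join(process.cwd(), ...)` values shown above; this is not CliGate's actual route code.

```javascript
// Hypothetical allowlist in the spirit of DOC_FILE_MAP: public name -> file on disk.
const DOCS = Object.freeze({
  'README.md': 'docs/README.md',
  'API.md': 'docs/API.md',
  'product-manual.en.md': 'docs/product-manual.en.md',
});

// Returns the mapped path for a listed document, or null for anything else.
function resolveDoc(name) {
  if (typeof name !== 'string') return null;
  // Only own keys count, so inherited names like "constructor" never match.
  if (!Object.prototype.hasOwnProperty.call(DOCS, name)) return null;
  return DOCS[name];
}

console.log(resolveDoc('API.md'));
console.log(resolveDoc('../package.json'));
```

Because the request never gets concatenated into a path, a name like `../package.json` simply fails the lookup instead of needing to be sanitized.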

The manual files are now doing real product work

This was the architectural shift that made the cleanup worth it.

The docs are not only for GitHub readers anymore. They are also part of the product behavior.

The product manual now carries the user-facing explanation of:

dashboard navigation
routing concepts
CLI configuration
channel workflows
troubleshooting paths

That means the manual has to be shaped for actual usage, not just for repository completeness.

One note in the docs hub says it pretty plainly:

- `product-manual.en.md` and `product-manual.zh-CN.md` are the primary user-facing manuals.
- The product assistant reads from those manual files directly.

Once that became true, letting the README keep absorbing everything stopped making sense.

I also had to accept that maintainers and users need different entry points

This is the trap I keep seeing in open-source docs.

Maintainers are comfortable with:

roadmap files
architecture notes
release checklists
migration plans
incident writeups

New users are not.

If those documents sit beside the real onboarding path without any structure, the repo starts feeling harder than the product.

So the current split lets the project keep maintainers' documents in docs/ without making them the front door. The docs hub explicitly calls that out:

Planning, incident, and roadmap documents remain in this directory for maintainers, but they are not the best entry point for first-time users.

That one sentence removed a lot of ambiguity.

What changed for the actual product

The cleanup was not cosmetic. It changed how the project presents itself in three different contexts:

GitHub readers now get a faster landing path
dashboard users now get a short manual without leaving the product
the product assistant now has clearer manual context to answer from

That last one is easy to underestimate.

If you build an assistant into the product, your documentation stops being passive. It becomes runtime input. Structure starts to matter much more than volume.

The pattern I would reuse

If your open-source project is growing past a single README, I think this split is a better default than endlessly reorganizing one giant file:

README.md for orientation and quick start
docs/README.md as a docs router by audience
an in-product quick manual for operational tasks

Not every project needs all three.

But the moment your docs are serving both repository readers and product users, pretending those are the same audience usually creates a worse experience for both.

If you want to inspect the implementation, the project is here:

CliGate on GitHub

I'm curious how other people are handling this boundary. When did your README stop being a README and start trying to become the whole product manual?

"You Don't Need Matching Model Names to Run AI Coding Tools"

CodeKing — Mon, 11 May 2026 09:53:48 +0000

I ran into a boring problem that kept wasting real time:

my coding tool said gpt-5.5, my provider said the deployment was called something else, and suddenly I was debugging configuration instead of code.

Not model quality. Not prompts. Not token limits.

Just names.

The mismatch that keeps showing up

A lot of AI tooling quietly assumes this:

tool model name == provider model name == provider deployment name

That is a nice fantasy.

It falls apart the moment you use:

Azure OpenAI deployments
provider-specific aliases
internal model mapping
multiple CLI tools that all expect different protocol surfaces

One tool wants gpt-5.5.

Your provider may expose:

model: gpt-5.5
deployment: team-codex-prod

or:

model: claude-sonnet-4-6
upstream target: vertex publisher endpoint

or:

requested model: claude-sonnet-4-6
actual target: gpt-5.4-mini

The names are not the same, and they do not need to be.

The part that annoyed me most

The worst failures were the confusing ones.

The request did not always hard-fail.

Sometimes the tool UI said one thing, the proxy logs said another thing, and the actual upstream target depended on one more layer hidden in provider config.

So you end up asking questions like:

Is this model being mapped by tier?
Is it passed through because it already looks native?
Is the provider overriding it with a deployment name anyway?
Is the UI showing a discovered model that the mapping page cannot even configure yet?

That is too much ceremony for "send this prompt to the model I meant."

What I changed

I stopped letting the tools own the final name resolution.

I put the decision inside a local gateway.

In CliGate, the flow looks more like this:

Claude Code / Codex CLI / Gemini CLI
        |
        v
     localhost
        |
        +-> routing
        +-> model mapping
        +-> provider translation
        +-> deployment override if needed
        |
        v
   actual upstream target

That means the tool can keep asking for the model name it understands, while the gateway decides what the provider should really receive.

Why this matters more on Azure

Azure OpenAI is where this gets obvious fast.

With Azure, there are usually at least two identities in play:

the model name the client thinks it wants
the deployment name Azure actually expects

If your bridge code forwards the requested model directly, but the provider later replaces it with:

model: this.deploymentName || body.model

then the real runtime behavior depends on the deployment config, not the string the CLI showed you.

That is not wrong.

It just means you need better visibility and a better configuration surface.
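
One way to get that visibility is to have the bridge report when it swapped the name. The sketch below is an assumption-heavy illustration: the `AzureBridge` class, `buildUpstreamBody`, and the `overridden` flag are hypothetical, and only the `this.deploymentName || body.model` rule comes from the snippet above.

```javascript
// Hypothetical provider bridge; only the override rule itself is from the article.
class AzureBridge {
  constructor(deploymentName) {
    this.deploymentName = deploymentName;
  }

  // Builds the body sent upstream and reports whether the model name was replaced.
  buildUpstreamBody(body) {
    const model = this.deploymentName || body.model;
    return {
      upstream: { ...body, model },
      overridden: model !== body.model,
    };
  }
}

const bridge = new AzureBridge('team-codex-prod');
const result = bridge.buildUpstreamBody({ model: 'gpt-5.5', messages: [] });
console.log(result.upstream.model, result.overridden);
```

Surfacing `overridden` next to the request log is enough to explain why the CLI showed one name while the provider received another.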

The fix is not "make every name identical"

I do not think the right answer is forcing every tool, provider, and deployment to share the same label.

That breaks down as soon as:

one provider requires deployment indirection
another provider wants a publisher-specific route
a third provider can only support a tier-mapped equivalent

The better rule is:

let the tool ask for a logical model, and let the gateway resolve the physical target

That is what model mapping should be doing.
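
As a rough sketch of that rule, the gateway can keep a table from logical names to physical targets and fall back to pass-through when nothing is mapped. The `MODEL_MAP` shape, the provider labels, and `resolveTarget` are hypothetical; the model and deployment names reuse the examples from earlier in this post.

```javascript
// Hypothetical mapping table: logical model name -> physical upstream target.
const MODEL_MAP = {
  'gpt-5.5': { provider: 'azure-openai', target: 'team-codex-prod' },
  'claude-sonnet-4-6': { provider: 'openai', target: 'gpt-5.4-mini' },
};

// Resolves what the provider should receive for the name the tool asked for.
function resolveTarget(requestedModel, map = MODEL_MAP) {
  const mapped = Object.prototype.hasOwnProperty.call(map, requestedModel)
    ? map[requestedModel]
    : null;
  if (mapped) {
    return { requested: requestedModel, provider: mapped.provider, target: mapped.target, source: 'mapping' };
  }
  // No mapping configured: forward the name unchanged and record that it was a pass-through.
  return { requested: requestedModel, provider: null, target: requestedModel, source: 'passthrough' };
}

console.log(resolveTarget('gpt-5.5'));
console.log(resolveTarget('some-native-model'));
```

Returning the `source` field alongside the target keeps the "was this mapped or passed through?" question answerable from a single object.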

The two things I actually needed

After working through this, I realized the UI had to support both:

1. discovered models

If the provider can list available models, show them.

That makes it easy to pick things like gpt-5.5 when the upstream already advertises them.

2. manual model or deployment entry

If the provider uses a deployment name that is not auto-discovered the way the UI expects, I still need to type it manually.

This matters a lot for provider bridges where:

the useful identifier is a deployment, not a catalog model ID
discovery can lag behind reality
the operational name is local to one account or tenant

If the UI only gives me a fixed dropdown, it is pretending the ecosystem is cleaner than it is.

What I changed in my own setup

I updated the model mapping flow so it now:

returns discovered provider models through the model-mapping API
merges discovered models with static mapping candidates
allows manual entry instead of forcing a dropdown-only selection

So the configuration step is no longer:

"pick from whatever the UI happened to preload"

It is:

"pick a discovered model or type the real deployment/model name you actually use"

That is a much more honest interface.
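
A minimal sketch of the merge step, assuming the API hands the UI two plain arrays of model IDs plus whatever the user typed. The function name `buildModelOptions` and the `source` labels are hypothetical, not the actual model-mapping API.

```javascript
// Merges discovered models, static candidates, and an optional manual entry
// into one de-duplicated list, remembering where each option came from.
function buildModelOptions(discovered, staticCandidates, manualEntry) {
  const seen = new Set();
  const options = [];
  const add = (id, source) => {
    const trimmed = String(id ?? '').trim();
    if (!trimmed || seen.has(trimmed)) return; // skip blanks and duplicates
    seen.add(trimmed);
    options.push({ id: trimmed, source });
  };
  discovered.forEach((id) => add(id, 'discovered'));
  staticCandidates.forEach((id) => add(id, 'static'));
  add(manualEntry, 'manual');
  return options;
}

console.log(buildModelOptions(['gpt-5.5'], ['gpt-5.5', 'gpt-5.4-mini'], 'team-codex-prod'));
```

Discovered entries are added first, so a model that appears in both lists keeps its `discovered` label, and a manually typed deployment name is accepted even when discovery has never seen it.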

The boring but important engineering lesson

I think a lot of AI infra bugs come from mixing up three different concepts:

display name in the tool
logical model ID used for routing
physical upstream target used at the provider edge

When those three collapse into one string, everything feels simple.

When they do not, you need explicit mapping and explicit logs.

Otherwise you get the classic failure mode:

UI says one thing
logs say another
provider actually runs something else

That is not really a model problem.

It is an observability problem.
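
A small sketch of what an explicit log record could look like when all three names are kept separate. The field names and the `describeResolution` helper are hypothetical; the point is only that each request logs the display name, the logical model, and the physical target side by side.

```javascript
// Builds one log record that keeps the three names apart and flags any remapping.
function describeResolution({ displayName, logicalModel, physicalTarget }) {
  const mismatch = displayName !== logicalModel || logicalModel !== physicalTarget;
  const suffix = mismatch ? ' (remapped)' : '';
  return {
    displayName,
    logicalModel,
    physicalTarget,
    mismatch,
    line: `display=${displayName} logical=${logicalModel} physical=${physicalTarget}${suffix}`,
  };
}

const record = describeResolution({
  displayName: 'gpt-5.5',
  logicalModel: 'gpt-5.5',
  physicalTarget: 'team-codex-prod',
});
console.log(record.line);
// prints "display=gpt-5.5 logical=gpt-5.5 physical=team-codex-prod (remapped)"
```

With a line like this in the request log, the UI, the logs, and the provider can no longer silently disagree.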

If you're building local AI tooling, I would strongly recommend this split

Do not let every CLI client carry your provider-specific naming quirks.

Let them speak in the model vocabulary they already know.

Then keep the messy parts in one place:

routing
provider adaptation
deployment overrides
request logs
model mapping UI

That design has held up much better for me than trying to force the whole stack to agree on one shared name.

Repo:

CliGate on GitHub

How are you handling this right now?

Are you keeping model names and deployment names identical on purpose, or are you hiding the mismatch behind a gateway too?