CodeKing

Posted on May 14

I Stopped Letting My AI Assistant Hijack Every Message

#ai #webdev #javascript #discuss

I kept running into the same problem while building AI tooling: the smarter the assistant looked, the less predictable the product felt.

You send a message because you want to continue the current coding session. The system decides you probably meant "start a new task," rewrites the intent, and suddenly you are no longer talking to the runtime you thought you were using.

That sounds small until you try to use it every day.

The problem was not model quality

The failure mode had very little to do with whether the underlying executor was Codex or Claude Code.

The real problem was control.

In a coding workflow, there are at least two very different intents:

I want to keep talking to the current runtime session.
I want a higher-level assistant to look at the whole situation, choose what to do, and coordinate work for me.

If those two paths share the same default entry point, the product starts guessing too much.

That guess is expensive. It breaks session continuity, interrupts the user's mental model, and makes users wonder whether the system is actually listening or just pattern-matching.

What we changed in CliGate

CliGate is our local AI gateway for Claude Code, Codex CLI, Gemini CLI, OpenClaw, web chat, and channel-based workflows.

Instead of treating "assistant" as the universal default, we split the interaction model into two explicit modes:

Direct Runtime
Assistant Collaboration

That sounds like a UI detail, but it changed the architecture.

Direct Runtime: boring on purpose

In direct runtime mode, the rule is simple:

Your message goes to the current runtime path.

No intent interception. No surprise supervision layer. No "maybe I should help by doing something else first."

That path matters because stable tooling feels boring in the best way. If a user is already inside an active Codex or Claude Code session, the next message should continue that session unless they clearly ask for something different.

In our code, that distinction is enforced before the regular routing path kicks in:

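// Give assistant mode the first look. It returns a result only when it owns this turn.
const assistantResult = await this.assistantModeService.maybeHandleMessage({
  conversation,
  text,
  defaultRuntimeProvider,
  cwd,
  model
});

// A non-null result means the assistant handled the message, so stop here.
if (assistantResult) {
  return assistantResult;
}

// Otherwise continue on the regular runtime routing path.
const result = await this.messageService.routeUserMessage({
  message: { text },
  conversation,
  defaultRuntimeProvider,
  cwd,
  model
});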

If assistant mode is not active, the message falls through to the runtime path directly. That one decision removed a lot of ambiguity.
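To make that fall-through concrete, here is a minimal, self-contained sketch of the same shape. Only the two method names come from the snippet above. The stub services, the `handleIncoming` wrapper, and the `mode` and `runtimeProvider` fields are assumptions for illustration, not CliGate's actual implementation.

```javascript
// Hypothetical stand-ins for the two services. Only the call shape
// (maybeHandleMessage / routeUserMessage) comes from the snippet above.
const assistantModeService = {
  // Returns a result when assistant mode owns the message, otherwise null.
  async maybeHandleMessage({ conversation, text }) {
    if (conversation.mode !== "assistant") return null;
    return { handledBy: "assistant", text };
  },
};

const messageService = {
  // Forwards the message to the runtime the conversation is bound to,
  // falling back to the default provider when none is set.
  async routeUserMessage({ message, conversation, defaultRuntimeProvider }) {
    const provider = conversation.runtimeProvider ?? defaultRuntimeProvider;
    return { handledBy: provider, text: message.text };
  },
};

// Hypothetical wrapper: the assistant gets the first look, the runtime gets the rest.
async function handleIncoming({ conversation, text, defaultRuntimeProvider }) {
  const assistantResult = await assistantModeService.maybeHandleMessage({
    conversation,
    text,
  });
  // Any non-null result means the assistant claimed this turn.
  if (assistantResult) return assistantResult;

  return messageService.routeUserMessage({
    message: { text },
    conversation,
    defaultRuntimeProvider,
  });
}
```

The point of the sketch is the ordering: the runtime path needs no knowledge of assistant mode, because the assistant either claims the turn or declines it with null.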

Assistant Collaboration: explicit supervision

The assistant path is still useful. It just should not impersonate the runtime path.

When users explicitly invoke CliGate Assistant, they are asking for a different kind of help:

inspect the current state
decide whether to reuse an existing session or start a new one
choose Codex or Claude Code
track approvals, pending questions, failures, and completion
summarize the result back in one reply

That is a supervisor role, not a terminal role.

The mental model we landed on looks like this:

User
  -> CliGate Assistant
    -> delegate to Codex / Claude Code
      -> executor does the concrete work
        -> assistant returns the synthesized result
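
As a rough sketch, that flow could look like the code below. The `executors` table, the `superviseTask` function, and every field name are hypothetical. The executors here are plain functions standing in for real Codex or Claude Code sessions.

```javascript
// Hypothetical executors: each takes a task string and returns a report.
const executors = {
  codex: (task) => ({
    executor: "codex",
    status: "completed",
    output: `codex ran: ${task}`,
  }),
  "claude-code": (task) => ({
    executor: "claude-code",
    status: "completed",
    output: `claude-code ran: ${task}`,
  }),
};

// The assistant picks an executor and composes the reply.
// It never performs the concrete work itself.
function superviseTask({ task, activeSession, preferredExecutor = "codex" }) {
  // Reuse the active session's executor when there is one, otherwise start fresh.
  const executorName = activeSession ? activeSession.executor : preferredExecutor;
  const run = executors[executorName];
  if (!run) throw new Error(`Unknown executor: ${executorName}`);

  // Delegate, then synthesize a single reply from the executor's report.
  const report = run(task);
  return {
    delegatedTo: executorName,
    reusedSession: Boolean(activeSession),
    summary: `${report.executor} ${report.status}: ${report.output}`,
  };
}
```

Calling `superviseTask({ task: "add tests", activeSession: { executor: "claude-code" } })` delegates to the existing Claude Code session. Omitting `activeSession` falls back to the preferred executor.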

Once we accepted that boundary, several design decisions became much easier.

Why mixing them felt wrong

Before this split, it was tempting to make the assistant "smart" by default:

detect natural language intent
intercept normal chat
decide whether this looks like a question, a task, or an operation

That approach demos well. It does not age well.

In real usage, developers care less about magic and more about whether the product preserves session continuity. If they are already inside a working runtime, surprise orchestration feels like the system stole the steering wheel.

So we changed the philosophy:

normal messages should stay low-interruption
assistant takeover should be explicit
the assistant should feel collaborative, not invasive

The implementation detail that mattered most

The mode switch is intentionally small.

Inside assistant-core/mode-service.js, we enter the assistant flow only when the conversation is already in assistant mode or the user explicitly triggers it with /cligate.

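// Not an explicit assistant command and not already in assistant mode:
// decline, so the caller falls through to the runtime path.
if (!parsed && !assistantModeActive) {
  return null;
}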

That return null is doing a lot of work.

It means the assistant does not get a chance to reinterpret every ordinary message. It only runs when the user has actually asked for it.

There is also a matching escape hatch:

/runtime

That sends the conversation back to direct runtime mode.
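
Put together, the two commands and the null guard form a small state machine. The sketch below is one way to write it, not the code in mode-service.js: `parseModeCommand`, the `mode` field, and the reply shapes are assumptions, and it runs synchronously for brevity. Only `parsed`, `assistantModeActive`, `/cligate`, and `/runtime` come from the text above.

```javascript
// Hypothetical parser: recognizes only the two commands named in this post.
function parseModeCommand(text) {
  const trimmed = text.trim();
  if (trimmed === "/runtime") return { command: "runtime", rest: "" };
  if (trimmed === "/cligate" || trimmed.startsWith("/cligate ")) {
    return { command: "cligate", rest: trimmed.slice("/cligate".length).trim() };
  }
  return null;
}

function maybeHandleMessage({ conversation, text }) {
  const parsed = parseModeCommand(text);
  const assistantModeActive = conversation.mode === "assistant";

  // Ordinary message in direct runtime mode: decline so the caller routes it.
  if (!parsed && !assistantModeActive) {
    return null;
  }

  // Escape hatch: hand the conversation back to the runtime.
  if (parsed && parsed.command === "runtime") {
    conversation.mode = "runtime";
    return { handledBy: "assistant", reply: "Switched to direct runtime mode." };
  }

  // Explicit /cligate, or a follow-up while assistant mode is already active.
  conversation.mode = "assistant";
  const request = parsed ? parsed.rest : text;
  return { handledBy: "assistant", request };
}
```

In this sketch the mode is sticky: after one `/cligate`, follow-up messages stay with the assistant until the user sends `/runtime`.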

This ended up feeling much more respectful than trying to infer intent from every sentence.

What the assistant is actually responsible for

We also had to get stricter about role boundaries in the codebase.

CliGate Assistant is responsible for:

orchestration
observation
approvals and blockers
task tracking
result composition

Codex and Claude Code are still responsible for:

editing files
running commands
browser work
concrete task execution

That sounds obvious, but systems get messy when the assistant starts pretending it is also the executor.
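
One way to keep that boundary honest is to write it down as data and check it. This is only an illustrative sketch: the responsibility strings are copied from the two lists above, while the `ROLE_RESPONSIBILITIES` table and the `assertRoleAllows` helper are hypothetical names.

```javascript
// Responsibilities copied from the two lists above.
// The role keys and the helper below are hypothetical.
const ROLE_RESPONSIBILITIES = {
  assistant: new Set([
    "orchestration",
    "observation",
    "approvals and blockers",
    "task tracking",
    "result composition",
  ]),
  executor: new Set([
    "editing files",
    "running commands",
    "browser work",
    "concrete task execution",
  ]),
};

// Returns true when the role owns the responsibility, throws otherwise.
function assertRoleAllows(role, responsibility) {
  const allowed = ROLE_RESPONSIBILITIES[role];
  if (!allowed) throw new Error(`Unknown role: ${role}`);
  if (!allowed.has(responsibility)) {
    throw new Error(`${role} is not responsible for "${responsibility}"`);
  }
  return true;
}
```

With a guard like this, `assertRoleAllows("assistant", "editing files")` throws instead of letting the supervisor quietly act as an executor.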

Once we treated the assistant as a supervisor instead of a universal chat brain, the architecture became easier to reason about:

assistant-core owns assistant semantics and state
assistant-agent owns the LLM supervisor loop
agent-* modules remain the execution and runtime substrate

The user-facing result

The product now behaves more like a real teammate and less like a clever router.

If you want to continue the active runtime session, you just continue it.

If you want the system to step back, look at the broader situation, and coordinate work across sessions, you invoke the assistant deliberately.

That separation improved three things immediately:

session continuity became easier to trust
task delegation became easier to explain
mobile and channel workflows made more sense because the assistant could supervise without hijacking every turn

I think more AI tools need this split

A lot of AI products blur "assistant" and "executor" into one conversation because it feels simpler.

I think that simplicity is fake.

As soon as the product has long-running sessions, approvals, retries, resumable work, or multiple executors, you need two modes:

one for staying inside the current runtime
one for asking a supervisor to coordinate work around that runtime

Without that split, the system keeps guessing when it should just listen.

How are you handling this in your own tools?

Repo: github.com/codeking-ai/cligate

DEV Community