CodeKing

Posted on May 22

"My Web Chat Wasn't a Real Channel. That Broke My Agent Pipeline"

#ai #javascript #webdev #opensource

I thought my web chat was the simplest surface in the whole product.

Telegram, Feishu, and DingTalk were the complicated ones. Web chat was just the dashboard. Same browser, same server, same app. What could possibly go wrong?

A lot, apparently.

The bug looked random from the UI

A task would start from the web chat UI just fine.

The runtime session existed. The conversation existed. The task existed. The logs looked healthy enough.

And then the delivery pipeline tried to send a follow-up update back into the conversation and got:

conversation_not_found

Which made no sense, because the conversation definitely existed. I had just used it.

This is the kind of bug that wastes time because every subsystem looks correct when you inspect it on its own.

The real problem: I treated web chat like a page, not a channel

The architecture in CliGate already had a channel model.

Telegram is a channel. Feishu is a channel. DingTalk is a channel.

Those inbound messages go through the same supervision and delivery machinery:

conversation store
scheduler
delivery sender
assistant orchestration
runtime session binding

But web chat had slowly drifted into a special-case path.

That felt harmless at first. Web chat lived inside the same app, so it was easy to give it a little custom state and a few convenience wrappers.

That was the mistake.

What actually broke

The old version of chat-ui/conversation-store.js exported its own store instance.

Meanwhile, the delivery and orchestration path used the shared channel conversation store.

So both sides were reading and writing "conversations," and both persisted to the same JSON file on disk, but each one held its own in-memory array.

That meant:

the chat UI could create a conversation
the route handler could see it
the runtime could bind to it
but the scheduler could still fail to find it later

The comments in the fix say it more plainly than I can:

// chat-ui and agent-channels each held a SEPARATE in-memory
// `conversations` array, even though both wrote to the same JSON file on disk.
// After server start, a chat-ui conversation created at runtime was visible to
// chat-ui-route but NOT to message-service, so scheduler deliveries hit
// `conversation_not_found` and silently dropped notifications.

That is exactly the sort of bug you get when a "small UI shortcut" quietly forks your domain model.
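Here is a minimal sketch of that failure mode. It is not the CliGate code: `createConversationStore`, `deliver`, and the `disk` object standing in for the JSON file are all hypothetical names, and the real stores do much more. It only shows how two instances can share persistence and still disagree in memory.

```javascript
// Simplified stand-in for a file-backed store. All names here are hypothetical.
function createConversationStore(disk) {
  // Each instance parses its own copy once, at "server start".
  const conversations = JSON.parse(disk.json);

  return {
    create(id) {
      const conversation = { id };
      conversations.push(conversation);
      // Persisted, but other instances never reload from disk.
      disk.json = JSON.stringify(conversations);
      return conversation;
    },
    find(id) {
      return conversations.find((c) => c.id === id) ?? null;
    },
  };
}

// Stands in for the shared JSON file on disk.
const disk = { json: "[]" };

const chatUiStore = createConversationStore(disk);  // used by the chat UI route
const channelStore = createConversationStore(disk); // used by scheduler and delivery

chatUiStore.create("web-1");

// Hypothetical delivery step: look the conversation up before sending.
function deliver(store, conversationId) {
  return store.find(conversationId) ? "delivered" : "conversation_not_found";
}

console.log(deliver(chatUiStore, "web-1"));  // delivered
console.log(deliver(channelStore, "web-1")); // conversation_not_found
console.log(JSON.parse(disk.json).length);   // 1
```

The last line is the confusing part: the file on disk is correct, so inspecting persisted state tells you nothing about which in-memory copy the scheduler is actually reading.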

The fix was not complicated

I did not need a new abstraction.

I needed one source of truth.

Instead of exporting a dedicated chat-ui conversation store in production, I attached chat-specific helpers to the shared singleton used by the channel system:

installChatUiHelpers(agentChannelConversationStore);

export const chatUiConversationStore = agentChannelConversationStore;

That one change matters more than it looks.

Now web chat is not pretending to be adjacent to the channel system. It is part of the channel system.
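A rough sketch of what "attach helpers to the shared singleton" can look like. The identifiers `installChatUiHelpers`, `agentChannelConversationStore`, and `chatUiConversationStore` come from the snippet above, but the bodies here are assumptions for illustration, as are `findById` and `createChatUiConversation`. The `export` keyword is left out so the snippet runs as a plain script.

```javascript
// Stand-in for the shared singleton. The real store has persistence and more fields.
const agentChannelConversationStore = {
  conversations: [],
  findById(id) {
    return this.conversations.find((c) => c.id === id) ?? null;
  },
};

// Hypothetical installer: adds chat-specific methods to the store it is given
// instead of constructing a second store.
function installChatUiHelpers(store) {
  store.createChatUiConversation = function (id) {
    const conversation = { id, channel: "chat-ui" };
    this.conversations.push(conversation);
    return conversation;
  };
  return store;
}

installChatUiHelpers(agentChannelConversationStore);

// An alias, not a copy: both names point at the same object.
const chatUiConversationStore = agentChannelConversationStore;

chatUiConversationStore.createChatUiConversation("web-1");

console.log(chatUiConversationStore === agentChannelConversationStore); // true
console.log(agentChannelConversationStore.findById("web-1").channel);   // chat-ui
```

The design point is that the chat UI keeps its convenient API, but there is no second array left for it to write into.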

Why this changed more than delivery

Once I stopped treating web chat as a special page, a lot of other decisions became cleaner.

A chat-ui conversation now behaves like a real peer of the other channels:

it has the same conversation identity model
it uses the same assistant delivery state
it flows through the same runtime binding logic
it can receive scheduler-driven updates without weird bridging code

That matters because a multi-surface assistant only stays sane if all entry points agree on what a conversation is.

If one surface has its own special rules, you do not have one product anymore. You have one product plus one exception that keeps leaking.

The other important fix: seed assistant mode from the start

There was a second detail hidden in the same file.

New chat-ui conversations now start with assistant control mode already set:

assistantCore: buildAssistantCoreDeliveryState(
  existingAssistantCore,
  { controlMode: CONVERSATION_ASSISTANT_CONTROL_MODE.ASSISTANT }
)

That matters because web chat should enter the same top-level assistant orchestration path as the messaging channels.

If the first message from the web UI skips that and goes straight to the bound runtime, you get behavioral drift:

web chat behaves one way
Telegram, Feishu, and DingTalk behave another way

Then every bug becomes impossible to reason about because the surfaces are no longer comparable.
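To make the drift concrete, here is a sketch under stated assumptions. `buildAssistantCoreDeliveryState` and `CONVERSATION_ASSISTANT_CONTROL_MODE.ASSISTANT` are named in the snippet above, but their implementations are not shown in this post, so the merge logic, the enum values, the `DIRECT_RUNTIME` member, and the `routeInbound` function below are all hypothetical.

```javascript
// Assumed enum shape. Only the ASSISTANT member appears in the post.
const CONVERSATION_ASSISTANT_CONTROL_MODE = Object.freeze({
  ASSISTANT: "assistant",
  DIRECT_RUNTIME: "direct_runtime", // hypothetical second mode, for contrast
});

// Hypothetical merge: keep any existing state, then apply the overrides.
function buildAssistantCoreDeliveryState(existing = {}, overrides = {}) {
  return { ...existing, ...overrides };
}

// New chat-ui conversations are seeded with assistant control mode.
function createChatUiConversation(id, existingAssistantCore) {
  return {
    id,
    assistantCore: buildAssistantCoreDeliveryState(existingAssistantCore, {
      controlMode: CONVERSATION_ASSISTANT_CONTROL_MODE.ASSISTANT,
    }),
  };
}

// Hypothetical router: only assistant-mode conversations enter orchestration.
function routeInbound(conversation) {
  const mode = conversation.assistantCore?.controlMode;
  return mode === CONVERSATION_ASSISTANT_CONTROL_MODE.ASSISTANT
    ? "assistant-orchestration"
    : "bound-runtime";
}

const seeded = createChatUiConversation("web-1");
const unseeded = { id: "web-legacy" }; // what an unseeded conversation might look like

console.log(routeInbound(seeded));   // assistant-orchestration
console.log(routeInbound(unseeded)); // bound-runtime
```

If the seed is missing, nothing errors. The first message simply takes a different path, which is why this kind of drift is easy to ship.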

The test I actually wanted

I have learned to distrust fixes like this unless there is a test that proves the behavioral contract.

The right question was not "does the chat-ui store still work?"

The right question was:

Does a chat-ui conversation participate in assistant behavior like a real channel?

That is why the surrounding tests focus on assistant-mode behavior and persisted conversation semantics instead of only checking helper methods in isolation.

The implementation detail was a store instance bug.

The product bug was channel inconsistency.

What I learned from this

When you build multi-channel agent systems, the browser UI is seductive.

It feels local. It feels simple. It feels close enough to the app that you can justify giving it custom flow control, custom state, or custom routing.

That instinct is expensive.

If the browser chat can start tasks, receive async updates, carry conversation identity, and interact with the same supervisor as your mobile or messaging surfaces, then it is not "just a page."

It is a channel.

And if you do not model it that way, the architecture will eventually make you pay for the lie.

Are you treating your web chat as a first-class channel, or as a special case that has not failed loudly yet?

Repo: github.com/codeking-ai/cligate