I thought my web chat was the simplest surface in the whole product.
Telegram, Feishu, and DingTalk were the complicated ones. Web chat was just the dashboard. Same browser, same server, same app. What could possibly go wrong?
A lot, apparently.
The bug looked random from the UI
A task would start from the web chat UI just fine.
The runtime session existed. The conversation existed. The task existed. The logs looked healthy enough.
And then the delivery pipeline tried to send a follow-up update back into the conversation and got:
conversation_not_found
Which made no sense, because the conversation definitely existed. I had just used it.
This is the kind of bug that wastes time because every individual subsystem looks correct when you inspect it on its own.
The real problem: I treated web chat like a page, not a channel
The architecture in CliGate already had a channel model.
Telegram is a channel. Feishu is a channel. DingTalk is a channel.
Those inbound messages go through the same supervision and delivery machinery:
- conversation store
- scheduler
- delivery sender
- assistant orchestration
- runtime session binding
But web chat had slowly drifted into a special-case path.
That felt harmless at first. Web chat lived inside the same app, so it was easy to give it a little custom state and a few convenience wrappers.
That was the mistake.
What actually broke
The old version of chat-ui/conversation-store.js exported its own store instance.
Meanwhile, the delivery and orchestration path used the shared channel conversation store.
So both sides were reading and writing "conversations," but not the same in-memory array.
That meant:
- the chat UI could create a conversation
- the route handler could see it
- the runtime could bind to it
- but the scheduler, which resolves conversations through the shared channel store, could still fail to find it later
The comments in the fix say it more plainly than I can:
// chat-ui and agent-channels each held a SEPARATE in-memory
// `conversations` array, even though both wrote to the same JSON file on disk.
// After server start, a chat-ui conversation created at runtime was visible to
// chat-ui-route but NOT to message-service, so scheduler deliveries hit
// `conversation_not_found` and silently dropped notifications.
That is exactly the sort of bug you get when a "small UI shortcut" quietly forks your domain model.
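Here is a minimal sketch of that failure mode. It is not the CliGate code: `createConversationStore`, `deliver`, and the `disk` object standing in for the JSON file are all hypothetical names invented for illustration. The only thing it tries to show is how two store instances can share a file and still disagree in memory.

```javascript
// Hypothetical stand-in for the JSON file both stores persist to.
const disk = { conversations: [] };

// Each call builds a store with its OWN in-memory array, loaded once at startup.
function createConversationStore() {
  const conversations = disk.conversations.map((c) => ({ ...c }));
  return {
    create(id) {
      const conversation = { id };
      conversations.push(conversation);
      // Persisting works, but other instances never re-read the file.
      disk.conversations = conversations.map((c) => ({ ...c }));
      return conversation;
    },
    get(id) {
      return conversations.find((c) => c.id === id) ?? null;
    },
  };
}

// Two instances, as in the broken setup: one for chat-ui, one for the channel path.
const chatUiStore = createConversationStore();
const channelStore = createConversationStore();

// Hypothetical delivery step that looks the conversation up before sending.
function deliver(store, conversationId, text) {
  const conversation = store.get(conversationId);
  if (!conversation) return { ok: false, error: "conversation_not_found" };
  return { ok: true, conversationId, text };
}

// The conversation is created at runtime through the chat-ui instance only.
chatUiStore.create("web-1");

const viaChatUi = deliver(chatUiStore, "web-1", "task started");
const viaChannel = deliver(channelStore, "web-1", "task finished");

console.log(viaChatUi.ok); // true
console.log(viaChannel.error); // conversation_not_found
console.log(disk.conversations.length); // 1
```

The record is on disk and one instance can see it, which is why every individual check passes while the delivery path still fails.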
The fix was not complicated
I did not need a new abstraction.
I needed one source of truth.
Instead of exporting a dedicated chat-ui conversation store in production, I attached chat-specific helpers to the shared singleton used by the channel system:
installChatUiHelpers(agentChannelConversationStore);
export const chatUiConversationStore = agentChannelConversationStore;
That one change matters more than it looks.
Now web chat is not pretending to be adjacent to the channel system. It is part of the channel system.
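To make the shape of that change concrete, here is a rough sketch under stated assumptions. The names `installChatUiHelpers`, `agentChannelConversationStore`, and `chatUiConversationStore` come from the fix itself. Everything else is hypothetical: the store's internals and the two helper methods (`createChatUiConversation`, `listChatUiConversations`) are invented for this example, and the real helpers may look quite different.

```javascript
// Hypothetical shared singleton standing in for agentChannelConversationStore.
const agentChannelConversationStore = {
  conversations: [],
  create(conversation) {
    this.conversations.push(conversation);
    return conversation;
  },
  get(id) {
    return this.conversations.find((c) => c.id === id) ?? null;
  },
};

// Adds chat-specific convenience methods to an existing store instead of
// creating a second one. The helper names are invented for illustration.
function installChatUiHelpers(store) {
  store.createChatUiConversation = function (id) {
    return this.create({ id, channel: "chat-ui" });
  };
  store.listChatUiConversations = function () {
    return this.conversations.filter((c) => c.channel === "chat-ui");
  };
  return store;
}

installChatUiHelpers(agentChannelConversationStore);
// Same object under a second name: there is no separate chat-ui array to drift.
const chatUiConversationStore = agentChannelConversationStore;

chatUiConversationStore.createChatUiConversation("web-1");

console.log(chatUiConversationStore === agentChannelConversationStore); // true
console.log(agentChannelConversationStore.get("web-1").channel); // chat-ui
```

The design choice is that chat-ui keeps its convenience API, but any code holding the channel store sees the same conversations.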
Why this changed more than delivery
Once I stopped treating web chat as a special page, a lot of other decisions became cleaner.
A chat-ui conversation now behaves like a real peer of the other channels:
- it has the same conversation identity model
- it uses the same assistant delivery state
- it flows through the same runtime binding logic
- it can receive scheduler-driven updates without weird bridging code
That matters because a multi-surface assistant only stays sane if all entry points agree on what a conversation is.
If one surface has its own special rules, you do not have one product anymore. You have one product plus one exception that keeps leaking.
The other important fix: seed assistant mode from the start
There was a second detail hidden in the same file.
New chat-ui conversations now start with assistant control mode already set:
assistantCore: buildAssistantCoreDeliveryState(
  existingAssistantCore,
  { controlMode: CONVERSATION_ASSISTANT_CONTROL_MODE.ASSISTANT }
)
That matters because web chat should enter the same top-level assistant orchestration path as the messaging channels.
If the first message from the web UI skips that and goes straight to the bound runtime, you get behavioral drift:
- web chat behaves one way
- Telegram behaves another way
- Feishu behaves another way
Then every bug becomes impossible to reason about because the surfaces are no longer comparable.
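A small sketch of why the seed value matters, with loud caveats. The identifiers `buildAssistantCoreDeliveryState` and `CONVERSATION_ASSISTANT_CONTROL_MODE.ASSISTANT` appear in the fix, but their implementations here are guesses: the constant's string value, the merge behavior, and the `routeInboundMessage` router are all hypothetical.

```javascript
// Hypothetical value; the real constant lives in the project.
const CONVERSATION_ASSISTANT_CONTROL_MODE = Object.freeze({
  ASSISTANT: "assistant",
});

// Hypothetical merge: keep any existing state, then apply the overrides.
function buildAssistantCoreDeliveryState(existing = {}, overrides = {}) {
  return { ...existing, ...overrides };
}

// New chat-ui conversations are seeded with assistant control mode.
function createChatUiConversation(id, existingAssistantCore) {
  return {
    id,
    channel: "chat-ui",
    assistantCore: buildAssistantCoreDeliveryState(existingAssistantCore, {
      controlMode: CONVERSATION_ASSISTANT_CONTROL_MODE.ASSISTANT,
    }),
  };
}

// Hypothetical router: one rule for every channel, keyed on control mode.
function routeInboundMessage(conversation) {
  const mode = conversation.assistantCore?.controlMode;
  return mode === CONVERSATION_ASSISTANT_CONTROL_MODE.ASSISTANT
    ? "assistant-orchestration"
    : "bound-runtime";
}

const seeded = createChatUiConversation("web-1");
const unseeded = { id: "web-2", channel: "chat-ui" }; // no assistantCore at all

console.log(routeInboundMessage(seeded)); // assistant-orchestration
console.log(routeInboundMessage(unseeded)); // bound-runtime
```

Without the seed, the first web message takes a different path from the same message arriving over a messaging channel, which is the drift described above.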
The test I actually wanted
I have learned to distrust fixes like this unless there is a test that proves the behavioral contract.
The right question was not "does chat-ui store still work?"
The right question was:
Does a chat-ui conversation participate in assistant behavior like a real channel?
That is why the surrounding tests focus on assistant-mode behavior and persisted conversation semantics instead of only checking helper methods in isolation.
The implementation detail was a store instance bug.
The product bug was channel inconsistency.
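A contract-style check along those lines might look roughly like this. It is a sketch, not the project's test suite: `createSharedStore`, `deliverUpdate`, and `expectEqual` are hypothetical, and the tiny assertion helper only exists so the example runs without a test framework.

```javascript
// Minimal assertion helper so the sketch runs without a test framework.
function expectEqual(actual, expected, label) {
  if (actual !== expected) {
    throw new Error(`${label}: expected ${expected}, got ${actual}`);
  }
}

// Hypothetical shared store with a chat-ui helper already installed.
function createSharedStore() {
  const conversations = [];
  return {
    get: (id) => conversations.find((c) => c.id === id) ?? null,
    createChatUiConversation(id) {
      const conversation = {
        id,
        channel: "chat-ui",
        assistantCore: { controlMode: "assistant" },
      };
      conversations.push(conversation);
      return conversation;
    },
  };
}

// Hypothetical scheduler-side delivery that resolves the conversation itself.
function deliverUpdate(store, conversationId) {
  return store.get(conversationId)
    ? { ok: true }
    : { ok: false, error: "conversation_not_found" };
}

// The contract: a chat-ui conversation behaves like any other channel's.
function testChatUiConversationIsARealChannel() {
  const store = createSharedStore();
  const created = store.createChatUiConversation("web-1");

  expectEqual(deliverUpdate(store, created.id).ok, true, "scheduler delivery");
  expectEqual(created.assistantCore.controlMode, "assistant", "control mode");
  return "passed";
}

console.log(testChatUiConversationIsARealChannel()); // passed
```

The assertions go through the delivery path and the control mode, not through the helper methods, so the test fails if chat-ui ever gets its own store again.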
What I learned from this
When you build multi-channel agent systems, the browser UI is seductive.
It feels local. It feels simple. It feels close enough to the app that you can justify giving it custom flow control, custom state, or custom routing.
That instinct is expensive.
If the browser chat can start tasks, receive async updates, carry conversation identity, and interact with the same supervisor as your mobile or messaging surfaces, then it is not "just a page."
It is a channel.
And if you do not model it that way, the architecture will eventually make you pay for the lie.
Are you treating your web chat as a first-class channel, or as a special case that has not failed loudly yet?