I do not mind a product assistant being wrong because the docs are unclear.
I do mind it being wrong because it silently used the wrong model source.
That was the real problem I hit in my local AI gateway project, CliGate.
The assistant inside the dashboard had a clear job:
- answer product-usage questions
- stay grounded in the manual
- avoid rewriting settings unless the user explicitly asks
But the runtime path behind that assistant was still too fuzzy. In practice, it could depend on whichever account or API key the broader system happened to resolve first.
That is fine for generic chat.
It is not fine for a product assistant that is supposed to be predictable.
The failure mode was subtle
I already had routing. I already had accounts, API keys, and model mapping. I already had a settings surface.
The annoying part was that the assistant itself still behaved too much like "just another consumer of the default pool."
That created a few bad outcomes:
- the assistant could drift across providers without the user realizing it
- clearing a binding could get undone by old migration behavior
- one flaky credential could make the whole assistant feel unreliable
- the UI could not answer a simple question like: what is the assistant actually bound to right now?
The bug was not one broken request.
The bug was that the assistant did not have a first-class routing identity.
I stopped thinking in terms of "credential" and switched to "model source"
This is the design change that made the rest of the work much easier.
I did not actually want to bind the assistant to a vague source type like "OpenAI keys" or "Claude account."
I wanted to bind it to a concrete model source:
{
"type": "api-key",
"id": "key_x",
"model": "gpt-5.4"
}
That is why the new config path in CliGate moved toward boundModelSource instead of treating everything as a loose boundCredential.
The internal runtime config now normalizes around that field:
// Prefer the new field; fall back to the legacy one for older stored configs.
boundModelSource: stored.boundModelSource || stored.boundCredential || null,
// Compatibility alias: kept in sync with boundModelSource for older callers.
boundCredential: stored.boundModelSource || stored.boundCredential || null,
fallbacks: Array.isArray(stored.fallbacks) ? stored.fallbacks : [],
The compatibility alias still exists, but the meaning changed. The assistant is no longer just "attached to a credential." It is attached to a specific source plus an optional model.
That sounds like a naming cleanup. It was actually a control cleanup.
I also needed a way to say "yes, the user configured this on purpose"
One of the uglier problems was legacy migration.
Older assistant settings had source toggles. Newer settings have explicit bindings. If the user cleared the binding, I did not want old migration logic to recreate it on the next restart just because a legacy flag still existed somewhere.
So I added a small but important flag:
"bindingConfigured": true
That flag means:
- the user has explicitly configured assistant binding state
- even if the current binding is null
- do not auto-migrate old sources back into place
This was one of those changes that looks boring in a diff and saves a lot of operator confusion later.
Without it, "clear binding" is not a real action. It is just a temporary suggestion.
The assistant needed an ordered chain, not one brittle primary
Once the assistant had a proper primary binding, the next obvious problem showed up:
what happens when that source is deleted, disabled, rate-limited, or just temporarily broken?
I did not want the answer to be:
"assistant is down."
So the assistant runtime now builds a real chain:
// Tier 0: the assistant's explicit primary binding (new field first, legacy alias second).
if (config.boundModelSource || config.boundCredential) {
  chain.push(config.boundModelSource || config.boundCredential);
}
// Later tiers: explicit, user-ordered fallbacks; malformed entries are skipped.
if (Array.isArray(config.fallbacks)) {
  for (const entry of config.fallbacks) {
    if (entry && typeof entry === 'object' && entry.type && entry.id) {
      chain.push(entry);
    }
  }
}
That is simple on purpose.
The first tier is the assistant's intended home. The later tiers are not magic discovery. They are explicit ordered fallbacks the user can inspect in the UI.
That matters because fallback behavior should be explainable.
If an assistant changes models under pressure, I want to know exactly why.
A circuit breaker made the assistant feel much less random
Fallback chains are not enough if you keep retrying a dead tier over and over.
So the assistant LLM client keeps breaker state per tier and skips sources that are currently in cooldown:
for (const descriptor of chain) {
  const tierKey = tierKeyFor(descriptor);
  // Skip tiers that are currently tripped and still inside their cooldown window.
  if (this._breaker.shouldSkip(tierKey)) continue;
  // Resolve the descriptor into a usable credential + model; unresolvable tiers are skipped.
  const candidate = await resolveCredential(descriptor, {
    defaultChatGptModel: this.defaultChatGptModel,
    defaultClaudeModel: this.defaultClaudeModel
  });
  if (!candidate) continue;
  candidates.push({ ...candidate, tierKey });
}
And when a call fails, the tier records failure instead of pretending the error was just bad luck:
const breakerState = this._breaker.recordFailure(source.tierKey);
logger.warn(`[Supervisor] tier failed | tier=${source.tierKey} | breaker=${breakerState}`);
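The breaker itself does not need to be clever. Here is a minimal sketch of the per-tier state I am describing, with a failure threshold and a cooldown window; this is the idea, not the exact CliGate implementation, and tierKeyFor is just a stable string identity for a descriptor:

// e.g. "api-key:key_x" - one breaker entry per chain tier
const tierKeyFor = (descriptor) => `${descriptor.type}:${descriptor.id}`;

class TierBreaker {
  constructor({ threshold = 3, cooldownMs = 60_000 } = {}) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
    this._state = new Map(); // tierKey -> { failures, openedAt }
  }

  shouldSkip(tierKey) {
    const entry = this._state.get(tierKey);
    if (!entry || entry.failures < this.threshold) return false;
    // Tripped and still cooling down: skip this tier for now.
    if (Date.now() - entry.openedAt < this.cooldownMs) return true;
    this._state.delete(tierKey); // cooldown over, give the tier another chance
    return false;
  }

  recordFailure(tierKey) {
    const entry = this._state.get(tierKey) || { failures: 0, openedAt: 0 };
    entry.failures += 1;
    if (entry.failures >= this.threshold) entry.openedAt = Date.now();
    this._state.set(tierKey, entry);
    return entry.failures >= this.threshold ? 'open' : 'closed';
  }

  recordSuccess(tierKey) {
    this._state.delete(tierKey); // one success fully resets the tier
  }
}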
That changed the experience more than I expected.
Before, the assistant could feel inconsistent in a way users interpret as "the prompt changed" or "the model got weird."
After this change, the behavior became much more operational:
- try the primary source
- skip tripped tiers
- fall through to explicit backups
- expose the health state in the dashboard
That is a better failure story for a product surface.
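The last item on that list only needs one extra piece of bookkeeping: remember which tier actually answered. A sketch of what I mean, with illustrative field names rather than the exact ones in the codebase:

// After a reply succeeds, clear the tier's breaker state and record it,
// so the dashboard can show which source served the last answer and
// whether the assistant had to fall back from its primary.
this._breaker.recordSuccess(source.tierKey);
this._lastUsedTier = source.tierKey;
if (source.tierKey !== this._primaryTierKey) {
  logger.info(`[Supervisor] answered from fallback | tier=${source.tierKey}`);
}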
The UI finally has something honest to show
This was another reason I wanted the routing chain to be explicit.
Once the backend exposes:
- the current primary
- ordered fallbacks
- resolved source
- breaker state
- last used tier
the settings page can stop being a dead form and start being an inspection tool.
The assistant page now has controls for:
- primary model source
- per-tier model selection
- up to three fallbacks
- breaker threshold and cooldown
- test-binding checks
- tier health and last-used status
That is exactly the kind of visibility I wanted when debugging "why did the assistant answer from this provider instead of that one?"
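To make that concrete, the kind of payload I want the settings page to be able to render looks roughly like this; the field names are illustrative, not a documented API:

{
  "boundModelSource": { "type": "api-key", "id": "key_x", "model": "gpt-5.4" },
  "fallbacks": [
    { "type": "api-key", "id": "key_backup" }
  ],
  "resolvedSource": "api-key:key_x",
  "breaker": { "api-key:key_backup": "open" },
  "lastUsedTier": "api-key:key_x"
}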
I did not want the assistant to silently test with live requests
There is a small route detail here that I like because it keeps the UI honest.
The binding test endpoint validates whether a descriptor resolves, but it does not fire an actual LLM request:
const result = await describeBinding({ type: body.type, id: body.id });
return res.json({ success: result.ok, ...result });
That means the user gets a fast answer to:
"is this binding even real?"
without turning the settings screen into an accidental prompt runner.
It is a small boundary, but product assistants need that kind of boundary.
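For what describeBinding has to do there, a minimal sketch is enough: resolve the descriptor against the stored sources and report the result, without ever building a prompt. The returned fields below are my assumption, not the exact shape in CliGate:

async function describeBinding({ type, id }) {
  // Reuse the same resolution path the chain uses, but stop there:
  // no prompt is built and no LLM request is sent.
  const candidate = await resolveCredential({ type, id }, {});
  if (!candidate) {
    return { ok: false, reason: 'model source not found or disabled' };
  }
  return { ok: true, provider: candidate.provider, model: candidate.model };
}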
The part I trust most is the migration and route coverage
I can write all the assistant architecture docs I want, but the thing that makes me trust this change is the route-level test coverage.
For example, there are tests that pin the new primary field:
assert.deepEqual(res._body.assistantAgent.boundModelSource, {
type: 'api-key',
id: 'key-primary',
model: 'gpt-5.4'
});
And tests that make sure clearing bindings is respected:
assert.equal(res._body.assistantAgent.boundModelSource, null);
assert.equal(res._body.assistantAgent.boundCredential, null);
Those are the kinds of tests that prevent a future "helpful migration" from quietly breaking the operator's intent again.
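If you wanted to pin the same intent below the route layer, against the migration guard sketched earlier, the unit-level version could look roughly like this (again with hypothetical helper names and module path):

const test = require('node:test');
const assert = require('node:assert');
// migrateLegacyBinding is the hypothetical guard sketched earlier in this post.
const { migrateLegacyBinding } = require('./assistant-config');

test('a cleared binding is not resurrected by legacy migration', () => {
  const stored = {
    bindingConfigured: true,     // the user explicitly cleared the binding
    boundModelSource: null,
    legacySourceToggles: { openaiKeys: true } // stale flag left over from an older version
  };
  const migrated = migrateLegacyBinding(stored);
  assert.equal(migrated.boundModelSource, null);
});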
What changed in how I think about product assistants
I used to think the important part was the prompt and the docs grounding.
Those matter.
But once the assistant becomes part of the product, routing discipline matters just as much.
If the assistant is meant to be:
- predictable
- inspectable
- recoverable
- configurable without guesswork
then it cannot just borrow whatever account or API key happened to win a broader routing race.
It needs its own routing chain.
The pattern I would reuse
If you are adding a product assistant to an existing app with multiple model sources, I think this is the safer progression:
- give the assistant its own explicit primary binding
- bind to a concrete source plus model, not just a source type
- mark explicit user configuration so legacy migration cannot override it
- add ordered fallbacks
- add breaker state so failures do not loop forever
- expose the whole chain in the UI
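Put together, the assistant config that progression converges on looks roughly like this; the breaker field names are my guess at a shape, not CliGate's exact schema:

{
  "assistantAgent": {
    "bindingConfigured": true,
    "boundModelSource": { "type": "api-key", "id": "key_primary", "model": "gpt-5.4" },
    "fallbacks": [
      { "type": "api-key", "id": "key_backup" }
    ],
    "breaker": { "threshold": 3, "cooldownMs": 60000 }
  }
}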
That is a lot less glamorous than "ship an assistant."
But it is the difference between a demo assistant and one that operators can actually live with.
If you want to inspect the implementation, the project is here:
I am curious how other people are handling this. Does your product assistant have its own routing identity, or is it still borrowing the same model path as ordinary chat?