Tuesday afternoon, every autonomous cycle in my agent started returning the same error:
[AGENT] Cycle failed: 404 No endpoints found for model: google/gemma-2-9b-it:free
The model hadn't changed in my config. The provider hadn't gone down. The endpoint just... wasn't there anymore. OpenRouter had retired the :free SKU mid-week — no notification, no deprecation window, just gone. Every background classification, every briefing generation, every proactive scan started failing in the same way.
I had a fallback. That was the embarrassing part.
The fallback that didn't fall back
My createCompletion() wrapper had been catching the documented provider failure modes for months:
- 402 insufficient_credits → walk to next provider
- 403 daily_quota_exceeded → walk to next provider
- 429 rate_limited → backoff + retry
What it didn't catch: "the model you asked for doesn't exist anymore." A 404 No endpoints found propagated as a generic error and killed the cycle. The fallback chain never even got consulted because nothing in the existing branches matched.
The mental model was wrong. I'd been treating the model catalog as fixed configuration — something you set once and forget. In reality it's upstream state that can mutate at any moment, just like any other dependency. The retirement was a feature of the provider's catalog management, not a bug.
The fix: walk the free-model chain on retirement signals
The actual patch was short. Two PRs:
```ts
// Before: only walked on credit/quota/rate failures
if (isCreditError(err) || isKeyLimitError(err)) {
  return walkFallbackChain(...);
}

// After: also walk when the model itself is gone
if (isModelUnavailableError(err)) {
  markModelUnavailable(model);
  return walkFallbackChain(...);
}
```
isModelUnavailableError matches on:
- HTTP 404 with No endpoints found in body
- HTTP 400 with model_not_found code
- Anything else the provider emits when the SKU is gone
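A minimal sketch of what such a matcher could look like, covering only the first two signals listed above. The ProviderError shape is hypothetical (the post doesn't show the wrapper's real error type), so treat the field names as assumptions:

```ts
// Hypothetical error shape; the real wrapper's error type is not shown in the post.
interface ProviderError {
  status: number;   // HTTP status code returned by the provider
  code?: string;    // machine-readable error code, when the provider sends one
  message: string;  // raw error text or response body
}

function isModelUnavailableError(err: ProviderError): boolean {
  // 404 whose body says no endpoint serves this model anymore
  if (err.status === 404 && err.message.includes("No endpoints found")) {
    return true;
  }
  // 400 carrying an explicit model_not_found code
  if (err.status === 400 && err.code === "model_not_found") {
    return true;
  }
  // Other provider-specific "SKU is gone" signals would be added here.
  return false;
}
```

Matching on the body text as well as the status keeps an unrelated 404 (a mistyped route, say) from being mistaken for a retired model.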
markModelUnavailable puts the model on a 24h cooldown so the next cycle doesn't try it again immediately. Once the cooldown expires, the model becomes eligible again and we retry it. If the provider has restored the SKU in the meantime (providers add new SKUs all the time too), it simply goes back into rotation.
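The cooldown can be sketched as an in-memory map from model ID to expiry time. This is an illustration under assumptions, not the actual implementation: the isOnCooldown helper, the Map storage, and the injectable `now` parameter are hypothetical, and a long-running agent might persist this state instead.

```ts
const COOLDOWN_MS = 24 * 60 * 60 * 1000; // 24h window, as described in the post

// model ID -> timestamp (ms since epoch) at which the cooldown ends
const cooldownUntil = new Map<string, number>();

// `now` is injectable so the logic can be exercised without waiting 24 hours.
function markModelUnavailable(model: string, now: number = Date.now()): void {
  cooldownUntil.set(model, now + COOLDOWN_MS);
}

// Hypothetical helper: the chain walker would skip models for which this is true.
function isOnCooldown(model: string, now: number = Date.now()): boolean {
  const until = cooldownUntil.get(model);
  if (until === undefined) return false;
  if (now >= until) {
    cooldownUntil.delete(model); // window elapsed: model is eligible again
    return false;
  }
  return true;
}
```

Because expiry is checked lazily on read, there is no timer to manage; a retired model simply becomes a candidate again the first time it is consulted after the window ends.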
The fallback chain itself is per-provider:
```ts
const OPENROUTER_FALLBACK_CHAIN = [
  'meta-llama/llama-3.3-70b-instruct:free',
  'google/gemma-2-9b-it:free',
  'mistralai/mistral-7b-instruct:free',
  'qwen/qwen-2.5-7b-instruct:free',
];
```
When one entry 404s, we walk to the next. When all of them fail, we fail over to the secondary provider (Gemini direct), which has its own chain. Only when every chain across every provider has been exhausted does the agent give up and surface AllProvidersExhaustedError to the user.
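The walk described above can be sketched roughly as follows. The ProviderChain type, the Attempt callback, and the walkAllChains name are hypothetical stand-ins (the post doesn't show the real signatures), and for brevity this version advances on any error, whereas the real wrapper only walks on the failures it recognizes:

```ts
// Hypothetical types for illustration only.
interface ProviderChain {
  provider: string;
  models: string[];
}

type Attempt = (provider: string, model: string) => Promise<string>;

class AllProvidersExhaustedError extends Error {
  readonly attempts: string[];
  constructor(attempts: string[]) {
    super(`All providers exhausted after trying: ${attempts.join(", ")}`);
    this.name = "AllProvidersExhaustedError";
    this.attempts = attempts;
  }
}

async function walkAllChains(chains: ProviderChain[], attempt: Attempt): Promise<string> {
  const attempts: string[] = [];
  for (const { provider, models } of chains) {
    for (const model of models) {
      attempts.push(`${provider} ${model}`);
      try {
        return await attempt(provider, model); // first success wins
      } catch {
        // Simplification: any failure moves on to the next model, and once a
        // provider's chain is spent, to the next provider's chain.
      }
    }
  }
  // Every model on every provider failed.
  throw new AllProvidersExhaustedError(attempts);
}
```

Recording each attempt in the error makes the final failure easier to debug, since it shows exactly which models were tried and in what order.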
What I should have done from day 1
Three rules I'm internalizing:
1. The upstream catalog is mutable. Hardcoding a single model ID is the same antipattern as hardcoding a single CDN URL. Always have a list. Always make the list cheap to rotate.
2. Distinguish "this model is unavailable" from "the provider is unavailable." They're different failures with different recovery paths. Treating them the same way means you either over-rotate (give up the provider when only one model is gone) or under-rotate (give up entirely when the provider is fine).
3. Cooldowns, not blacklists. When a model disappears, don't kill it forever. Put it on a window. Providers add models back, or you might be hitting a transient 404. A 24h cooldown is much friendlier than a permanent deny-list that requires a code change to undo.
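Rule 2 can be made concrete by mapping each failure to a distinct recovery action. The sketch below uses only the status codes and messages mentioned earlier in this post; the Recovery type and classifyFailure function are hypothetical names, not part of the actual patch:

```ts
// Hypothetical recovery actions, one per failure class described in the post.
type Recovery = "next_model" | "next_provider" | "retry_with_backoff" | "fail";

function classifyFailure(status: number, body: string): Recovery {
  // The model is gone, but the provider itself is fine: try the next model.
  if (status === 404 && body.includes("No endpoints found")) return "next_model";
  if (status === 400 && body.includes("model_not_found")) return "next_model";

  // Credits or daily quota exhausted: no model on this provider will work.
  if (status === 402 || status === 403) return "next_provider";

  // Rate limited: the same request should succeed later.
  if (status === 429) return "retry_with_backoff";

  // Anything unrecognized is surfaced rather than silently retried.
  return "fail";
}
```

Keeping the decision in one function makes the over-rotate and under-rotate cases visible: a retired model never costs you the whole provider, and an exhausted provider never burns through every model in its chain.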
Why this matters beyond one provider
If you're running an agent in production, your model isn't your only upstream dependency:
- Vendor's catalog can change
- Pricing can change (:free → :paid is a real failure mode)
- Rate-limit policies can change
- Authentication schemes can change (Google's AQ.-prefix keys rejected by their own OpenAI-compat endpoint is a fun one; I had to write a native adapter for it)
The pattern is the same: treat every assumption about the upstream as a potential dynamic value, and make the recovery path the default, not the exception.
Agents that survive in prod have failover chains, cooldown windows, and degraded modes built in from the start. Not because the upstream is unreliable — because the upstream is alive, and alive things change.
I've been writing about Klorn, an open-source attention firewall for Gmail, where this kind of failure mode hits constantly because the agent runs continuously. Repo: github.com/k08200/klorn · Doctrine: deterministic-floor.md.
If you've shipped agents to prod, what other upstream-mutation failure modes have caught you off-guard?
Top comments (1)
Great write-up. The model-unavailable vs provider-unavailable distinction is spot-on — most implementations conflate them. One thing I'd add: model IDs don't just disappear, they also get silently renamed or migrated. Having a model alias/mapping layer between your agent and the upstream can catch both retirement AND rename events before they hit your fallback chain.