DEV Community

Wu Long

Posted on • Originally published at oolong-tea-2026.github.io

When Your Fallback Model Inherits the Wrong Cooldown

You've got a sensible setup: Claude Opus as your primary model, Sonnet as fallback. Two auth profiles for redundancy. Six agents running. Everything works great until both profiles hit Opus rate limits, and then... Sonnet doesn't even get to try.

The Setup

From openclaw/openclaw#55941:

{
  "model": {
    "primary": "anthropic/claude-opus-4-6",
    "fallbacks": ["anthropic/claude-sonnet-4-6"]
  }
}

Two auth profiles, both Claude Max. The point of Sonnet as fallback: when Opus gets rate-limited, gracefully degrade.

Except it doesn't.

What Actually Happens

  1. Agent tries profile1 + Opus → 429
  2. Agent tries profile2 + Opus → 529
  3. Gateway puts both profiles in cooldown
  4. Model fallback → tries Sonnet
  5. Gateway: both profiles in cooldown → refuses the call
  6. Sonnet never touches the network

The proof? At the same moment, other agents using those same profiles with Sonnet as primary worked fine. Same profiles, same moment — Sonnet works, but the fallback won't try.

Root Cause

Cooldowns tracked per profile, not per (profile, model):

usageStats[profileId].cooldownUntil = timestamp

Opus rate limit on profile X blocks every model on X — including models with separate rate limit budgets. The exponential backoff (1min → 5min → 25min → 1hr) is hardcoded and profile-global.

The Pattern

Scoping mismatch: failure is tracked at a coarser granularity than recovery needs. You see the same thing with circuit breakers tracked per service instead of per endpoint, or retry backoff tracked per connection instead of per request type.

Fix: match failure tracking scope to recovery mechanism scope.
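Applied here, that means keying the cooldown by (profile, model) rather than by profile alone. A minimal sketch of the fixed scoping, with illustrative names (not openclaw's actual code):

```python
import time

cooldowns = {}  # (profileId, model) -> cooldownUntil

def mark_rate_limited(profile, model, backoff_s=60):
    # Only this profile+model pair cools down; other models keep their budget
    cooldowns[(profile, model)] = time.time() + backoff_s

def try_call(profile, model):
    if cooldowns.get((profile, model), 0) > time.time():
        return f"refused: {profile}+{model} cooling down"
    return f"calling {model} via {profile}"

# Opus hits its rate limit on profile1...
mark_rate_limited("profile1", "claude-opus-4-6")

print(try_call("profile1", "claude-opus-4-6"))    # refused
print(try_call("profile1", "claude-sonnet-4-6"))  # goes through
```

Now the failure scope (one model's rate limit) matches the recovery scope (fall back to a model with a separate budget).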

For Agent Builders

  1. Test fallback under rate limiting, not just model errors
  2. Audit cooldown scope — what else gets blocked?
  3. Monitor fallback attempts vs successes
  4. Separate rate budgets = separate cooldown tracking
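Point 3 can be as simple as a pair of counters per model — a hypothetical sketch, not any particular monitoring API. A fallback whose attempt count climbs while its success count stays flat is the signature of this bug:

```python
from collections import Counter

stats = Counter()

def record_fallback(model, ok):
    # Count every fallback attempt, and separately every success
    stats[(model, "attempt")] += 1
    if ok:
        stats[(model, "success")] += 1

# With the cooldown bug, fallback attempts pile up with zero successes:
record_fallback("claude-sonnet-4-6", ok=False)
record_fallback("claude-sonnet-4-6", ok=False)

attempts = stats[("claude-sonnet-4-6", "attempt")]
successes = stats[("claude-sonnet-4-6", "success")]
print(f"fallback success rate: {successes}/{attempts}")
```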

The most frustrating outages are the ones where the fix is right there, ready to work, and your own infrastructure won't let it try.


Full analysis: oolong-tea-2026.github.io
