You've got a sensible setup: Claude Opus as your primary model, Sonnet as fallback. Two auth profiles for redundancy. Six agents running. Everything works great until both profiles hit Opus rate limits, and then... Sonnet doesn't even get to try.
The Setup
From openclaw/openclaw#55941:
{
  "model": {
    "primary": "anthropic/claude-opus-4-6",
    "fallbacks": ["anthropic/claude-sonnet-4-6"]
  }
}
Two auth profiles, both Claude Max. The point of the Sonnet fallback: when Opus is rate-limited, degrade gracefully.
Except it doesn't.
What Actually Happens
- Agent tries profile1 + Opus → 429 (rate limited)
- Agent tries profile2 + Opus → 529 (overloaded)
- Gateway puts both profiles in cooldown
- Model fallback → tries Sonnet
- Gateway: both profiles in cooldown → refuses the call
- Sonnet never touches the network
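The steps above can be sketched as code. This is a hypothetical reconstruction, not the actual OpenClaw gateway source; `pickRoute`, `Profile`, and the field names are invented for illustration:

```typescript
// Hypothetical reconstruction of a profile-global cooldown check.
type Profile = { id: string; cooldownUntil: number };

function pickRoute(
  profiles: Profile[],
  models: string[],
  now: number,
): { profile: string; model: string } | null {
  for (const model of models) {           // model fallback order
    for (const p of profiles) {           // auth profile order
      // BUG: the cooldown is keyed by profile only, so a cooldown
      // earned by Opus 429s also blocks Sonnet on the same profile.
      if (p.cooldownUntil > now) continue;
      return { profile: p.id, model };
    }
  }
  return null; // every (profile, model) pair was skipped
}

// Both profiles cooled down by Opus rate limits:
const profiles = [
  { id: "profile1", cooldownUntil: 9999 },
  { id: "profile2", cooldownUntil: 9999 },
];
const route = pickRoute(
  profiles,
  ["claude-opus-4-6", "claude-sonnet-4-6"],
  0,
);
// route === null: Sonnet is never attempted, even though its
// rate-limit budget is untouched.
```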
The proof: at the same moment, other agents using those same profiles with Sonnet as primary worked fine. Same profiles, same instant. Sonnet works when it's primary, but the fallback path never tries it.
Root Cause
Cooldowns tracked per profile, not per (profile, model):
usageStats[profileId].cooldownUntil = timestamp
An Opus rate limit on profile X blocks every model on X — including models with separate rate-limit budgets. The exponential backoff (1min → 5min → 25min → 1hr) is hardcoded and profile-global.
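The quoted schedule is consistent with a 5x multiplier on a 60-second base, capped at one hour. Those constants are an inference from the 1min → 5min → 25min → 1hr progression, not taken from the source:

```typescript
// Assumed backoff: 60s base, 5x per consecutive failure, 3600s cap.
function cooldownSeconds(consecutiveFailures: number): number {
  return Math.min(60 * Math.pow(5, consecutiveFailures - 1), 3600);
}
// cooldownSeconds(1) → 60, (2) → 300, (3) → 1500, (4) → 3600 (capped)
```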
The Pattern
Scoping mismatch: failure tracked at a coarser granularity than recovery needs. You see the same thing with circuit breakers scoped per service instead of per endpoint, or retry backoff tracked per connection instead of per request type.
Fix: match failure tracking scope to recovery mechanism scope.
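Concretely, that means keying the cooldown on the (profile, model) pair. A minimal sketch, with hypothetical names; the real fix would live in the gateway's usage-stats tracking:

```typescript
// Hypothetical fix: key cooldowns by (profile, model), not by profile.
const cooldowns = new Map<string, number>(); // "profileId:model" → until

function setCooldown(profile: string, model: string, until: number): void {
  cooldowns.set(`${profile}:${model}`, until);
}

function isCooledDown(profile: string, model: string, now: number): boolean {
  return (cooldowns.get(`${profile}:${model}`) ?? 0) > now;
}

// An Opus 429 on profile1 no longer blocks Sonnet on profile1:
setCooldown("profile1", "claude-opus-4-6", 9999);
// isCooledDown("profile1", "claude-opus-4-6", 0)   → true
// isCooledDown("profile1", "claude-sonnet-4-6", 0) → false
```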
For Agent Builders
- Test fallback under rate limiting, not just model errors
- Audit cooldown scope — what else gets blocked?
- Monitor fallback attempts vs successes
- Separate rate budgets = separate cooldown tracking
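The first bullet is cheap to automate: stub the provider so the primary always returns 429 and assert that the fallback model actually receives a request. A hypothetical sketch (all names invented):

```typescript
// Hypothetical fallback test: the stubbed provider rate-limits Opus
// and records every model it is asked for.
type Call = { model: string };
const calls: Call[] = [];

function fakeProvider(model: string): { status: number } {
  calls.push({ model });
  return model.includes("opus") ? { status: 429 } : { status: 200 };
}

function routeWithFallback(models: string[]): string | null {
  for (const model of models) {
    if (fakeProvider(model).status === 200) return model;
  }
  return null;
}

const served = routeWithFallback(["claude-opus-4-6", "claude-sonnet-4-6"]);
// A healthy router serves Sonnet here; under a profile-global cooldown
// bug like the one described above, Sonnet would never appear in `calls`.
```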
The most frustrating outages: the fix is right there, ready to work, and your own infrastructure won't let it try.
Full analysis: oolong-tea-2026.github.io