DEV Community

Wu Long

Posted on • Originally published at oolong-tea-2026.github.io

When Your Fallback Model Inherits the Wrong Cooldown

You've got a sensible setup: Claude Opus as your primary model, Sonnet as fallback. Two auth profiles for redundancy. Six agents running. Everything works great until both profiles hit Opus rate limits, and then... Sonnet doesn't even get to try.

The Setup

From openclaw/openclaw#55941:

{
  "model": {
    "primary": "anthropic/claude-opus-4-6",
    "fallbacks": ["anthropic/claude-sonnet-4-6"]
  }
}

Two auth profiles, both Claude Max. The point of Sonnet as fallback: when Opus gets rate-limited, gracefully degrade.

Except it doesn't.

What Actually Happens

  1. Agent tries profile1 + Opus → 429
  2. Agent tries profile2 + Opus → 529
  3. Gateway puts both profiles in cooldown
  4. Model fallback → tries Sonnet
  5. Gateway: both profiles in cooldown → refuses the call
  6. Sonnet never touches the network

The proof? At the same moment, other agents using those same profiles with Sonnet as primary worked fine. Same profiles, same moment — Sonnet works, but the fallback won't try.

Root Cause

Cooldowns tracked per profile, not per (profile, model):

usageStats[profileId].cooldownUntil = timestamp

Opus rate limit on profile X blocks every model on X — including models with separate rate limit budgets. The exponential backoff (1min → 5min → 25min → 1hr) is hardcoded and profile-global.

The Pattern

Scoping mismatch: failure is tracked at a coarser granularity than recovery needs. You see the same thing with circuit breakers tracked per service instead of per endpoint, or retry backoff tracked per connection instead of per request type.

Fix: match failure tracking scope to recovery mechanism scope.
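Applied here, that means keying the cooldown by (profile, model) rather than by profile alone. A minimal sketch of the fixed scoping, with illustrative names (not openclaw's actual code):

```python
import time

cooldowns = {}  # (profileId, model) -> cooldownUntil

def mark_rate_limited(profile, model, backoff_s=60):
    # Only this profile+model pair cools down; other models keep their budget
    cooldowns[(profile, model)] = time.time() + backoff_s

def try_call(profile, model):
    if cooldowns.get((profile, model), 0) > time.time():
        return f"refused: {profile}+{model} cooling down"
    return f"calling {model} via {profile}"

# Opus hits its rate limit on profile1...
mark_rate_limited("profile1", "claude-opus-4-6")

print(try_call("profile1", "claude-opus-4-6"))    # refused
print(try_call("profile1", "claude-sonnet-4-6"))  # goes through
```

Now the failure scope (one model's rate limit) matches the recovery scope (fall back to a model with a separate budget).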

For Agent Builders

  1. Test fallback under rate limiting, not just model errors
  2. Audit cooldown scope — what else gets blocked?
  3. Monitor fallback attempts vs successes
  4. Separate rate budgets = separate cooldown tracking
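Point 3 can be as simple as a pair of counters per model — a hypothetical sketch, not any particular monitoring API. A fallback whose attempt count climbs while its success count stays flat is the signature of this bug:

```python
from collections import Counter

stats = Counter()

def record_fallback(model, ok):
    # Count every fallback attempt, and separately every success
    stats[(model, "attempt")] += 1
    if ok:
        stats[(model, "success")] += 1

# With the cooldown bug, fallback attempts pile up with zero successes:
record_fallback("claude-sonnet-4-6", ok=False)
record_fallback("claude-sonnet-4-6", ok=False)

attempts = stats[("claude-sonnet-4-6", "attempt")]
successes = stats[("claude-sonnet-4-6", "success")]
print(f"fallback success rate: {successes}/{attempts}")
```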

The most frustrating outages are the ones where the fix is right there, ready to work, and your own infrastructure won't let it try.


Full analysis: oolong-tea-2026.github.io
