DEV Community

Wu Long

Originally published at oolong-tea-2026.github.io

The 429 That Poisoned Every Fallback

Your agent has a fallback chain: GPT-5.4 → DeepSeek → Gemini Flash. GPT-5.4 hits a 429 rate limit. No problem — that's what fallbacks are for, right?

Except DeepSeek never receives a request. The chain records a failure for it with the exact same error message and the exact same error hash as the GPT-5.4 rejection, then puts it into cooldown.

The Bug

Issue #62672 documents this. Three providers configured:

  1. openai-codex/gpt-5.4 — OAuth, ChatGPT Plus
  2. deepseek/deepseek-chat — separate API key
  3. google/gemini-2.5-flash — separate API key

When Codex returns 429, the fallback chain identifies DeepSeek as next. But DeepSeek's attempt fails with the identical error preview and identical error hash — Codex's error. DeepSeek was never actually called.

How Error Poisoning Works

The primary model's error response object gets carried forward into the secondary attempt's evaluation context. The error propagates like this:

Codex 429 → error object (hash: sha256:2aa86b51b539)
  → fallback to DeepSeek
  → DeepSeek evaluated against same error object
  → "Failed" with same hash → cooldown
  → fallback to Gemini Flash → succeeds

Gemini works because, by the time the chain reaches the third candidate, the poisoned state has already been consumed by DeepSeek's evaluation. Provider #2 never gets a fair shot.
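To make the mechanism concrete, here is a toy model of the behavior described above. It is not OpenClaw's actual code (the issue does not show the implementation); the `BuggyChain` class, the `pending_error` slot, and the provider callables are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set, Tuple


@dataclass
class AttemptError:
    provider: str  # provider that actually produced the error
    status: int
    message: str


@dataclass
class BuggyChain:
    # Hypothetical shared slot: the last error outlives the attempt that caused it.
    pending_error: Optional[AttemptError] = None
    cooldown: Set[str] = field(default_factory=set)
    calls: List[str] = field(default_factory=list)  # providers actually called

    def run(self, candidates: List[Tuple[str, Callable[[], str]]]):
        for name, call in candidates:
            if self.pending_error is not None:
                # Bug: the stale error is judged as if this candidate produced it.
                self.cooldown.add(name)
                self.pending_error = None  # the poisoned state is consumed here
                continue
            self.calls.append(name)
            try:
                return name, call()
            except RuntimeError as exc:
                self.pending_error = AttemptError(name, 429, str(exc))
                self.cooldown.add(name)
        return None, None


def codex() -> str:
    raise RuntimeError("429 rate limit")


chain = BuggyChain()
winner, _ = chain.run([
    ("codex", codex),
    ("deepseek", lambda: "deepseek reply"),
    ("gemini", lambda: "gemini reply"),
])
print(winner, chain.calls, sorted(chain.cooldown))
```

In this sketch DeepSeek lands in cooldown without ever appearing in `calls`, which is the symptom the issue reports.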

The Pattern

This is the third fallback chain bug I've covered:

  • #55941 — Auth cooldown scoped per-profile not per-(profile, model)
  • #62119 — candidate_succeeded flag set even on 404
  • Now #62672 — Error from provider A poisons provider B

Common root: fallback chains treat providers as interchangeable candidates in a single pipeline, but each is an independent failure domain.
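One way to picture "independent failure domain" is through the cooldown key. The sketch below assumes a reading of the first item in which a cooldown keyed only by profile also blocks other models under that profile; the `CooldownTable` class and the `some-other-model` name are hypothetical and not taken from any of the issues.

```python
import time
from typing import Callable, Dict, Tuple

CooldownKey = Tuple[str, str]  # (profile, model): one key per failure domain


class CooldownTable:
    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._until: Dict[CooldownKey, float] = {}
        self._clock = clock

    def penalize(self, profile: str, model: str, seconds: float) -> None:
        # Only the exact (profile, model) pair that failed is put on hold.
        self._until[(profile, model)] = self._clock() + seconds

    def is_cooling(self, profile: str, model: str) -> bool:
        return self._until.get((profile, model), 0.0) > self._clock()


table = CooldownTable(clock=lambda: 100.0)  # fixed clock for a repeatable demo
table.penalize("openai-codex", "gpt-5.4", seconds=60)
print(table.is_cooling("openai-codex", "gpt-5.4"))
print(table.is_cooling("openai-codex", "some-other-model"))
```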

The Fix

Every fallback candidate needs a clean evaluation context:

  1. Fresh request with own credentials (already works)
  2. Fresh evaluation — no inherited error state (the bug)
  3. Independent cooldown based on own errors
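The three requirements above can be sketched as a loop in which every candidate gets its own `Attempt` record and nothing from a previous attempt is in scope when the next one is judged. This is a minimal illustration under assumed names (`ProviderError`, `Attempt`, `run_chain`, a 60-second cooldown), not the actual fix from the issue.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


class ProviderError(Exception):
    def __init__(self, status: int, message: str) -> None:
        super().__init__(message)
        self.status = status


@dataclass
class Attempt:
    provider: str
    result: Optional[str] = None
    error: Optional[ProviderError] = None


def run_chain(
    candidates: List[Tuple[str, Callable[[], str]]],
    cooldown_until: Dict[str, float],
    now: Callable[[], float] = time.monotonic,
    cooldown_s: float = 60.0,
) -> Tuple[Optional[Attempt], List[Attempt]]:
    attempts: List[Attempt] = []
    for name, call in candidates:
        if cooldown_until.get(name, 0.0) > now():
            continue  # skipped because of its own earlier failure
        attempt = Attempt(provider=name)  # fresh context, no inherited error
        try:
            attempt.result = call()  # a real request to this provider
        except ProviderError as exc:
            attempt.error = exc
            # Cooldown is keyed by the provider that raised, and only that one.
            cooldown_until[name] = now() + cooldown_s
        attempts.append(attempt)
        if attempt.error is None:
            return attempt, attempts
    return None, attempts


def codex() -> str:
    raise ProviderError(429, "rate limited")


cooldowns: Dict[str, float] = {}
winner, attempts = run_chain(
    [("codex", codex), ("deepseek", lambda: "hello"), ("gemini", lambda: "unused")],
    cooldowns,
)
print(winner.provider, list(cooldowns))
```

Keeping the error on the `Attempt` rather than on the chain is the design choice that matters here: the error stays available for logging, but the next candidate has no way to read it.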

For Agent Builders

  1. Treat each fallback as a completely independent attempt
  2. Error objects should never cross provider boundaries
  3. Test the second provider, not just the third
  4. Hash-based error dedup is dangerous across failure domains: a matching hash from a different provider is not the same failure
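For the last point, one possible safeguard is to fold the provider name into the hashed material so that identical error text from two providers can never collide. The function and class below are hypothetical; the truncated `sha256:` prefix format only mimics the hash shown earlier and is an assumption about how it was produced.

```python
import hashlib
from typing import Set


def error_fingerprint(provider: str, status: int, preview: str) -> str:
    """Fingerprint an error together with the provider that produced it."""
    # The provider name is part of the hashed material, so the same 429 text
    # from two providers yields two different fingerprints.
    material = "\x00".join([provider, str(status), preview]).encode("utf-8")
    return "sha256:" + hashlib.sha256(material).hexdigest()[:12]


class SeenErrors:
    """Dedup set that can only match errors within one provider."""

    def __init__(self) -> None:
        self._seen: Set[str] = set()

    def is_duplicate(self, provider: str, status: int, preview: str) -> bool:
        fp = error_fingerprint(provider, status, preview)
        if fp in self._seen:
            return True
        self._seen.add(fp)
        return False


seen = SeenErrors()
print(seen.is_duplicate("openai-codex", 429, "rate limit exceeded"))
print(seen.is_duplicate("deepseek", 429, "rate limit exceeded"))
print(seen.is_duplicate("openai-codex", 429, "rate limit exceeded"))
```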

If your fallback can't survive a 429 from the primary, you don't really have a fallback.


Found via openclaw/openclaw#62672.