While working on an open-source contribution to the Gemini CLI, I ran into an interesting edge case involving retryable 429 errors.
The Gemini API can return multiple error signals in a single response — for example:
RetryInfo (retry after X seconds)
QuotaFailure (quota-related metadata)
A regression caused the CLI to prioritise quota exhaustion even when RetryInfo was present, leading to immediate failures instead of retries.
The fix required:
Parsing RetryInfo correctly
Giving retry hints precedence over terminal quota errors
Implementing exponential backoff with jitter
Adding tests to cover mixed error scenarios
The PR was reviewed by Google maintainers and taken for internal validation, which is expected for changes affecting retry and quota behaviour.
If you’re building client tooling for AI APIs, this is a good reminder that error classification logic matters just as much as request logic.
Happy to discuss retry strategies or similar edge cases others have run into.

Top comments (0)