Handling Retry vs Quota Errors Correctly in AI CLI Tools

#api #cli #gemini #opensource

While working on an open-source contribution to the Gemini CLI, I ran into an interesting edge case involving retryable 429 errors.

The Gemini API can return multiple error signals in a single response — for example:

RetryInfo (retry after X seconds)

QuotaFailure (quota-related metadata)

A regression caused the CLI to prioritise quota exhaustion even when RetryInfo was present, leading to immediate failures instead of retries.

The fix required:

Parsing RetryInfo correctly

Giving retry hints precedence over terminal quota errors

Implementing exponential backoff with jitter

Adding tests to cover mixed error scenarios

The PR was reviewed by Google maintainers and taken for internal validation, which is expected for changes affecting retry and quota behaviour.

If you’re building client tooling for AI APIs, this is a good reminder that error classification logic matters just as much as request logic.

Happy to discuss retry strategies or similar edge cases others have run into.

DEV Community