I know what you are thinking. I've just used Amazon Q CLI in the title 3 times. No regrets!
I recently ran into an annoying issue while using the Amazon Q Developer CLI. Every now and then, I'd get this frustrating error message over and over: "Amazon Q is having trouble responding right now." These errors became more frequent after the Kiro announcement. The CLI would give up immediately, forcing me to retry the command manually.
The error (technically a ModelOverloadedError) occurs when there's high traffic or resource constraints on AWS's end. A GitHub issue has already been filed for it.
Easy Solution
Just use /model to switch to Claude 3.7 or 3.5, which may not have capacity constraints. If you still need to use Claude 4, keep reading.
My Solution
I cloned the Amazon Q CLI repo and asked Q CLI to implement a retry mechanism with exponential backoff to automatically handle these temporary overloads. The resulting change:
- Attempts the request up to 10 times before giving up
- Uses exponential backoff starting at 500ms (doubling with each retry)
- Adds random jitter to prevent "thundering herd" problems
- Provides debug logs to track retry attempts
The core implementation looks like this:
    if is_model_unavailable {
        if attempt > MAX_RETRIES {
            debug!("Model Overloaded: Maximum retry attempts ({}) reached", MAX_RETRIES);
            return Err(ApiClientError::ModelOverloadedError {
                request_id: err
                    .as_service_error()
                    .and_then(|err| err.meta().request_id())
                    .map(|s| s.to_string()),
                status_code,
            });
        }
        // Calculate exponential backoff with jitter
        let backoff_ms = INITIAL_BACKOFF_MS * 2u64.pow(attempt - 1);
        let jitter = rand::random::<u64>() % (backoff_ms / 4); // Add up to 25% jitter
        let sleep_duration = Duration::from_millis(backoff_ms + jitter);
        debug!(
            "Model overloaded. Retrying attempt {}/{} after {}ms",
            attempt, MAX_RETRIES, sleep_duration.as_millis()
        );
        tokio::time::sleep(sleep_duration).await;
        continue;
    }
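To get a feel for how long this can end up waiting, here is a minimal standalone sketch of the delay calculation. It is not the code from the PR. It assumes INITIAL_BACKOFF_MS is 500 and MAX_RETRIES is 10 (the values described above), that attempt starts at 1, and that a sleep happens after each of the 10 failed attempts. The helper names base_backoff_ms, with_jitter and total_base_wait_ms are made up for illustration, and the random value is passed in as a parameter so the sketch needs no external crate.

```rust
// Assumed values, taken from the description in this post.
const INITIAL_BACKOFF_MS: u64 = 500;
const MAX_RETRIES: u32 = 10;

/// Base delay before jitter for a 1-based attempt number:
/// 500ms, 1000ms, 2000ms, ... doubling each time.
fn base_backoff_ms(attempt: u32) -> u64 {
    INITIAL_BACKOFF_MS * 2u64.pow(attempt - 1)
}

/// Adds jitter in the range [0, 25% of backoff_ms).
/// `random` stands in for `rand::random::<u64>()`.
fn with_jitter(backoff_ms: u64, random: u64) -> u64 {
    // .max(1) guards against a modulo-by-zero panic for very small delays.
    let jitter_range = (backoff_ms / 4).max(1);
    backoff_ms + random % jitter_range
}

/// Sum of the base delays across all retries, ignoring jitter.
fn total_base_wait_ms() -> u64 {
    (1..=MAX_RETRIES).map(base_backoff_ms).sum()
}
```

Under those assumptions the tenth delay alone is 256,000ms before jitter, and the base delays add up to 511,500ms, roughly eight and a half minutes in the worst case. If that is longer than you want to wait, capping the delay at a maximum value is a common variation.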
I reviewed each line of the modified code, ran all the automated tests and tested the new version myself. The retry mechanism works.
Conclusion
This fix improves the user experience by removing a recurring frustration: users no longer have to retry manually when the service is temporarily overloaded. I've submitted this as a PR to the Amazon Q Developer CLI repository, and I hope it gets merged soon. In the meantime, if you're experiencing this issue, you can clone my repo and build Q CLI with the fix. This is not a definitive fix. AWS probably needs to work on model availability so the error stops occurring, but at least the retry mechanism makes the developer experience a lot better.