An unattended agent can turn one 429 into a retry storm.
It can turn one timeout into a duplicate write.
It can turn one fallback into unapproved provider spend.
The production boundary is not "does the client retry?" It is whether the route can prove when it must stop.
Fast answer
- Retries are not a transport detail once an agent can call tools in a loop. They are spend authority, provider pressure, and user-visible side effects repeated by software that may not know when to stop.
- A production MCP route needs a retry budget before launch: max attempts, wall-clock ceiling, provider quota owner, token/cost cap, idempotency rule, backoff shape, and the exact denial returned when the budget is exhausted.
- Rate-limit proof is not a happy-path 200. It is a forced 429/503, timeout, duplicate request, partial side effect, and exhausted-budget fixture through the same route card and receipt path.
- If the trace cannot explain why the agent stopped retrying, who owned the budget, whether the call was safe to replay, and which recovery action is allowed next, the route is not ready for unattended repeat use.
Operator rule
A clean stop is part of the product.
Agents do not get credit for eventually succeeding if the route spent the user's quota blindly first. The receipt has to show why the system retried, why it stopped, and what safe recovery remains.
The production checklist
1. Route-level retry budget
Set max attempts, max elapsed time, max queued delay, max tokens or dollars, and max provider calls per route. Do not inherit a global retry policy blindly across tools with different side effects.
2. Quota owner and lane
Name the budget owner: user, tenant, workspace, Rhumb-managed lane, customer key, provider account, or explicit test quota. The receipt should show which lane was charged or protected.
3. Idempotency and side-effect class
Separate read, search, estimate, create, update, send, delete, purchase, and external-message calls. Only replay when the route has an idempotency key or a verified no-side-effect class.
4. Backoff and jitter evidence
Record the Retry-After header, provider reset time, chosen delay, jitter range, queue position, and whether the model is allowed to ask for a manual recovery step instead of hammering the provider.
5. Duplicate and partial-result fixture
Force a timeout after provider acceptance, duplicate the same request id, and verify the second call resolves to the original receipt or a typed duplicate denial instead of repeating the side effect.
6. Exhaustion denial
When the budget is spent, return a typed denial with attempts, elapsed time, quota owner, protected provider, next retry window, and safe recovery path. Do not let the model improvise another route around the budget.
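As an illustration of items 1, 4, and 6, here is a minimal Python sketch of a retry loop that stops with a typed denial. It is a sketch under stated assumptions, not Rhumb's implementation or any MCP SDK API: the names RetryBudget, RetryDenial, choose_delay, and call_with_budget, the denial codes, and the recovery strings are all hypothetical. The provider call is assumed to return (status, retry_after_seconds, body), and each attempt is assumed to cost exactly one provider call.

```python
import random
import time
from dataclasses import dataclass
from typing import Any, Callable, Optional, Tuple

# Assumed shape of one provider call: (status, retry_after_seconds, body).
ProviderCall = Callable[[], Tuple[int, Optional[float], Any]]


@dataclass(frozen=True)
class RetryBudget:
    """Per-route limits (hypothetical field names)."""
    max_attempts: int
    max_elapsed_s: float
    quota_owner: str


@dataclass(frozen=True)
class RetryDenial:
    """Typed denial returned instead of another attempt (hypothetical)."""
    code: str
    attempts: int
    elapsed_s: float
    quota_owner: str
    next_retry_after_s: Optional[float]
    recovery: str


def choose_delay(attempt: int, retry_after_s: Optional[float],
                 base_s: float = 0.5, cap_s: float = 30.0,
                 rng: Callable[[], float] = random.random) -> float:
    """Exponential backoff with full jitter, never shorter than Retry-After."""
    ceiling = min(cap_s, base_s * 2 ** (attempt - 1))
    return max(rng() * ceiling, retry_after_s or 0.0)


def call_with_budget(call: ProviderCall, budget: RetryBudget, replay_safe: bool,
                     sleep: Callable[[float], None] = time.sleep,
                     clock: Callable[[], float] = time.monotonic,
                     rng: Callable[[], float] = random.random) -> Any:
    """Return the response body on success, or a RetryDenial when the route must stop."""
    start = clock()
    attempts = 0
    while True:
        attempts += 1
        status, retry_after_s, body = call()
        if 200 <= status < 300:
            return body
        elapsed = clock() - start
        # Only 429/503 on a replay-safe operation class may be retried at all.
        if status not in (429, 503) or not replay_safe:
            return RetryDenial("not_retryable", attempts, elapsed,
                               budget.quota_owner, None, "manual_review")
        delay = choose_delay(attempts, retry_after_s, rng=rng)
        # Stop before sleeping if the next attempt would exceed either ceiling.
        if attempts >= budget.max_attempts or elapsed + delay > budget.max_elapsed_s:
            return RetryDenial("retry_budget_exhausted", attempts, elapsed,
                               budget.quota_owner, delay, "wait_for_retry_window")
        sleep(delay)
```

The injectable sleep, clock, and rng parameters are a design choice for fixtures: a forced-429 test can run instantly and still assert the exact attempts and delays that would land on the receipt.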
Failure fixtures
Do not promote a route until the bad timing cases have receipts.
Provider 429
Expected: Respect Retry-After or reset metadata, stop at the route budget, and record the protected quota owner on the receipt.
Provider 503 / network timeout
Expected: Retry only idempotent or explicitly replay-safe classes; include backoff decision, elapsed ceiling, and final recovery hint.
Timeout after accepted write
Expected: Use idempotency key or status lookup before replay. A second side effect is a failed gate, even if the final response is 200.
Agent loop repeats same ask
Expected: Collapse duplicate intent into one receipt or deny after budget exhaustion; do not multiply provider calls because the planner rephrased the task.
Fallback provider route
Expected: Require a separate budget owner, data-use rule, credential lane, and receipt. Fallback is not a hidden retry path.
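The "timeout after accepted write" case can be turned into a small fixture. The Python sketch below assumes an in-memory idempotency store standing in for a durable one. ReplayGuard, Receipt, and timeout_after_accept_fixture are hypothetical names, and the "email send" is a stub that only counts side effects.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Receipt:
    receipt_id: str
    idempotency_key: str
    result: str


class ReplayGuard:
    """In-memory stand-in for a durable idempotency store (assumption)."""

    def __init__(self) -> None:
        self._by_key: Dict[str, Receipt] = {}

    def execute(self, idempotency_key: str, side_effect: Callable[[], str]) -> Receipt:
        # A repeated key resolves to the original receipt; the side effect does not rerun.
        existing = self._by_key.get(idempotency_key)
        if existing is not None:
            return existing
        result = side_effect()
        receipt = Receipt(f"rcpt-{len(self._by_key) + 1}", idempotency_key, result)
        self._by_key[idempotency_key] = receipt
        return receipt


def timeout_after_accept_fixture() -> Tuple[Receipt, Receipt, int]:
    """Simulate a write that was accepted, a lost response, and a replay."""
    sent: List[str] = []
    guard = ReplayGuard()

    def send_email() -> str:
        sent.append("welcome")  # the side effect we must not repeat
        return "accepted"

    first = guard.execute("req-123", send_email)   # accepted; response assumed lost
    second = guard.execute("req-123", send_email)  # agent replays the same request id
    return first, second, len(sent)
```

The gate passes only if both calls return the same receipt and the side-effect count stays at one. A real store would also need to handle the window where the side effect has started but no receipt has been written yet, which this sketch does not model.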
Trace evidence
The retry receipt should make the loop boring to audit.
Rate-limit and timeout handling only become operator-grade when every attempt is reconstructable. The evidence should identify the protected budget, the replay decision, the provider response, and the recovery path without depending on the model's explanation after the fact.
The receipt should include:
- route id and tool call id
- caller / tenant / workspace
- operation class and side-effect class
- quota owner and credential lane
- provider account / capability id
- attempt number and max attempts
- elapsed time and wall-clock ceiling
- token, dollar, and provider-call budget
- provider status, Retry-After, and reset metadata
- idempotency key or replay decision
- backoff delay and jitter range
- duplicate / partial-result check
- policy decision and denial code
- receipt id and allowed recovery action
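One way to keep that list enforceable is to check each attempt record for the fields before it is stored. The Python sketch below is an assumption-heavy illustration: the snake_case field names are a hypothetical mapping of the list above, and missing_receipt_fields is not part of any real library.

```python
from typing import Any, List, Mapping

# Hypothetical snake_case mapping of the receipt fields listed above.
REQUIRED_RECEIPT_FIELDS = (
    "route_id", "tool_call_id",
    "caller", "tenant", "workspace",
    "operation_class", "side_effect_class",
    "quota_owner", "credential_lane",
    "provider_account", "capability_id",
    "attempt", "max_attempts",
    "elapsed_s", "wall_clock_ceiling_s",
    "token_budget", "dollar_budget", "provider_call_budget",
    "provider_status", "retry_after_s", "reset_metadata",
    "idempotency_key", "replay_decision",
    "backoff_delay_s", "jitter_range_s",
    "duplicate_check",
    "policy_decision", "denial_code",
    "receipt_id", "recovery_action",
)


def missing_receipt_fields(record: Mapping[str, Any]) -> List[str]:
    """Return the required keys that are absent from an attempt record.

    A key may be present with a None value (for example denial_code on a
    successful attempt); only absent keys count as missing.
    """
    return [name for name in REQUIRED_RECEIPT_FIELDS if name not in record]
```

A promotion gate could refuse any route whose forced-failure fixtures produce a record with a non-empty result from this check.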
Copy-paste route card
Budget the route before the loop runs.
MCP route:
Caller / tenant:
Operation and side-effect class:
Quota owner:
Credential lane:
Max attempts:
Max elapsed time:
Token / dollar / provider-call cap:
Retry-after / backoff rule:
Idempotency key or replay guard:
Forbidden fallback routes:
Exhaustion denial code:
Receipt fields:
Common misreads
Retry systems usually fail in predictable ways:
- Treating a 429 as a temporary nuisance instead of a budget decision that protects a user, tenant, provider account, or managed lane.
- Using one global retry middleware for read-only search, email sends, calendar writes, purchases, and payment calls.
- Logging final success while losing the evidence that three provider calls, a fallback, and a timeout happened first.
- Letting the model route around a rate limit through a second provider without a separate budget and data-use decision.
- Calling a tool idempotent because the endpoint name says create_or_update while the provider does not accept a stable idempotency key.
- Counting token budget but not provider-call budget, even though the provider quota is the scarce production resource.
Related Rhumb guides
- Designing Agent Fleets That Survive Rate Limits
- MCP Observability, Logging, Auditing, and Debugging
- MCP Route Hardening Checklist
If you want the owned version with the route-hardening CTA, it is here: https://rhumb.dev/blog/mcp-retry-rate-limit-budget-checklist