Rhumb

Posted on May 17 • Originally published at rhumb.dev

MCP Retry and Rate-Limit Budget Checklist


An unattended agent can turn one 429 into a retry storm.

It can turn one timeout into a duplicate write.

It can turn one fallback into unapproved provider spend.

The production boundary is not "does the client retry?" It is whether the route can prove when it must stop.

Fast answer

Retries are not a transport detail once an agent can call tools in a loop. They are spend authority, provider pressure, and user-visible side effects repeated by software that may not know when to stop.
A production MCP route needs a retry budget before launch: max attempts, wall-clock ceiling, provider quota owner, token/cost cap, idempotency rule, backoff shape, and the exact denial returned when the budget is exhausted.
Rate-limit proof is not a happy-path 200. It is a forced 429/503, timeout, duplicate request, partial side effect, and exhausted-budget fixture through the same route card and receipt path.
If the trace cannot explain why the agent stopped retrying, who owned the budget, whether the call was safe to replay, and which recovery action is allowed next, the route is not ready for unattended repeat use.

Operator rule

A clean stop is part of the product.

Agents do not get credit for eventually succeeding if the route spent the user's quota blindly first. The receipt has to show why the system retried, why it stopped, and what safe recovery remains.

The production checklist

1. Route-level retry budget

Set max attempts, max elapsed time, max queued delay, max tokens or dollars, and max provider calls per route. Do not inherit a global retry policy blindly across tools with different side effects.
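
As a minimal sketch of what a per-route budget could look like, the snippet below keeps one set of ceilings per route and reports which limit was hit first. The class names, field names, route names, and numbers are all illustrative assumptions, not part of any MCP SDK or Rhumb API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RetryBudget:
    """Per-route ceilings. Every field name here is hypothetical."""
    max_attempts: int
    max_elapsed_s: float
    max_queued_delay_s: float
    max_cost_usd: float
    max_provider_calls: int


@dataclass
class RouteUsage:
    """Running totals for one tool call on one route."""
    attempts: int = 0
    elapsed_s: float = 0.0
    queued_delay_s: float = 0.0
    cost_usd: float = 0.0
    provider_calls: int = 0


def exhausted_limit(budget: RetryBudget, usage: RouteUsage) -> Optional[str]:
    """Return the name of the first exhausted limit, or None if another attempt is allowed."""
    checks = [
        ("max_attempts", usage.attempts >= budget.max_attempts),
        ("max_elapsed_s", usage.elapsed_s >= budget.max_elapsed_s),
        ("max_queued_delay_s", usage.queued_delay_s >= budget.max_queued_delay_s),
        ("max_cost_usd", usage.cost_usd >= budget.max_cost_usd),
        ("max_provider_calls", usage.provider_calls >= budget.max_provider_calls),
    ]
    for name, hit in checks:
        if hit:
            return name
    return None


# One budget per route rather than a single inherited global policy.
# Route names and numbers are made up for illustration.
BUDGETS = {
    "search.read": RetryBudget(4, 30.0, 20.0, 0.05, 4),
    "email.send": RetryBudget(1, 10.0, 5.0, 0.01, 1),
}
```

With these example numbers a send route gets exactly one attempt, while a read-only search route may retry a few times inside its wall-clock ceiling.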

2. Quota owner and lane

Name the budget owner: user, tenant, workspace, Rhumb-managed lane, customer key, provider account, or explicit test quota. The receipt should show which lane was charged or protected.

3. Idempotency and side-effect class

Separate read, search, estimate, create, update, send, delete, purchase, and external-message calls. Only replay when the route has an idempotency key or a verified no-side-effect class.
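
The replay rule can be sketched as a small decision function. The enum members mirror the classes listed above; the decision strings and the choice of which classes count as verified no-side-effect are assumptions a real route would have to confirm per provider.

```python
from enum import Enum
from typing import Optional


class SideEffect(Enum):
    READ = "read"
    SEARCH = "search"
    ESTIMATE = "estimate"
    CREATE = "create"
    UPDATE = "update"
    SEND = "send"
    DELETE = "delete"
    PURCHASE = "purchase"
    EXTERNAL_MESSAGE = "external_message"


# Assumption: these classes have been verified as side-effect free for this route.
NO_SIDE_EFFECT = {SideEffect.READ, SideEffect.SEARCH, SideEffect.ESTIMATE}


def replay_decision(op: SideEffect, idempotency_key: Optional[str]) -> str:
    """Return a decision string suitable for recording in the receipt."""
    if op in NO_SIDE_EFFECT:
        return "replay_allowed:no_side_effect"
    if idempotency_key:
        # Only meaningful if the provider actually honors the key.
        return "replay_allowed:idempotency_key"
    return "replay_denied:unsafe_side_effect"
```

Returning a string rather than a boolean keeps the reason for the decision available to the receipt.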

4. Backoff and jitter evidence

Record the Retry-After header, provider reset time, chosen delay, jitter range, queue position, and whether the model is allowed to ask for a manual recovery step instead of hammering the provider.
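
One way to produce that evidence is to compute the delay and the record together. The sketch below uses capped exponential backoff with full jitter and treats the provider's Retry-After as a floor. It assumes Retry-After has already been parsed into seconds (the header can also carry an HTTP date), and the function name, defaults, and record keys are illustrative.

```python
import random
from typing import Optional


def choose_delay(
    attempt: int,
    retry_after_s: Optional[float],
    base_s: float = 0.5,
    cap_s: float = 30.0,
    rng: Optional[random.Random] = None,
) -> dict:
    """Pick the wait before retry number `attempt` (0-based) and return receipt evidence."""
    rng = rng or random.Random()
    ceiling = min(cap_s, base_s * (2 ** attempt))  # exponential growth, capped
    jittered = rng.uniform(0.0, ceiling)           # full jitter over [0, ceiling]
    floor = retry_after_s if retry_after_s is not None else 0.0
    chosen = max(jittered, floor)                  # never retry earlier than the provider asked
    return {
        "attempt": attempt,
        "retry_after_s": retry_after_s,
        "jitter_range_s": (0.0, ceiling),
        "chosen_delay_s": chosen,
    }
```

A caller would compare `chosen_delay_s` against the route's maximum queued delay and deny, or hand off to a manual recovery step, when the wait does not fit the budget.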

5. Duplicate and partial-result fixture

Force a timeout after provider acceptance, duplicate the same request id, and verify the second call resolves to the original receipt or a typed duplicate denial instead of repeating the side effect.
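
A fixture along these lines can be sketched with a fake provider that accepts the write and then loses the response. Everything here is a hypothetical test double: the `status` lookup stands in for whatever the real provider offers, and the receipt store is a plain dict.

```python
class FakeProvider:
    """Test double: accepts the first write, then the response is lost to a timeout."""

    def __init__(self):
        self.side_effects = 0
        self.accepted = {}
        self.drop_next_response = True

    def create(self, request_id, payload):
        self.side_effects += 1
        self.accepted[request_id] = payload
        if self.drop_next_response:
            self.drop_next_response = False
            raise TimeoutError("response lost after provider acceptance")
        return "created"

    def status(self, request_id):
        return request_id in self.accepted


def call_route(provider, receipts, request_id, payload):
    """Resolve a repeated request id to the original receipt instead of replaying."""
    if request_id in receipts:
        return {**receipts[request_id], "duplicate": True}
    try:
        provider.create(request_id, payload)
    except TimeoutError:
        # Status lookup before any replay: did the write land?
        if not provider.status(request_id):
            return {"request_id": request_id, "outcome": "unknown", "duplicate": False}
    receipt = {
        "receipt_id": f"rcpt-{request_id}",
        "request_id": request_id,
        "outcome": "accepted",
    }
    receipts[request_id] = receipt
    return {**receipt, "duplicate": False}
```

The gate is the side-effect counter: sending the same request id twice should leave `provider.side_effects` at 1 and return the same receipt id both times.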

6. Exhaustion denial

When the budget is spent, return a typed denial with attempts, elapsed time, quota owner, protected provider, next retry window, and safe recovery path. Do not let the model improvise another route around the budget.
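
A typed denial might look like the frozen dataclass below. The denial code string, field names, and default recovery text are assumptions chosen for illustration; the point is that the fields listed above travel as structured data rather than free text the model can reinterpret.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class BudgetExhaustedDenial:
    """Structured denial returned when the route budget is spent (hypothetical shape)."""
    code: str
    route_id: str
    attempts: int
    max_attempts: int
    elapsed_s: float
    quota_owner: str
    protected_provider: str
    next_retry_after_s: float
    safe_recovery: str
    fallback_allowed: bool = False  # the model may not route around the budget


def deny_exhausted(route_id, attempts, max_attempts, elapsed_s,
                   quota_owner, protected_provider, next_retry_after_s,
                   safe_recovery="ask_operator_or_wait_for_retry_window"):
    return BudgetExhaustedDenial(
        code="RETRY_BUDGET_EXHAUSTED",
        route_id=route_id,
        attempts=attempts,
        max_attempts=max_attempts,
        elapsed_s=elapsed_s,
        quota_owner=quota_owner,
        protected_provider=protected_provider,
        next_retry_after_s=next_retry_after_s,
        safe_recovery=safe_recovery,
    )
```

Calling `asdict()` on the denial gives a plain dict that can go into the tool result and the receipt unchanged.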

Failure fixtures

Do not promote a route until the bad timing cases have receipts.

Provider 429

Expected: Respect Retry-After or reset metadata, stop at the route budget, and record the protected quota owner in the receipt.

Provider 503 / network timeout

Expected: Retry only idempotent or explicitly replay-safe classes; include backoff decision, elapsed ceiling, and final recovery hint.

Timeout after accepted write

Expected: Use idempotency key or status lookup before replay. A second side effect is a failed gate, even if the final response is 200.

Agent loop repeats same ask

Expected: Collapse duplicate intent into one receipt or deny after budget exhaustion; do not multiply provider calls because the planner rephrased the task.

Fallback provider route

Expected: Require a separate budget owner, data-use rule, credential lane, and receipt. Fallback is not a hidden retry path.
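
For the "Agent loop repeats same ask" fixture, one possible sketch keys each request by a hash of the tool name and canonicalized arguments, so a rephrased plan that produces the same call collapses into one receipt. This only catches calls whose arguments are identical after canonicalization; the class and function names are hypothetical.

```python
import hashlib
import json


def intent_key(tool: str, args: dict) -> str:
    """Stable key for a tool call, independent of argument ordering."""
    canonical = json.dumps({"tool": tool, "args": args},
                           sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class IntentLedger:
    """Collapses repeated intents and enforces a provider-call cap."""

    def __init__(self, max_provider_calls: int):
        self.max_provider_calls = max_provider_calls
        self.provider_calls = 0
        self.receipts = {}

    def submit(self, tool, args, call_provider):
        key = intent_key(tool, args)
        if key in self.receipts:
            # Same intent: return the original receipt, no new provider call.
            return {**self.receipts[key], "collapsed": True}
        if self.provider_calls >= self.max_provider_calls:
            return {"denied": "provider_call_budget_exhausted", "intent_key": key}
        self.provider_calls += 1
        receipt = {"intent_key": key, "result": call_provider(tool, args)}
        self.receipts[key] = receipt
        return {**receipt, "collapsed": False}
```

A fixture would submit the same call twice with the arguments in a different order and assert that the provider was called once.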

Trace evidence

The retry receipt should make the loop boring to audit.

Rate-limit and timeout handling only become operator-grade when every attempt is reconstructable. The evidence should identify the protected budget, the replay decision, the provider response, and the recovery path without depending on the model's explanation after the fact.

The receipt should include:

route id and tool call id
caller / tenant / workspace
operation class and side-effect class
quota owner and credential lane
provider account / capability id
attempt number and max attempts
elapsed time and wall-clock ceiling
token, dollar, and provider-call budget
provider status, Retry-After, and reset metadata
idempotency key or replay decision
backoff delay and jitter range
duplicate / partial-result check
policy decision and denial code
receipt id and allowed recovery action
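
As a rough sketch, the fields above could be carried in one immutable record per attempt and written as a JSON log line. The field names and example values are assumptions that mirror the list, not a defined Rhumb or MCP schema.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional, Tuple


@dataclass(frozen=True)
class AttemptReceipt:
    """One record per attempt (hypothetical field names)."""
    receipt_id: str
    route_id: str
    tool_call_id: str
    caller: str
    operation_class: str
    side_effect_class: str
    quota_owner: str
    credential_lane: str
    provider_account: str
    attempt: int
    max_attempts: int
    elapsed_s: float
    wall_clock_ceiling_s: float
    cost_usd_used: float
    cost_usd_cap: float
    provider_calls_used: int
    provider_call_cap: int
    provider_status: int
    retry_after_s: Optional[float]
    idempotency_key: Optional[str]
    replay_decision: str
    backoff_delay_s: float
    jitter_range_s: Tuple[float, float]
    duplicate_check: str
    policy_decision: str
    denial_code: Optional[str]
    allowed_recovery: str


def to_log_line(receipt: AttemptReceipt) -> str:
    """Serialize with sorted keys so receipts diff cleanly in an audit."""
    return json.dumps(asdict(receipt), sort_keys=True)


# Made-up values for illustration only.
example = AttemptReceipt(
    receipt_id="rcpt-001", route_id="search.read", tool_call_id="call-42",
    caller="tenant:acme/workspace:ops", operation_class="search",
    side_effect_class="no_side_effect", quota_owner="tenant:acme",
    credential_lane="customer_key", provider_account="provider-a",
    attempt=2, max_attempts=4, elapsed_s=3.4, wall_clock_ceiling_s=30.0,
    cost_usd_used=0.01, cost_usd_cap=0.05,
    provider_calls_used=2, provider_call_cap=4,
    provider_status=429, retry_after_s=5.0, idempotency_key=None,
    replay_decision="replay_allowed:no_side_effect",
    backoff_delay_s=5.0, jitter_range_s=(0.0, 2.0),
    duplicate_check="not_applicable", policy_decision="retry",
    denial_code=None, allowed_recovery="wait_for_retry_window",
)
```

Because every attempt writes its own line, the audit does not depend on the model's account of what happened.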

Copy-paste route card

Budget the route before the loop runs.

MCP route:
Caller / tenant:
Operation and side-effect class:
Quota owner:
Credential lane:
Max attempts:
Max elapsed time:
Token / dollar / provider-call cap:
Retry-After / backoff rule:
Idempotency key or replay guard:
Forbidden fallback routes:
Exhaustion denial code:
Receipt fields:

Common misreads

Retry systems usually fail in predictable ways:

Treating a 429 as a temporary nuisance instead of a budget decision that protects a user, tenant, provider account, or managed lane.
Using one global retry middleware for read-only search, email sends, calendar writes, purchases, and payment calls.
Logging final success while losing the evidence that three provider calls, a fallback, and a timeout happened first.
Letting the model route around a rate limit through a second provider without a separate budget and data-use decision.
Calling a tool idempotent because the endpoint name says create_or_update while the provider does not accept a stable idempotency key.
Counting token budget but not provider-call budget, even though the provider quota is the scarce production resource.

Related Rhumb guides

The canonical version of this checklist, with the route-hardening call to action, is on the Rhumb site: https://rhumb.dev/blog/mcp-retry-rate-limit-budget-checklist
