MCP CI gates need to distinguish two very different failures:
- the server is actually broken
- the downstream dependency is temporarily flaky
If both become hard failures, CI gets noisy.
If both are ignored, the gate stops meaning anything.
So I shipped @k08200/mcp-probe@1.12.0 with an explicit sidecar retry policy for tool-call dry-runs.
The problem
A readiness gate that calls real MCP tools can hit transient downstream failures:
- 503 Service Unavailable
- 502 Bad Gateway
- 504 Gateway Timeout
- rate limits
- short network timeouts
But auth and permission failures are different. A 401 or 403 usually means the agent will fail in production too.
Those should stay visible unless the contract explicitly says otherwise.
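A rough sketch of that split in TypeScript. `classifyFailure` is a hypothetical helper, not part of mcp-probe, and it assumes the failure arrives as a plain error string with the HTTP status code somewhere in it. Anything it does not recognise is treated as hard, so unknown failures stay visible.

```typescript
// Hypothetical helper, not part of mcp-probe: classify an error message
// from a tool-call dry-run as a transient or a hard failure.
type FailureKind = "transient" | "hard";

// Codes treated as transient downstream failures (429 stands in for rate limits).
const TRANSIENT_CODES = [429, 502, 503, 504];

// Codes that usually mean the agent will fail in production too.
const HARD_CODES = [401, 403];

function classifyFailure(message: string): FailureKind {
  const lower = message.toLowerCase();
  // Pull the first standalone three-digit number out of the message, if any.
  const match = /\b(\d{3})\b/.exec(message);
  const code = match ? parseInt(match[1], 10) : undefined;

  if (code !== undefined && HARD_CODES.indexOf(code) !== -1) return "hard";
  if (code !== undefined && TRANSIENT_CODES.indexOf(code) !== -1) return "transient";
  if (
    lower.indexOf("timeout") !== -1 ||
    lower.indexOf("timed out") !== -1 ||
    lower.indexOf("rate limit") !== -1
  ) {
    return "transient";
  }
  // Unrecognised failures stay hard so they remain visible in CI.
  return "hard";
}
```

Defaulting to "hard" is the conservative choice here: a new kind of error blocks the gate until someone decides it is safe to retry.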
Retry is opt-in per tool
mcp-probe now lets a sidecar contract define retry behavior per tool:
{
  "tools": {
    "logs_query": {
      "input": {
        "query": "service:web status:error",
        "timeframe": "1h"
      },
      "retry": {
        "attempts": 3,
        "delayMs": 1000,
        "retryOn": [429, 500, 502, 503, 504, "timeout", "rate limit"]
      },
      "expect": {
        "status": "pass"
      }
    }
  }
}
The important part: retry is not global magic.
It only happens when the sidecar explicitly opts in.
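To make the opt-in behavior concrete, here is a minimal sketch of a retry loop driven by a `retry` block like the one above. It is not mcp-probe's actual implementation. The names `runWithRetry`, `matchesRetryOn`, and `CallResult` are hypothetical, and the sketch assumes that `attempts` means total attempts, that numeric `retryOn` entries match a status code in the error text, and that string entries match as case-insensitive substrings. The tool call and the wait are injected as plain synchronous functions to keep the example short; a real runner would be async.

```typescript
// Shape of the "retry" block from the sidecar contract.
interface RetryPolicy {
  attempts: number;
  delayMs: number;
  retryOn: Array<number | string>;
}

// Hypothetical result of one dry-run call.
interface CallResult {
  ok: boolean;
  error?: string;
}

interface AttemptRecord {
  attempt: number;
  status: "pass" | "fail";
  error?: string;
}

// Assumed matching rule: numbers match a standalone status code in the
// error text, strings match as case-insensitive substrings.
function matchesRetryOn(error: string, retryOn: Array<number | string>): boolean {
  const lower = error.toLowerCase();
  return retryOn.some((entry) =>
    typeof entry === "number"
      ? new RegExp("\\b" + entry + "\\b").test(error)
      : lower.indexOf(entry.toLowerCase()) !== -1
  );
}

function runWithRetry(
  call: () => CallResult,
  policy: RetryPolicy | undefined,
  wait: (ms: number) => void
): { status: "pass" | "fail"; attempts: AttemptRecord[] } {
  // No retry block means exactly one attempt: retry is opt-in.
  const maxAttempts = policy ? Math.max(1, policy.attempts) : 1;
  const attempts: AttemptRecord[] = [];

  for (let n = 1; n <= maxAttempts; n++) {
    const result = call();
    if (result.ok) {
      attempts.push({ attempt: n, status: "pass" });
      return { status: "pass", attempts };
    }
    const error = result.error || "unknown error";
    // Every failed attempt is recorded, even if a later one passes.
    attempts.push({ attempt: n, status: "fail", error });
    // Stop on the last attempt or on an error the contract did not opt into.
    if (!policy || n === maxAttempts || !matchesRetryOn(error, policy.retryOn)) break;
    wait(policy.delayMs);
  }
  return { status: "fail", attempts };
}
```

With this shape, a 403 fails on the first attempt because it is not listed in `retryOn`, while a single 503 followed by a success yields a pass with two recorded attempts.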
Receipts still show the flake
If a call fails once and passes on retry, the final result can pass, but the receipt still records every attempt.
That means CI can tolerate a transient downstream blip without pretending the run was clean.
Example shape:
{
  "tool": "flaky_read",
  "status": "pass",
  "source": "sidecar",
  "attempts": [
    {
      "attempt": 1,
      "status": "fail",
      "error": "503 Service Unavailable: transient downstream"
    },
    {
      "attempt": 2,
      "status": "pass"
    }
  ]
}
That is the distinction I want MCP CI gates to preserve:
- hard failures should block
- transient failures can be retried
- pass-after-retry should still leave a receipt
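Those three outcomes can be read back out of a receipt with a few lines of TypeScript. This is a sketch only: the per-tool fields mirror the example shape above, but `verdictFor` and `flakyTools` are hypothetical helpers, and treating the receipt data as an array of per-tool entries is an assumption about the file layout.

```typescript
// Per-tool receipt entry, mirroring the example shape in this post.
interface Attempt {
  attempt: number;
  status: "pass" | "fail";
  error?: string;
}

interface ToolReceipt {
  tool: string;
  status: "pass" | "fail";
  source?: string;
  attempts: Attempt[];
}

type Verdict = "clean" | "passed-after-retry" | "failed";

// Hypothetical helper: separate a clean pass from a pass that needed a retry.
function verdictFor(receipt: ToolReceipt): Verdict {
  if (receipt.status !== "pass") return "failed";
  const hadFailure = receipt.attempts.some((a) => a.status === "fail");
  return hadFailure ? "passed-after-retry" : "clean";
}

// Hypothetical helper: names of tools that passed only after at least one failure.
function flakyTools(receipts: ToolReceipt[]): string[] {
  return receipts
    .filter((r) => verdictFor(r) === "passed-after-retry")
    .map((r) => r.tool);
}
```

A CI step could print the result of `flakyTools` as a warning while still letting the gate pass, which keeps the flake visible without blocking the run.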
Install
npm install -D @k08200/mcp-probe
Or run directly:
npx @k08200/mcp-probe@latest --config mcp-probe.config.json --github-summary --receipt-file mcp-probe.receipt.json
GitHub release: https://github.com/k08200/mcp-probe/releases/tag/v1.12.0
Top comments (1)
The distinction between "the server is broken" and "the downstream is flaky" is exactly what I wish more CI gates would make explicit. We've been dealing with a similar pattern in our test automation — a flaky API call that 503s once in a blue moon, and the whole pipeline turns red. Our fix was a retry wrapper too, but the receipt approach is cleaner: you still see the transient in the logs, it just doesn't block the gate.
Do you have plans to surface those receipts in a dashboard or PR comment summary? That'd be the missing piece for teams that need to track flake trends over time.