yongrean

MCP CI gates need receipts: tools/list is not enough

MCP servers are starting to look like normal infrastructure.

That means they need boring infrastructure checks.

The mistake I kept seeing is this:

"The server starts, and tools/list returns a clean schema. Therefore it works."

That is not enough.

An MCP server can pass initialize, advertise every expected tool, and still fail every real call because auth, scopes, tenant boundaries, environment variables, downstream permissions, or read-only roles are broken.

So I pushed mcp-probe@1.8.0 further toward being a real CI readiness gate for MCP servers.

npx @k08200/mcp-probe@latest --config mcp-probe.config.json --github-summary --fail-on-warn

What changed

1. Warnings can now fail CI

By default, a run that produces only warnings still exits 0. That keeps existing users from getting surprise CI failures.

But production gates often need stricter behavior:

mcp-probe --config mcp-probe.config.json --fail-on-warn

With --fail-on-warn, auth handoff issues, permission warnings, or incomplete readiness receipts can block the workflow.

That matters because many MCP failures are not hard crashes. They are degraded states:

  • an OAuth flow requires a browser redirect the agent cannot complete
  • a server starts but every tool call returns 401
  • a database tool works with admin credentials but fails with the intended read-only role
  • the workflow mentions a probe but does not actually run the production boundary check
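The warning policy above can be sketched as a small exit-code function. This is an illustration, not mcp-probe's actual implementation: the `CheckResult` shape, the status names, and the choice of exit code 1 for hard failures are all assumptions made for the example.

```typescript
// Hypothetical result shape; mcp-probe's real internals may differ.
type CheckStatus = "pass" | "warn" | "fail";

interface CheckResult {
  name: string;
  status: CheckStatus;
}

// Map a list of check results to a process exit code.
// Assumption: hard failures always exit 1; warnings exit 1 only in strict mode.
function exitCodeFor(results: CheckResult[], failOnWarn: boolean): number {
  const hasFail = results.some((r) => r.status === "fail");
  const hasWarn = results.some((r) => r.status === "warn");
  if (hasFail) return 1;
  if (hasWarn && failOnWarn) return 1;
  return 0;
}

// A degraded state: the server starts, but the auth handoff cannot complete.
const degraded: CheckResult[] = [
  { name: "initialize", status: "pass" },
  { name: "auth handoff", status: "warn" },
];
```

With this sketch, `exitCodeFor(degraded, false)` keeps the pipeline green, while `exitCodeFor(degraded, true)` blocks it, which is the behavior a production gate usually wants.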

2. Doctor now checks the actual workflow receipt

mcp-probe doctor already checked whether a GitHub Actions workflow existed.

But that is not enough either.

The new behavior is stricter: the required flags must all appear on a single mcp-probe run step, not scattered across the workflow.

This should pass:

- run: npx @k08200/mcp-probe@latest --config mcp-probe.config.json --github-summary --fail-on-warn

This should not count as a complete gate:

- run: npx @k08200/mcp-probe --config mcp-probe.config.json
- run: npx @k08200/mcp-probe ./server.js --github-summary --fail-on-warn

The flags are present somewhere in the workflow, but no single run step proves the intended config is actually being checked with CI summaries and strict warning handling.

That is the difference between "we have a gate" and "the gate is enforcing the thing we trust."
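The same-step rule can be illustrated with a naive check over the `run` commands of a workflow. This is only a sketch of the idea, not how doctor is implemented: the required flag list is inferred from the passing example above, the function names are hypothetical, and the matching is whitespace-based, so forms like `--config=path` or a probe command hidden inside an `echo` are not handled.

```typescript
// Flags that must appear together on one step (inferred from the passing example).
const REQUIRED_FLAGS: string[] = ["--config", "--github-summary", "--fail-on-warn"];

// Matches the CLI token itself, e.g. "mcp-probe", "@k08200/mcp-probe",
// or "@k08200/mcp-probe@latest", but not "mcp-probe.config.json".
const PROBE_TOKEN = /^(@[\w-]+\/)?mcp-probe(@[\w.-]+)?$/;

// True if a single run command invokes mcp-probe with every required flag.
function isCompleteGateStep(runCommand: string): boolean {
  const tokens = runCommand.trim().split(/\s+/);
  const invokesProbe = tokens.some((t) => PROBE_TOKEN.test(t));
  const hasAllFlags = REQUIRED_FLAGS.every((flag) => tokens.indexOf(flag) !== -1);
  return invokesProbe && hasAllFlags;
}

// True if at least one step in the workflow is a complete gate.
function workflowHasCompleteGate(runCommands: string[]): boolean {
  return runCommands.some(isCompleteGateStep);
}
```

Checking each step in isolation is the point: a workflow-wide search for the flags would accept the split example, even though neither of its steps runs the intended config with strict warning handling.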

3. Tool call coverage is now tied to expected tools

For config-based checks, you can declare the expected tool catalog:

{
  "servers": [
    {
      "name": "datadog",
      "target": "https://mcp.example.com/mcp",
      "transport": "http",
      "headers": {
        "Authorization": "Bearer ${DATADOG_MCP_TOKEN}"
      },
      "expectedTools": ["logs_query"],
      "forbiddenTools": ["delete_dashboard", "rotate_api_key"],
      "toolsFile": "./datadog.tools.json"
    }
  ]
}

If expectedTools and toolsFile are both set, every expected tool needs a sidecar sample input.

That means CI checks not just "is the tool advertised?" but "did we actually provide a meaningful dry-run sample for the tool an agent depends on?"
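As a rough sketch of that coverage rule, the check reduces to a set difference between `expectedTools` and the tools that have a sample input in the sidecar. The `Sidecar` interface and the function name below are assumptions for illustration; the real file format may carry more fields.

```typescript
// Hypothetical shape of the sidecar file referenced by "toolsFile".
interface Sidecar {
  tools: { [toolName: string]: { input?: object } };
}

// Return the expected tools that have no sample input in the sidecar.
function toolsMissingSamples(expectedTools: string[], sidecar: Sidecar): string[] {
  return expectedTools.filter((name) => {
    const entry = sidecar.tools[name];
    return entry === undefined || entry.input === undefined;
  });
}
```

An empty result means every tool the agent depends on has a dry-run sample; any name in the result is a coverage gap that a strict gate could report as a warning.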

4. Sidecar inputs are the real contract

Auto-generated inputs are useful for smoke tests, but they mostly hit schema validation.

Real readiness checks need meaningful inputs:

{
  "tools": {
    "logs_query": {
      "input": {
        "query": "service:web status:error",
        "timeframe": "1h"
      },
      "expect": {
        "status": "pass",
        "not_error_code": [401, 403],
        "requiredFields": ["source", "freshness"],
        "maxRows": 100
      }
    }
  }
}

For database-backed MCP servers, these assertions are the interesting part:

  • does the read-only role work?
  • are row limits enforced?
  • are broad exports/admin actions absent or gated?
  • are denied writes structured enough for agents to recover?
  • do results include provenance fields like source and freshness?
  • does the response avoid leaking secrets, stack traces, or raw internals?
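To make the sidecar `expect` block concrete, here is a minimal sketch of how such assertions might be evaluated against one tool call. The expectation keys are copied from the example above, but their exact semantics are assumed, and `ToolCallResult` is a hypothetical normalized view of a response rather than anything mcp-probe defines.

```typescript
// Expectation keys mirror the sidecar example; their semantics here are assumed.
interface Expectation {
  status?: "pass";
  not_error_code?: number[];
  requiredFields?: string[];
  maxRows?: number;
}

// Hypothetical normalized view of one tool call result.
interface ToolCallResult {
  errorCode?: number; // e.g. 401 or 403 when the call was rejected
  fields: { [key: string]: unknown }; // top-level fields of the result
  rowCount: number;
}

// Return human-readable violations; an empty array means the call passed.
function checkExpectation(expect: Expectation, result: ToolCallResult): string[] {
  const violations: string[] = [];
  const code = result.errorCode;
  if (expect.status === "pass" && code !== undefined) {
    violations.push("call returned error code " + code);
  }
  if (code !== undefined && (expect.not_error_code || []).indexOf(code) !== -1) {
    violations.push("forbidden error code " + code);
  }
  for (const field of expect.requiredFields || []) {
    if (!(field in result.fields)) violations.push("missing field " + field);
  }
  if (expect.maxRows !== undefined && result.rowCount > expect.maxRows) {
    violations.push("row count " + result.rowCount + " exceeds " + expect.maxRows);
  }
  return violations;
}
```

Returning a list of violations instead of a boolean keeps the output usable in a CI summary: a 401 from the read-only role and a missing `freshness` field show up as separate, named problems.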

Install

npm install -D @k08200/mcp-probe

Or run directly:

npx @k08200/mcp-probe@latest doctor
npx @k08200/mcp-probe@latest --config mcp-probe.config.json --github-summary --fail-on-warn

GitHub: https://github.com/k08200/mcp-probe
npm: https://www.npmjs.com/package/@k08200/mcp-probe

The goal is simple: CI for MCP should test the contract an agent will actually depend on, not just whether the process starts.
