yongrean

Posted on May 23

mcp-probe v1.4.0: Contract assertions for production MCP servers

#devops #opensource #typescript #ai

MCP servers are starting to look like infrastructure.

That means the old readiness question is no longer enough:

Does the process start?

Even this is not enough:

Does tools/list return a clean schema?

A server can pass both checks and still fail every real agent loop because auth handoff, scopes, downstream permissions, environment setup, or data boundaries are broken.

So I shipped mcp-probe v1.4.0 with contract assertions for production MCP servers.

GitHub: https://github.com/k08200/mcp-probe

npm: https://www.npmjs.com/package/@k08200/mcp-probe

The problem: discovery is not readiness

A typical MCP smoke test looks like this:

Start the server
Run initialize
Run tools/list
Check that schemas exist

That catches broken startup and malformed tools.

But it misses the failures that matter in production:

The tool advertises correctly, but every call returns 401
OAuth requires a browser redirect the agent cannot trigger
The DB role is not actually read-only
Write attempts leak raw SQL errors or stack traces
Results omit metadata agents need to reason safely
Tenant or project scope is not preserved
Broad exports or admin actions are reachable
Error codes are unstable, so agents cannot recover

In other words: the server starts, but the contract is broken.
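To make the gap concrete, here is a minimal sketch of a server that passes a discovery-only check and still fails every call. The `FakeServer` and `ToolResult` shapes are hypothetical stand-ins for illustration; they are not the MCP SDK or mcp-probe's API.

```typescript
// Hypothetical shapes for illustration only; not the MCP SDK or mcp-probe API.
interface ToolResult {
  ok: boolean;
  status?: number;
}

interface FakeServer {
  listTools(): string[];
  callTool(name: string): ToolResult;
}

// A server that advertises its tool correctly but rejects every call with 401.
const brokenAuthServer: FakeServer = {
  listTools: () => ["execute_sql"],
  callTool: () => ({ ok: false, status: 401 }),
};

// Discovery-only readiness: passes as long as at least one tool is listed.
function discoveryCheck(server: FakeServer): boolean {
  return server.listTools().length > 0;
}

// Call-level readiness: every advertised tool must answer a real call.
function callCheck(server: FakeServer): boolean {
  return server.listTools().every((name) => server.callTool(name).ok);
}

const discoveryPasses = discoveryCheck(brokenAuthServer); // true
const callsPass = callCheck(brokenAuthServer); // false
```

The two checks disagree on the same server, which is exactly the case a schema-only smoke test cannot see.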

v1.4.0: sidecar contract assertions

mcp-probe already supported sidecar inputs via .mcp-probe.json, so teams could run real tools/call checks with meaningful arguments instead of dummy inputs built from the schema's minimum requirements.

v1.4.0 extends that sidecar with assertions.

Example for a database-backed MCP server:

{
  "tools": {
    "execute_sql": {
      "input": {
        "project_id": "YOUR_PROJECT_ID",
        "query": "select 1 as health_check"
      },
      "expect": {
        "status": "pass",
        "requiredFields": ["rowCount", "limit", "source", "freshness"],
        "maxRows": 100
      }
    },
    "execute_sql_write_denied": {
      "input": {
        "project_id": "YOUR_PROJECT_ID",
        "query": "delete from users where id = 1"
      },
      "expect": {
        "status": "fail",
        "errorCode": "WRITE_NOT_ALLOWED",
        "notContains": ["DATABASE_URL", "password", "stack"]
      }
    }
  }
}

Now CI can validate the contract an agent actually depends on.
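If you generate or lint the sidecar from TypeScript, its shape can be written down as types. This is a sketch inferred from the example and the assertion names in this post, not mcp-probe's published typings. The interface names `Sidecar`, `ToolProbe`, and `Expectation` are hypothetical, and the string element type of `contains` and `notContains` is an assumption.

```typescript
// Hypothetical typings inferred from this post; not shipped by mcp-probe.
type ExpectedStatus = "pass" | "fail" | "warn";

interface Expectation {
  status?: ExpectedStatus;
  requiredFields?: string[];
  maxRows?: number;
  errorCode?: string;
  contains?: string[]; // assumed to be a list of substrings
  notContains?: string[];
  not_error_code?: number[];
}

interface ToolProbe {
  input: Record<string, unknown>;
  expect?: Expectation;
}

interface Sidecar {
  tools: Record<string, ToolProbe>;
}

// The same two probes as the JSON example, now checked by the compiler.
const sidecar: Sidecar = {
  tools: {
    execute_sql: {
      input: { project_id: "YOUR_PROJECT_ID", query: "select 1 as health_check" },
      expect: {
        status: "pass",
        requiredFields: ["rowCount", "limit", "source", "freshness"],
        maxRows: 100,
      },
    },
    execute_sql_write_denied: {
      input: { project_id: "YOUR_PROJECT_ID", query: "delete from users where id = 1" },
      expect: {
        status: "fail",
        errorCode: "WRITE_NOT_ALLOWED",
        notContains: ["DATABASE_URL", "password", "stack"],
      },
    },
  },
};

// Serialize to the JSON text that would go into .mcp-probe.json.
const sidecarJson = JSON.stringify(sidecar, null, 2);
```

Typing the sidecar this way catches typos such as `"status": "failed"` before CI ever runs.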

What assertions are supported?

`expect.status`

Declare whether a call should pass, fail, or warn.

This is important for negative probes. A write attempt against a read-only DB role should fail. In that case, failure is success.

{
  "expect": {
    "status": "fail"
  }
}
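The idea behind a negative probe can be sketched in a few lines: the assertion compares the observed status with the declared one, so an expected failure counts as a passing assertion. The function and type names below are hypothetical, and this is not mcp-probe's actual implementation; only the success message mirrors the output shown later in this post.

```typescript
// Hypothetical sketch; names are illustrative, not mcp-probe internals.
type ProbeStatus = "pass" | "fail" | "warn";

interface AssertionResult {
  name: string;
  ok: boolean;
  message: string;
}

// The assertion succeeds when the observed status equals the declared one,
// even when the declared status is "fail".
function checkStatus(expected: ProbeStatus, observed: ProbeStatus): AssertionResult {
  const ok = expected === observed;
  return {
    name: "status",
    ok,
    message: ok
      ? `Tool status matched expected ${expected}`
      : `Expected status ${expected} but observed ${observed}`,
  };
}

// A denied write: the call failed, and that is what the contract demands.
const deniedWrite = checkStatus("fail", "fail"); // ok: true

// A write that unexpectedly succeeded: the assertion reports a violation.
const leakyWrite = checkStatus("fail", "pass"); // ok: false
```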

`expect.requiredFields`

Validate that result metadata exists.

For database tools, an agent often needs more than rows. It needs context:

rowCount
limit
source
freshness

{
  "expect": {
    "requiredFields": ["rowCount", "limit", "source", "freshness"]
  }
}
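As a rough sketch of what such a check involves, the function below reports which required fields are absent from a parsed result. It assumes the metadata sits at the top level of the result object; where mcp-probe actually looks is not specified here, and `missingFields` is a hypothetical name.

```typescript
// Hypothetical helper; assumes required fields are top-level keys of the result.
function missingFields(result: Record<string, unknown>, required: string[]): string[] {
  return required.filter(
    (field) => !Object.prototype.hasOwnProperty.call(result, field),
  );
}

// A result that carries rows and some context, but no freshness marker.
const partialResult: Record<string, unknown> = {
  rows: [{ health_check: 1 }],
  rowCount: 1,
  limit: 100,
  source: "replica",
};

const missing = missingFields(partialResult, ["rowCount", "limit", "source", "freshness"]);
// missing is ["freshness"]
```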

`expect.maxRows`

Catch broad exports or missing limits.

{
  "expect": {
    "maxRows": 100
  }
}

To determine the row count, mcp-probe looks for common result shapes such as rowCount, rowsReturned, rows, data, items, and records.
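One way to picture that lookup is the sketch below. It assumes `rowCount` and `rowsReturned` are numeric counters and `rows`, `data`, `items`, and `records` are arrays, and it treats an undetectable count as a failed check. The precedence order, those type assumptions, and the function names are mine, not a description of mcp-probe's real logic.

```typescript
// Hypothetical sketch of row-count detection; not mcp-probe's implementation.
const COUNT_KEYS = ["rowCount", "rowsReturned"]; // assumed to hold numbers
const ARRAY_KEYS = ["rows", "data", "items", "records"]; // assumed to hold arrays

function detectRowCount(result: Record<string, unknown>): number | undefined {
  // Prefer an explicit numeric counter when the server provides one.
  for (const key of COUNT_KEYS) {
    const value = result[key];
    if (typeof value === "number") return value;
  }
  // Otherwise fall back to the length of the first array-shaped payload.
  for (const key of ARRAY_KEYS) {
    const value = result[key];
    if (Array.isArray(value)) return value.length;
  }
  return undefined;
}

// Assumption: if no count can be detected, the limit cannot be verified,
// so the check is treated as failed rather than silently passing.
function withinMaxRows(result: Record<string, unknown>, maxRows: number): boolean {
  const count = detectRowCount(result);
  return count !== undefined && count <= maxRows;
}

const healthCheck = withinMaxRows({ rowCount: 1, rows: [{ health_check: 1 }] }, 100); // true
const broadExport = withinMaxRows({ items: new Array(101).fill(0) }, 100); // false
```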

`expect.errorCode`

Require stable structured error codes.

{
  "expect": {
    "status": "fail",
    "errorCode": "WRITE_NOT_ALLOWED"
  }
}

This matters because agents can only recover if errors are predictable.
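From the agent's side, a stable code is what makes a recovery branch possible at all. The sketch below is illustrative: `WRITE_NOT_ALLOWED` comes from the example in this post, while `AUTH_REQUIRED`, the `RecoveryAction` names, and `planRecovery` are hypothetical.

```typescript
// Hypothetical agent-side recovery logic; only WRITE_NOT_ALLOWED is from the post.
type RecoveryAction = "continue_read_only" | "request_auth" | "abort";

function planRecovery(errorCode: string | undefined): RecoveryAction {
  switch (errorCode) {
    case "WRITE_NOT_ALLOWED":
      // The server is read-only by contract: stop attempting writes.
      return "continue_read_only";
    case "AUTH_REQUIRED": // hypothetical code, used here only as an example
      return "request_auth";
    default:
      // Unknown or missing codes leave the agent nothing safe to branch on.
      return "abort";
  }
}

const onDeniedWrite = planRecovery("WRITE_NOT_ALLOWED"); // "continue_read_only"
const onUnstableError = planRecovery(undefined); // "abort"
```

If the server returned free-form messages instead, every failure would fall into the `default` branch.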

`expect.contains` and `expect.notContains`

Check for expected output and leaked internals.

{
  "expect": {
    "notContains": ["DATABASE_URL", "password", "stack"]
  }
}

This catches error responses that expose raw internals such as connection strings, credentials, or stack traces.
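Conceptually, a `notContains` check is a substring scan over the serialized tool output. The sketch below assumes a case-sensitive match on the raw output text; whether mcp-probe matches case-sensitively is not stated in this post, and `findLeaks` is a hypothetical name.

```typescript
// Hypothetical helper; assumes case-sensitive substring matching on raw output.
function findLeaks(output: string, forbidden: string[]): string[] {
  return forbidden.filter((needle) => output.includes(needle));
}

const forbidden = ["DATABASE_URL", "password", "stack"];

// A structured error that reveals nothing about the backend.
const cleanError = JSON.stringify({
  error: { code: "WRITE_NOT_ALLOWED", message: "Writes are disabled for this role" },
});

// A raw driver message passed straight through to the agent.
const rawError = 'password authentication failed for user "app"';

const cleanLeaks = findLeaks(cleanError, forbidden); // []
const rawLeaks = findLeaks(rawError, forbidden); // ["password"]
```

Short needles like `stack` can also match harmless text, so it is worth keeping the forbidden list specific to what your server could realistically leak.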

`expect.not_error_code`

Treat known auth/permission status codes as warnings instead of hard failures.

{
  "expect": {
    "not_error_code": [401, 403]
  }
}

This keeps OAuth handoff failures visible without confusing them with transport or runtime crashes.
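The classification this describes can be pictured as follows. This is a sketch of the idea only: how mcp-probe extracts a status code from a failed call is not covered here, and `classifyFailure` and its parameters are hypothetical.

```typescript
// Hypothetical sketch; how a status code is extracted from a call is assumed.
type FailureSeverity = "warn" | "fail";

// Failed calls whose status code is on the allow-list are downgraded to
// warnings; everything else remains a hard failure.
function classifyFailure(
  statusCode: number | undefined,
  notErrorCodes: number[],
): FailureSeverity {
  if (statusCode !== undefined && notErrorCodes.includes(statusCode)) {
    return "warn";
  }
  return "fail";
}

const authHandoff = classifyFailure(401, [401, 403]); // "warn"
const serverCrash = classifyFailure(500, [401, 403]); // "fail"
const transportError = classifyFailure(undefined, [401, 403]); // "fail"
```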

Output example

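When assertions pass (this sample run probed tools named db_query and db_write, not the execute_sql entries from the sidecar example earlier):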

Tool Call Dry-run
  ✓ db_query [sidecar] 1ms
    ✓ status: Tool status matched expected pass
    ✓ requiredFields.rowCount: Found required field "rowCount"
    ✓ requiredFields.limit: Found required field "limit"
    ✓ requiredFields.source: Found required field "source"
    ✓ requiredFields.freshness: Found required field "freshness"
    ✓ maxRows: Row count 1 is within maxRows 100

  ✓ db_write [sidecar] 0ms
    ✓ status: Tool status matched expected fail
    ✓ errorCode: Found expected error code WRITE_NOT_ALLOWED
    ✓ notContains.DATABASE_URL: Output does not contain "DATABASE_URL"
    ✓ notContains.password: Output does not contain "password"
    ✓ notContains.stack: Output does not contain "stack"

If a contract assertion fails, mcp-probe reports:

CONTRACT_ASSERTION_FAILED

and includes per-assertion details in terminal output, JSON output, and GitHub Actions summaries.

Quick start

npx @k08200/mcp-probe@latest init \
  --target @your-org/your-mcp-server \
  --discover \
  --github-actions

Then edit .mcp-probe.json with real read-only probes and run:

npx @k08200/mcp-probe@latest --config mcp-probe.config.json --github-summary

Why this matters

MCP CI should test the contract an agent will actually depend on, not just whether the server process starts.

For database-backed MCP servers, that means validating things like:

read-only role behavior
denied writes
stable error codes
row limits
tenant or project scope
result metadata
no leaked internals

mcp-probe should not know every server's semantics. But it can give teams a small, declarative way to encode the production contract their agents rely on.

That is the goal of v1.4.0.

Release: https://github.com/k08200/mcp-probe/releases/tag/v1.4.0

npm: https://www.npmjs.com/package/@k08200/mcp-probe
