The final part of the ZeroMcp series. Part one covered exposing your ASP.NET Core API as an MCP server. Part two covered everything that's grown since. Part three covered ZeroMcp.Relay for any OpenAPI-backed API. This one closes the loop on the question you should be asking by now: how do I test all of this?
I should probably have led with this from the start: by trade, I'm a test automation architect who specialises in tooling. That's not incidental context; it's why ZeroMcp exists the way it does, as a way to provide consistent endpoints without the human-error factor.
When you spend a career building test infrastructure, a particular instinct gets hard-wired: you don't consider something done until you can verify it works and detect when it stops working. Every tool in the ZeroMcp ecosystem was built with that instinct operating in the background. The [Mcp] attribute runs your real pipeline so your real tests still pass. The Tool Inspector UI lets you execute tools and inspect results. Relay validates specs before it starts serving tools. But none of that answered the question that actually mattered to me: how do you write a test that proves your MCP server behaves correctly?
There's a version of MCP development where you spin up your server, open Claude Desktop, and manually chat at it to see if the tools work. That gets you pretty far early on. It doesn't get you a CI pipeline, regression protection, or any confidence that a schema change didn't silently break tool outputs.
ZeroMcp.TestKit is where that circle closes.
The Architecture: Two Repos, One Design Decision
TestKit is split into two repositories, and that split is intentional and worth understanding before anything else.
ZeroMcp.TestKitEngine is a standalone Rust binary called mcptest. It accepts JSON test definitions, connects to any MCP server, runs the tests, and produces structured JSON results. It knows nothing about .NET, xUnit, or any specific language. It validates MCP protocol correctness, JSON Schema compliance, determinism, error paths, and tool metadata — for any MCP server, regardless of what it's built with.
ZeroMcp.TestKit.dotnet is a fluent C# DSL that wraps the engine. You write tests in idiomatic .NET, the DSL serializes them to JSON, shells out to mcptest, and parses the results back. xUnit integration ([McpFact], [McpTheory], McpAssert) makes tests show up in Visual Studio Test Explorer just like any other test project.
The implication: mcptest is the correctness oracle. The .NET DSL is a convenient way to talk to it. If you're building MCP servers in Python or Go, the engine works for you too — the Rust binary is the portable part.
Your xUnit test project
│
▼
ZeroMcp.TestKit (.NET DSL)
│ serializes to JSON, shells out
▼
mcptest (Rust binary)
│ speaks MCP protocol
▼
Your MCP server (any language, any transport)
Quick Start
using ZeroMcp.TestKit;
await McpTest
.Server("http://localhost:8000/mcp")
.Tool("search")
.WithParams(new { query = "hello" })
.ExpectSchemaMatch()
.ExpectDeterministic()
.RunAsync();
That's a complete test. It connects to your MCP server, calls the search tool with { "query": "hello" }, validates that the response conforms to the tool's declared JSON Schema, and asserts that calling it again produces the same result. If either check fails, RunAsync() throws McpTestException with the details.
Install:
Add the ZeroMcp.TestKit and ZeroMcp.TestKit.Xunit packages via NuGet.
The mcptest binary is resolved automatically — first from MCPTEST_PATH, then from the bin directory, then from NuGet native assets (runtimes/{rid}/native/mcptest), then from your system PATH. Most of the time you don't need to think about it.
The Fluent API
The builder is designed to read like a spec of what you expect:
await McpTest
.Server("http://localhost:8000/mcp")
.WithTimeout(TimeSpan.FromSeconds(30))
.WithDeterminismRuns(5)
.ValidateProtocol()
.ValidateMetadata()
.WithAutoErrorTests()
.Tool("search")
.WithParams(new { query = "hello" })
.ExpectSchemaMatch()
.ExpectDeterministic()
.WithIgnorePaths("$.result.timestamp")
.Tool("create_order")
.WithParams(new { customerName = "Alice", product = "Widget", quantity = 2 })
.ExpectSchemaMatch()
.Tool("get_order")
.WithParams(new { id = 999 })
.ExpectError()
.RunAsync();
A few things worth pulling out:
.ValidateProtocol() — checks the MCP handshake, session lifecycle, and JSON-RPC frame structure. This catches implementation bugs that work fine when an LLM is calling your tools but would fail with a strict client.
.ValidateMetadata() — checks that every tool has a name, description, and a valid inputSchema. This is the check that enforces the contract the LLM depends on. Missing descriptions and malformed schemas are silent failures from the LLM's perspective; it just works less well. This makes them loud.
.WithAutoErrorTests() — automatically generates two additional tests: calling an unknown tool name, and calling a real tool with malformed parameters. Both should return proper JSON-RPC error responses. Surprisingly many MCP server implementations fail one or both of these.
.ExpectDeterministic() with .WithIgnorePaths(...) — calls the tool multiple times (default 3, configurable with .WithDeterminismRuns()) and compares results. WithIgnorePaths takes JSONPath expressions for fields that are legitimately non-deterministic, like timestamps or request IDs. Everything else must match.
.ExpectError() / .ExpectErrorCode(long) — for testing your error paths explicitly. If get_order with a non-existent ID should return an error, test that it does.
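To make the error-path expectations concrete: when the engine asserts a "proper JSON-RPC error response", it is checking a shape roughly like this. A minimal Python sketch of that shape check, not the engine's actual code:

```python
def is_valid_jsonrpc_error(frame: dict) -> bool:
    """Minimal shape check for a JSON-RPC 2.0 error response."""
    error = frame.get("error")
    return (
        frame.get("jsonrpc") == "2.0"
        and "id" in frame                      # errors still echo the request id
        and isinstance(error, dict)
        and isinstance(error.get("code"), int)
        and isinstance(error.get("message"), str)
        and "result" not in frame              # result and error are mutually exclusive
    )
```

Servers that return a 200 with an error string in `result`, or drop the `id`, are exactly the implementations that auto error tests flag.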
xUnit Integration
Add ZeroMcp.TestKit.Xunit and your MCP tests sit alongside your existing test suite:
using ZeroMcp.TestKit;
using ZeroMcp.TestKit.Xunit;
public class OrdersToolTests
{
[McpFact(DisplayName = "get_order returns valid schema")]
public async Task GetOrderSchemaValid()
{
await McpTest
.Server("http://localhost:5000/mcp")
.Tool("get_order")
.WithParams(new { id = 1 })
.ExpectSchemaMatch()
.RunAsync();
}
[McpFact(DisplayName = "get_order returns error for unknown id")]
public async Task GetOrderNotFound()
{
await McpTest
.Server("http://localhost:5000/mcp")
.Tool("get_order")
.WithParams(new { id = 99999 })
.ExpectError()
.RunAsync();
}
[McpFact(DisplayName = "create_order is deterministic for same input")]
public async Task CreateOrderDeterministic()
{
await McpTest
.Server("http://localhost:5000/mcp")
.WithDeterminismRuns(3)
.Tool("create_order")
.WithParams(new { customerName = "Alice", product = "Widget", quantity = 1 })
.ExpectSchemaMatch()
.ExpectDeterministic()
.WithIgnorePaths("$.result.orderId", "$.result.createdAt")
.RunAsync();
}
}
[McpFact] is xUnit's [Fact] with DisplayName support tuned for MCP test naming. Tests appear in Test Explorer, run in CI with dotnet test, and fail with clear messages when something breaks.
For cases where you want assertion-style checks rather than throw-on-failure:
var result = await McpTest
.Server("http://localhost:5000/mcp")
.Tool("search").WithParams(new { query = "hello" }).ExpectSchemaMatch()
.RunWithoutThrowAsync();
McpAssert.Passed(result);
McpAssert.ToolPassed(result, "search");
McpAssert.SchemaValid(result, "search");
McpAssert.Deterministic(result, "search");
What the Engine Actually Validates
It's worth being explicit about what mcptest checks, because some of these aren't things you'd think to test manually.
Protocol validation goes beyond "does it respond to tool calls." It checks the initialize / initialized handshake sequence, that JSON-RPC frames have correct id, method, and jsonrpc fields, and that error responses use the right error code structure. An MCP server that works with Claude but fails with a strict client — or a future version of the protocol — will show up here.
Schema validation checks that tool outputs conform to the tool's declared output schema. This is the contract your LLM depends on to understand what a tool returns. Schema drift — where the actual response shape diverges from what the tool advertises — is a silent regression: your LLM starts hallucinating about what fields exist. This test catches that drift at the schema level before it manifests as confusing model behavior.
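In spirit, this is ordinary JSON Schema validation applied to tool outputs. A deliberately tiny Python sketch of the idea, covering only required fields and primitive types (the engine uses a full JSON Schema validator):

```python
def conforms(response: dict, schema: dict) -> bool:
    """Toy subset of JSON Schema validation: required fields and primitive types."""
    py_types = {"object": dict, "array": list, "string": str,
                "integer": int, "number": (int, float), "boolean": bool}
    # A required field that disappears from the response is schema drift
    for field in schema.get("required", []):
        if field not in response:
            return False
    # A field whose type changes is also schema drift
    for field, sub in schema.get("properties", {}).items():
        expected = py_types.get(sub.get("type"))
        if field in response and expected and not isinstance(response[field], expected):
            return False
    return True
```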
Determinism validation is about the LLM's ability to reason reliably about tool results. If calling get_order with the same ID returns different shapes on different calls, the model can't build a coherent mental model of what the tool does. This matters most for read-heavy tools that query live data — you exclude the volatile fields with WithIgnorePaths and validate that the structure itself is stable.
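Conceptually, the ignore-path mechanism boils down to stripping the volatile fields before comparing runs. A rough Python sketch of that comparison, supporting only simple `$.a.b` paths (mcptest's real JSONPath handling is richer):

```python
import json
from copy import deepcopy

def strip_paths(doc: dict, ignore_paths: list[str]) -> dict:
    """Remove fields named by simple '$.a.b' paths so they don't affect comparison."""
    doc = deepcopy(doc)
    for path in ignore_paths:
        keys = path.lstrip("$.").split(".")
        node = doc
        for key in keys[:-1]:
            node = node.get(key) if isinstance(node, dict) else None
        if isinstance(node, dict):
            node.pop(keys[-1], None)
    return doc

def is_deterministic(runs: list[dict], ignore_paths: list[str]) -> bool:
    # Runs count as equal when everything outside the ignored paths matches exactly
    canon = [json.dumps(strip_paths(r, ignore_paths), sort_keys=True) for r in runs]
    return len(set(canon)) == 1
```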
Baseline diffing — available via the mcptest diff command — lets you capture a known-good response and compare future runs against it. This is the regression detection story: after a deployment, run mcptest diff against your baselines and get a clear diff of anything that changed.
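The diff itself is the familiar recursive JSON comparison. A sketch of the kind of output you'd want from it in Python (an illustration of the concept, not mcptest's actual diff format):

```python
def diff_paths(baseline, current, path="$"):
    """Walk both documents and report every path where current diverges from baseline."""
    if isinstance(baseline, dict) and isinstance(current, dict):
        changes = []
        # Visit the union of keys so both removed and added fields are reported
        for key in sorted(baseline.keys() | current.keys()):
            changes += diff_paths(baseline.get(key), current.get(key), f"{path}.{key}")
        return changes
    if baseline != current:
        return [f"{path}: {baseline!r} -> {current!r}"]
    return []
```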
Recording and Replay
One practical feature for CI: session recording.
mcptest run --file tests.json --server http://localhost:8000/mcp --record session.json
Then replay offline, without a running server:
mcptest run --file tests.json --replay session.json
This matters for a few scenarios. If your MCP server connects to external APIs (via ZeroMcp.Relay, for example), you don't want CI making real calls to Stripe on every push. Record a session against a real server once, commit session.json, replay it in CI. You get the same correctness checks without the external dependency or the cost.
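The replay mechanics are easy to picture: key every recorded exchange by tool name and canonicalized parameters, then answer from the recording instead of the network. A Python sketch of that idea (the `tool`/`params`/`response` field names are my guess at the shape for illustration, not the actual session.json format):

```python
import json

class ReplaySession:
    """Sketch of replay: serve recorded responses instead of calling a live server."""

    def __init__(self, recorded: list[dict]):
        # Key each recorded exchange by tool name plus canonicalized params,
        # so semantically identical calls hit the same recording
        self._responses = {
            (e["tool"], json.dumps(e["params"], sort_keys=True)): e["response"]
            for e in recorded
        }

    def call(self, tool: str, params: dict) -> dict:
        key = (tool, json.dumps(params, sort_keys=True))
        if key not in self._responses:
            raise KeyError(f"no recorded response for {tool} with {params}")
        return self._responses[key]
```

The canonicalization step matters: without sorted keys, `{"a":1,"b":2}` and `{"b":2,"a":1}` would miss each other's recordings.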
It's also useful for debugging: record a session where something went wrong, then replay it as many times as you need to understand the failure.
Scaffolding Tests
If you're adding TestKit to an existing MCP server, you don't have to write test definitions from scratch:
# Generate stubs for all tools
mcptest generate --scaffold --server http://localhost:8000/mcp --out tests.json
# Generate known-good baselines from real responses
mcptest generate --known-good --server http://localhost:8000/mcp \
--params search:'{"query":"hello"}' --out baseline.json
--scaffold gives you a test definition with every tool stubbed out and placeholders for params. Fill in real values and you have a starting point. --known-good actually calls the tools and captures the responses as approved baselines — commit those and you have regression detection from day one.
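The scaffolding step is mechanical: list the tools, then emit one stub per tool with a placeholder for each declared input property. A hypothetical Python sketch; the output shape here is invented for illustration, since mcptest defines the real tests.json schema:

```python
def scaffold_tests(tools: list[dict]) -> dict:
    """Turn a tools/list result into stub test definitions with placeholder params."""
    return {
        "tests": [
            {
                "tool": t["name"],
                # One placeholder per declared input property, to be filled in by hand
                "params": {p: "<FILL ME>"
                           for p in t.get("inputSchema", {}).get("properties", {})},
                "expect": ["schema"],
            }
            for t in tools
        ]
    }
```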
ZeroMcp Integration
If you're using ZeroMcp (the library) for your own ASP.NET Core API, the natural pattern is to start the test host in-process and point TestKit at it:
public class McpToolIntegrationTests : IAsyncLifetime
{
private WebApplication _app = null!;
private string _serverUrl = null!;
public async Task InitializeAsync()
{
var builder = WebApplication.CreateBuilder();
builder.Services.AddControllers();
builder.Services.AddEndpointsApiExplorer();
builder.Services.AddZeroMcp(o => { o.ServerName = "Test"; o.ServerVersion = "1.0"; });
// register your real services here
_app = builder.Build();
_app.MapControllers();
_app.MapZeroMcp();
var port = GetFreePort();
_serverUrl = $"http://localhost:{port}";
_app.Urls.Add(_serverUrl); // bind the app to the free port before starting
await _app.StartAsync();
}
[McpFact(DisplayName = "get_order tool schema is valid")]
public async Task GetOrderSchemaValid()
{
await McpTest
.Server($"{_serverUrl}/mcp")
.ValidateMetadata()
.Tool("get_order")
.WithParams(new { id = 1 })
.ExpectSchemaMatch()
.RunAsync();
}
public async Task DisposeAsync() => await _app.StopAsync();
private static int GetFreePort()
{
// Ask the OS for an ephemeral loopback port, then release it for the app to bind
var listener = new System.Net.Sockets.TcpListener(System.Net.IPAddress.Loopback, 0);
listener.Start();
var port = ((System.Net.IPEndPoint)listener.LocalEndpoint).Port;
listener.Stop();
return port;
}
}
Your real DI container, your real services, your real pipeline — TestKit drives it through the MCP protocol the same way Claude would. The difference is you control the assertions.
The Thing This Solves
The honest version of MCP development without a test harness looks like this: you make a change, restart your server, open Claude, describe what you want it to do, and see if the tool call works. If it doesn't, you're reading logs and guessing. If it does, you ship. Then something breaks in production because a response shape changed and the model started interpreting a field that no longer exists.
TestKit is the thing that makes MCP tools first-class citizens of your existing software engineering practices. Schema validation catches drift before deployment. Determinism checks catch flakiness before the LLM notices it. Protocol validation catches spec violations before a client upgrade exposes them. Baseline diffing catches regressions in CI before they reach users.
None of this is exotic. It's the same category of testing you already do for your HTTP endpoints — just expressed in terms of what an MCP client expects rather than what an HTTP client expects.
And honestly, for me, it's also the inevitable destination. You don't spend a career building test infrastructure without developing a strong opinion that tooling without a test harness is a prototype, not a product. Every design decision across ZeroMcp — running the real pipeline, forwarding real headers, enforcing real auth — was made so that the things you'd want to test would be worth testing. TestKit is what makes that true. It's not the last piece bolted on; it's the piece the rest of the ecosystem was always pointing toward.
Get Started
- Engine: github.com/ZeroMcp/ZeroMcp.TestKitEngine
- .NET DSL: github.com/ZeroMcp/ZeroMcp.TestKit.dotnet
- NuGet (core): dotnet add package ZeroMcp.TestKit
- NuGet (xUnit): dotnet add package ZeroMcp.TestKit.Xunit
Both repos are early, and I am looking at putting together DSL packages for Python, Rust and Node. Issues and PRs welcome — the MCP testing ecosystem for .NET is a blank slate and there's a lot of ground to cover.
Drop me a comment below if you do try it, and let me know how you get on.