How to test MCP servers in TypeScript before they break in production

Your MCP server works on your laptop. The tool calls return the right shapes, the client connects cleanly, the session behaves. Then you deploy it. A client reconnects after a network hiccup and the session state is gone. Or you scale to two instances and half the requests fail because session IDs resolve to the wrong process. Or someone sends two concurrent requests and the tool handler corrupts shared state.

Testing catches these before your users do. This is a testing playbook for TypeScript MCP servers built on the official SDK.

The demo-to-production gap for MCP servers

The official TypeScript SDK makes it easy to get something working. A few tool registrations, an McpServer instance, a transport, and you are serving. The problem is that "working" in the demo sense and "working" in the production sense are different things.

A demo tests one happy path. Production tests edge cases that emerge from real clients: reconnects, concurrent tool calls, malformed inputs, slow downstream APIs, and the transport contract itself. None of those show up in a single manual run against your local instance.

The gap is not a criticism of the SDK. It is a consequence of how easy the SDK makes it to build a server without thinking about what breaks it. A test suite closes the gap before you ship.

What actually breaks: transport, sessions, tool contracts

Three categories fail most often.

Transport behavior. The SDK added Streamable HTTP support in version 1.10.0. Under this transport, the server exposes a single HTTP endpoint that handles both POST and GET. Clients POST every JSON-RPC message, tool calls included, and can use GET to open a stream of Server-Sent Events (SSE) for server-initiated messages. Tests that only exercise stdio miss this entirely.
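
As a rough sketch of what that single-endpoint contract looks like from the client side, the two helpers below build the request options a test might pass to fetch. The helper names and the RequestOptions type are hypothetical; the header values follow the Streamable HTTP section of the MCP specification as I read it, so check them against the version you target.

```typescript
// Hypothetical helpers: request options for the two verbs of a Streamable HTTP endpoint.
type RequestOptions = {
  method: "POST" | "GET";
  headers: Record<string, string>;
  body?: string;
};

// POST carries one JSON-RPC message. The client accepts both a plain JSON
// reply and an SSE stream, because the server chooses which one to send.
export function buildPostRequest(message: object, sessionId?: string): RequestOptions {
  const headers: Record<string, string> = {
    "Content-Type": "application/json",
    Accept: "application/json, text/event-stream",
  };
  if (sessionId) headers["Mcp-Session-Id"] = sessionId;
  return { method: "POST", headers, body: JSON.stringify(message) };
}

// GET opens a standalone SSE stream for server-initiated messages.
export function buildSseRequest(sessionId?: string): RequestOptions {
  const headers: Record<string, string> = { Accept: "text/event-stream" };
  if (sessionId) headers["Mcp-Session-Id"] = sessionId;
  return { method: "GET", headers };
}
```

A stdio-only suite never exercises any of these headers, which is why the HTTP path needs its own tests.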

Session state. The StreamableHTTPServerTransport is stateful per session. If you store anything in process memory keyed by session ID, a restart or a second instance will drop it. Tests that do not simulate reconnects miss this failure mode.

Tool contracts. Each tool you register has an input schema and an expected output shape. Tests that call tools with valid inputs only miss the cases where a client sends a slightly wrong shape or a downstream API returns something unexpected.
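
To make "a slightly wrong shape" concrete, here is a minimal sketch of the near-miss inputs worth testing. In a real server the schema attached to the tool registration performs this check; the hand-written validateGetItemInput below is only a hypothetical stand-in so the example stays self-contained.

```typescript
// Hypothetical stand-in for schema validation of a get-item tool's input.
type GetItemInput = { id: string };

type Validation =
  | { ok: true; value: GetItemInput }
  | { ok: false; error: string };

export function validateGetItemInput(raw: unknown): Validation {
  if (typeof raw !== "object" || raw === null) {
    return { ok: false, error: "input must be an object" };
  }
  const id = (raw as Record<string, unknown>).id;
  if (typeof id !== "string") {
    return { ok: false, error: "id must be a string" };
  }
  if (id.trim() === "") {
    return { ok: false, error: "id must not be empty" };
  }
  return { ok: true, value: { id } };
}

// Near-miss inputs of the kind a real client might send:
// wrong type, wrong key casing, empty string, null, bare string.
const nearMisses: unknown[] = [{ id: 123 }, { ID: "abc" }, { id: "" }, null, "abc"];

// One boolean per near miss; a correct validator accepts none of them.
export const outcomes = nearMisses.map((input) => validateGetItemInput(input).ok);
```

Each entry in nearMisses is a candidate test case; the point is to assert the rejection, not just the acceptance of the valid shape.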

Unit-testing tools and resources in isolation

The cleanest place to start is the tool handler itself, before any transport is involved.

Each tool handler is a function that takes validated input and returns a result. Extract the handler logic from the server.tool() registration so you can call it directly in tests.

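// tool-handlers.ts
import { fetchItem } from "./items"; // hypothetical module wrapping the downstream API

// Handler logic lives outside server.tool() so tests can call it directly.
export async function getItemHandler(input: { id: string }) {
  const item = await fetchItem(input.id);
  // "as const" keeps the literal type "text" instead of widening it to string.
  return { content: [{ type: "text" as const, text: JSON.stringify(item) }] };
}

// getItem.test.ts
import { getItemHandler } from "./tool-handlers";
import { fetchItem } from "./items";

// Assumes Jest; with Vitest the equivalents are vi.mock and vi.mocked.
jest.mock("./items");

test("returns item content", async () => {
  // Stub the downstream call so the test never touches the network.
  jest.mocked(fetchItem).mockResolvedValue({ id: "abc" });
  const result = await getItemHandler({ id: "abc" });
  expect(result.content[0].type).toBe("text");
  expect(JSON.parse(result.content[0].text)).toEqual({ id: "abc" });
});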

This pattern makes each handler independently testable without spinning up the full server. Stub the downstream calls with your test framework's mocking utilities. Cover the happy path, malformed input, and downstream failure.
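
One alternative to framework mocks is to pass the downstream call in as a parameter. The sketch below assumes a hypothetical Deps object and Item type, neither of which comes from the SDK, and shows the downstream-failure case returning a tool-level error via isError rather than throwing.

```typescript
// Hypothetical dependency shape: whatever the handler needs from downstream.
type Item = { id: string; name: string };
type Deps = { fetchItem: (id: string) => Promise<Item> };

type ToolResult = {
  content: { type: "text"; text: string }[];
  isError?: boolean;
};

// The handler receives its downstream call as a parameter, so a test can pass a stub.
export async function getItemHandler(input: { id: string }, deps: Deps): Promise<ToolResult> {
  try {
    const item = await deps.fetchItem(input.id);
    return { content: [{ type: "text", text: JSON.stringify(item) }] };
  } catch (err) {
    // Report the failure as a tool-level error instead of letting it propagate.
    const message = err instanceof Error ? err.message : String(err);
    return {
      content: [{ type: "text", text: `fetchItem failed: ${message}` }],
      isError: true,
    };
  }
}

// Two stubs: one healthy downstream, one that always fails.
export const healthy: Deps = {
  fetchItem: async (id) => ({ id, name: "Widget" }),
};
export const failing: Deps = {
  fetchItem: async () => {
    throw new Error("upstream timeout");
  },
};
```

Calling getItemHandler({ id: "abc" }, failing) in a test then lets you assert on the error result directly, with no module mocking involved.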

Resource handlers work the same way. Extract, test in isolation, stub dependencies.

Contract assertions against the MCP schema

After unit tests, the next layer checks that your tool registrations conform to the MCP protocol. A contract test instantiates the real server and sends actual protocol requests, then asserts on the response shape.

import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { InMemoryTransport } from "@modelcontextprotocol/sdk/inMemory.js";
import { createServer } from "./server";

test("list tools returns registered tools", async () => {
  const server = createServer();
  const [clientTransport, serverTransport] = InMemoryTransport.createLinkedPair();
  await server.connect(serverTransport);

  const client = new Client({ name: "test", version: "1.0.0" }, {});
  await client.connect(clientTransport);

  const result = await client.listTools();
  const toolNames = result.tools.map((t) => t.name);
  expect(toolNames).toContain("get-item");
});

The InMemoryTransport in the SDK is designed exactly for this. It lets you run client and server in the same process without any network, which keeps tests fast and deterministic.

Assert on the full response shape for each tool: input schema, output content types, error response format. This is the layer that catches the gap between what your server claims to support and what it actually returns.
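
A shape check like this can be factored into a small helper. The sketch below is a hypothetical checkToolResult, not part of the SDK; it covers only the fields discussed here (a content array, string type on each block, string text on text blocks, an optional boolean isError) and returns a list of problems so a failing test says what was wrong.

```typescript
// Hypothetical helper: lists problems with a tool-call result; empty means it passed.
export function checkToolResult(result: unknown): string[] {
  if (typeof result !== "object" || result === null) {
    return ["result is not an object"];
  }
  const problems: string[] = [];
  const { content, isError } = result as { content?: unknown; isError?: unknown };

  if (!Array.isArray(content)) {
    problems.push("content is not an array");
  } else {
    content.forEach((block, i) => {
      if (typeof block !== "object" || block === null) {
        problems.push(`content[${i}] is not an object`);
        return;
      }
      const { type, text } = block as { type?: unknown; text?: unknown };
      if (typeof type !== "string") {
        problems.push(`content[${i}].type is not a string`);
      } else if (type === "text" && typeof text !== "string") {
        problems.push(`content[${i}].text is not a string`);
      }
    });
  }

  // isError is optional, but when present it must be a boolean.
  if (isError !== undefined && typeof isError !== "boolean") {
    problems.push("isError is not a boolean");
  }
  return problems;
}
```

In a contract test you would pass it the value returned by the client's tool call and expect an empty array.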

Testing Streamable HTTP behavior

The in-memory transport covers the protocol layer. Testing the HTTP transport layer catches a different class of failure: auth middleware, session header handling, and the streaming path.

Stand up a real HTTP server on a random port, run your requests against it, and shut it down after each test.

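import { randomUUID } from "node:crypto";
import { createServer as createHttpServer } from "node:http";
import { StreamableHTTPServerTransport } from "@modelcontextprotocol/sdk/server/streamableHttp.js";
import { createServer } from "./server";

let httpServer: ReturnType<typeof createHttpServer>;
let baseUrl: string;

// The transport rejects POSTs whose Accept header omits either content type.
const headers = {
  "Content-Type": "application/json",
  Accept: "application/json, text/event-stream",
};

beforeAll(async () => {
  const transport = new StreamableHTTPServerTransport({
    sessionIdGenerator: () => randomUUID(),
    enableJsonResponse: true, // reply to POST with plain JSON instead of an SSE stream
  });
  const mcpServer = createServer();
  await mcpServer.connect(transport);
  httpServer = createHttpServer((req, res) => transport.handleRequest(req, res));
  await new Promise<void>((r) => httpServer.listen(0, () => r()));
  const addr = httpServer.address() as { port: number };
  baseUrl = `http://localhost:${addr.port}`;
});

afterAll(() => {
  httpServer.close();
});

test("POST returns a valid MCP response", async () => {
  // A stateful transport only accepts other requests after initialize.
  const initialize = {
    jsonrpc: "2.0",
    id: 1,
    method: "initialize",
    params: { protocolVersion: "2025-03-26", capabilities: {}, clientInfo: { name: "test", version: "1.0.0" } },
  };
  const init = await fetch(`${baseUrl}/mcp`, { method: "POST", headers, body: JSON.stringify(initialize) });
  const sessionId = init.headers.get("mcp-session-id");
  expect(sessionId).toBeTruthy();

  // Every later request must echo the session ID the server handed out.
  const res = await fetch(`${baseUrl}/mcp`, {
    method: "POST",
    headers: { ...headers, "Mcp-Session-Id": sessionId as string },
    body: JSON.stringify({ jsonrpc: "2.0", method: "tools/list", params: {}, id: 2 }),
  });
  expect(res.status).toBe(200);
  const json = await res.json();
  expect(json.result.tools).toBeDefined();
});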

Add a test that opens a GET on the same endpoint with an Accept: text/event-stream header and confirms the server accepts the SSE connection. Add a test that sends an invalid session ID and checks that the server rejects it with an error status such as 404 instead of crashing.
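
When a response arrives as text/event-stream rather than JSON, a test has to pull the JSON-RPC messages out of the data lines before it can assert on them. The parser below is a minimal sketch with a hypothetical name, parseSseEvents; it handles only the event and data fields and assumes the whole body has already been read into a string.

```typescript
// Hypothetical helper: split a fully read SSE body into events.
type SseEvent = { event: string; data: string };

export function parseSseEvents(body: string): SseEvent[] {
  const events: SseEvent[] = [];
  // Events are separated by a blank line; normalise CRLF first.
  for (const chunk of body.replace(/\r\n/g, "\n").split("\n\n")) {
    let event = "message"; // default event type in the SSE format
    const dataLines: string[] = [];
    for (const line of chunk.split("\n")) {
      if (line.startsWith(":")) continue; // comment or keep-alive line
      if (line.startsWith("event:")) {
        event = line.slice("event:".length).trim();
      } else if (line.startsWith("data:")) {
        // Drop the single optional space after the colon.
        dataLines.push(line.slice("data:".length).replace(/^ /, ""));
      }
    }
    // Chunks without data (comments, trailing blanks) produce no event.
    if (dataLines.length > 0) events.push({ event, data: dataLines.join("\n") });
  }
  return events;
}
```

A test can then run JSON.parse on each event's data and apply the same assertions it would use on a plain JSON response.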

A CI setup that catches regressions

A test suite only helps if it runs consistently. For MCP servers, the minimal CI setup is:

Unit tests on every commit (fast, no network)
Contract tests via InMemoryTransport on every commit (still fast)
HTTP transport tests on pull requests and on merge to main
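
One way to get those three tiers, assuming the suite runs on Jest, is to split it into projects by file suffix. The suffixes and paths below are hypothetical, and each project would also need whatever TypeScript transform the repository already uses.

```typescript
// jest.config.ts: a sketch that assumes Jest and these hypothetical file suffixes.
const config = {
  projects: [
    // Fast, no network: run on every commit.
    { displayName: "unit", testMatch: ["<rootDir>/src/**/*.unit.test.ts"] },
    // In-process client and server: still fast, run on every commit.
    { displayName: "contract", testMatch: ["<rootDir>/src/**/*.contract.test.ts"] },
    // Real HTTP server on a random port: run on pull requests and on merge to main.
    { displayName: "http", testMatch: ["<rootDir>/src/**/*.http.test.ts"] },
  ],
};

export default config;
```

With this layout, the per-commit job can run jest --selectProjects unit contract, and the pull request job can run all three projects.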

If you deploy across multiple instances, add a test that starts two server processes and verifies that a session created on instance one can be resumed on instance two. This tests your external session store. It is slower, so running it only on pull requests, rather than on every commit, is reasonable.
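
The property that test checks can be sketched in a single process. Everything below is hypothetical scaffolding: SessionStore, MapStore, and Instance stand in for your external store and your server processes, and the store is synchronous only to keep the example short. A real test would start two processes and go through HTTP.

```typescript
// Hypothetical session store contract, synchronous to keep the sketch short.
interface SessionStore {
  get(sessionId: string): Record<string, unknown> | undefined;
  set(sessionId: string, state: Record<string, unknown>): void;
}

// Stand-in for an external store; a real one would live outside the process.
class MapStore implements SessionStore {
  private readonly data = new Map<string, string>();
  get(sessionId: string): Record<string, unknown> | undefined {
    const raw = this.data.get(sessionId);
    // Round-trip through JSON, as a networked store would.
    return raw === undefined ? undefined : (JSON.parse(raw) as Record<string, unknown>);
  }
  set(sessionId: string, state: Record<string, unknown>): void {
    this.data.set(sessionId, JSON.stringify(state));
  }
}

// Stand-in for one server process.
class Instance {
  private readonly store: SessionStore;
  constructor(store: SessionStore) {
    this.store = store;
  }
  createSession(sessionId: string, state: Record<string, unknown>): void {
    this.store.set(sessionId, state);
  }
  resumeSession(sessionId: string): Record<string, unknown> | undefined {
    return this.store.get(sessionId);
  }
}

// Shared store: instance two can resume a session created on instance one.
const shared = new MapStore();
const one = new Instance(shared);
const two = new Instance(shared);
one.createSession("s1", { cursor: 42 });
export const resumedWithSharedStore = two.resumeSession("s1");

// Per-process stores: instance two has never seen the session.
const isolatedOne = new Instance(new MapStore());
const isolatedTwo = new Instance(new MapStore());
isolatedOne.createSession("s1", { cursor: 42 });
export const resumedWithIsolatedStores = isolatedTwo.resumeSession("s1");
```

The second half is the failure the multi-instance test exists to catch: with per-process memory, the resume returns nothing.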

If you are building an AI product that depends on your MCP server, the deployment and observability patterns in Next.js for AI products apply directly to the production layer above the server. For the enterprise session model, see MCP for enterprise agents.

FAQ

What is an MCP server?
An MCP server exposes tools, resources, and prompts to LLM clients via the Model Context Protocol. Clients connect to it and call tools by name, receiving structured results.

How do you test an MCP server?
Start with unit tests on each tool handler in isolation. Add contract tests using InMemoryTransport. Add HTTP transport tests against a local server instance for the full network path.

What transport does MCP use?
MCP supports stdio for local use and Streamable HTTP for networked servers. Streamable HTTP uses a single endpoint that handles POST requests for client messages such as tool calls and GET requests for SSE (Server-Sent Events) streaming. The TypeScript SDK supports Streamable HTTP from version 1.10.0 onward.

Building production AI systems? I consult on agentic AI and AI product engineering at mudassirkhan.me.