Most MCP servers ship with exactly one test: a developer typing a prompt into Claude and checking if the output looks right.
That is not testing. That is hoping. And it breaks the moment you change a tool signature, add a parameter, or update your database schema.
## Why MCP Servers Are Hard to Test
MCP servers sit between deterministic code and non-deterministic LLMs. Your tools are pure functions — they take inputs, return outputs. But the consumers of those tools are language models that interpret schemas, pick tools based on descriptions, and pass arguments based on inference.
This creates a testing gap. Traditional unit tests cover your business logic. LLM integration tests are slow, expensive, and non-deterministic. The four patterns below close that gap using real MCP protocol interactions — without calling an LLM.
## Pattern 1: In-Memory Unit Tests With FastMCP Client
FastMCP 2.x includes a Client class that connects directly to your server in-memory. No subprocess. No network. No LLM. You test your actual server logic through the real MCP protocol — in milliseconds.
Install the testing dependency:

```shell
pip install pytest-asyncio
```
Configure pytest to handle async tests automatically in `pyproject.toml`:

```toml
[tool.pytest.ini_options]
asyncio_mode = "auto"
```
Here is a server with two tools:
```python
# server.py
from fastmcp import FastMCP

mcp = FastMCP("WeatherServer")

@mcp.tool()
def get_forecast(city: str, days: int = 3) -> dict:
    """Get weather forecast for a city."""
    if days < 1 or days > 14:
        raise ValueError(f"Days must be 1-14, got {days}")
    return {
        "city": city,
        "days": days,
        "forecast": [{"day": i + 1, "temp_c": 20 + i} for i in range(days)],
    }

@mcp.tool()
def get_alerts(region: str) -> list[str]:
    """Get active weather alerts for a region."""
    alerts_db = {"northwest": ["Wind advisory until 6 PM"]}
    return alerts_db.get(region.lower(), [])
```
Now write the tests. Create a pytest fixture that wraps the server in a `Client`:
```python
# test_server.py
import pytest
from fastmcp import Client

from server import mcp  # your FastMCP server instance

@pytest.fixture
async def client():
    async with Client(transport=mcp) as c:
        yield c

async def test_forecast_returns_correct_days(client):
    result = await client.call_tool("get_forecast", {"city": "Seattle", "days": 5})
    data = result.data
    assert data["city"] == "Seattle"
    assert len(data["forecast"]) == 5

async def test_forecast_default_days(client):
    result = await client.call_tool("get_forecast", {"city": "Portland"})
    assert len(result.data["forecast"]) == 3

async def test_forecast_invalid_days_raises(client):
    with pytest.raises(Exception):
        await client.call_tool("get_forecast", {"city": "Seattle", "days": 30})

async def test_alerts_existing_region(client):
    result = await client.call_tool("get_alerts", {"region": "Northwest"})
    assert len(result.data) == 1
    assert "Wind advisory" in result.data[0]

async def test_alerts_unknown_region(client):
    result = await client.call_tool("get_alerts", {"region": "Antarctica"})
    assert result.data == []
```
Run with `pytest -v`. Every test executes in milliseconds because it uses the in-memory transport. No HTTP, no subprocess, no LLM calls.
This pattern catches three categories of bugs immediately: broken tool logic, wrong return types, and missing error handling.
## Pattern 2: Schema Validation Tests
Your MCP tools expose JSON schemas to LLMs. If the schema drifts from your implementation — a renamed parameter, a changed type, a missing description — the LLM picks the wrong tool or passes wrong arguments. Schema tests lock this contract down.
```python
async def test_tool_registry_completeness(client):
    """Verify all expected tools are registered."""
    tools = await client.list_tools()
    tool_names = {t.name for t in tools}
    assert tool_names == {"get_forecast", "get_alerts"}

async def test_forecast_schema_has_required_params(client):
    """Verify the forecast tool schema matches expectations."""
    tools = await client.list_tools()
    forecast = next(t for t in tools if t.name == "get_forecast")
    schema = forecast.inputSchema
    assert "city" in schema["properties"]
    assert schema["properties"]["city"]["type"] == "string"
    assert "city" in schema.get("required", [])

async def test_all_tools_have_descriptions(client):
    """LLMs select tools based on descriptions. Missing = broken routing."""
    tools = await client.list_tools()
    for tool in tools:
        assert tool.description, f"Tool '{tool.name}' has no description"
        assert len(tool.description) > 10, (
            f"Tool '{tool.name}' description too short: '{tool.description}'"
        )
```
Schema tests catch a class of failure that unit tests miss entirely: your code works, but the LLM cannot use it. This happens more often than you would expect. A developer renames a parameter from `city` to `location`. The tool still works, and the schema updates automatically. But every prompt template, LLM workflow, and agent built around the old schema now sends `city` and gets a validation error.
Schema tests make this failure loud and immediate. When the parameter name changes, `test_forecast_schema_has_required_params` fails in CI. The developer sees the break before it ships.
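The rename scenario is easy to demonstrate without any MCP machinery. Here is a dependency-free sketch; the `schema_drift` helper and the two schema dicts are mine, standing in for snapshots of a tool's `inputSchema` before and after the rename:

```python
def schema_drift(old: dict, new: dict) -> list[str]:
    """Report parameter-level differences between two JSON Schemas."""
    problems = []
    old_props = old.get("properties", {})
    new_props = new.get("properties", {})
    for name in old_props.keys() - new_props.keys():
        problems.append(f"parameter removed: {name}")
    for name in new_props.keys() - old_props.keys():
        problems.append(f"parameter added: {name}")
    for name in old_props.keys() & new_props.keys():
        if old_props[name].get("type") != new_props[name].get("type"):
            problems.append(f"type changed: {name}")
    if set(old.get("required", [])) != set(new.get("required", [])):
        problems.append("required list changed")
    return problems

# Before and after the hypothetical city -> location rename.
old = {"properties": {"city": {"type": "string"}}, "required": ["city"]}
new = {"properties": {"location": {"type": "string"}}, "required": ["location"]}

print(schema_drift(old, new))
# → ['parameter removed: city', 'parameter added: location', 'required list changed']
```

Pinning the schema with assertions like those in `test_forecast_schema_has_required_params` turns every one of these drift lines into a failing test.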
If your server exposes resources alongside tools, test those registrations too:
```python
async def test_resources_are_registered(client):
    """Verify static resources are accessible."""
    resources = await client.list_resources()
    assert len(resources) > 0, "Server exposes no resources"
```
Run schema tests in CI on every commit. Schema drift is silent and devastating — these tests make it visible.
## Pattern 3: Parameterized Edge Case Testing
MCP tools receive arguments from LLMs. LLMs send unexpected inputs — empty strings, extreme values, wrong types. Parameterized tests cover these systematically.
```python
@pytest.mark.parametrize(
    "city, days, expected_count",
    [
        ("Seattle", 1, 1),
        ("Tokyo", 7, 7),
        ("São Paulo", 14, 14),
        ("New York", 3, 3),
    ],
)
async def test_forecast_valid_ranges(client, city, days, expected_count):
    result = await client.call_tool("get_forecast", {"city": city, "days": days})
    assert len(result.data["forecast"]) == expected_count

@pytest.mark.parametrize("invalid_days", [0, -1, 15, 100])
async def test_forecast_rejects_invalid_days(client, invalid_days):
    with pytest.raises(Exception):
        await client.call_tool(
            "get_forecast", {"city": "Seattle", "days": invalid_days}
        )

@pytest.mark.parametrize(
    "region, has_alerts",
    [
        ("Northwest", True),
        ("northwest", True),
        ("NORTHWEST", True),
        ("southeast", False),
        ("", False),
    ],
)
async def test_alerts_case_insensitive(client, region, has_alerts):
    result = await client.call_tool("get_alerts", {"region": region})
    if has_alerts:
        assert len(result.data) > 0
    else:
        assert len(result.data) == 0
```
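One edge the table above skips: LLMs also produce stray whitespace. The alerts tool only lower-cases its input, so `"  Northwest "` silently misses the lookup. A dependency-free sketch of the normalization worth adding and then locking in with another parameterized row (`normalize_region` is a hypothetical helper, not part of the server above):

```python
def normalize_region(region: str) -> str:
    """Collapse case and surrounding whitespace before the lookup."""
    return region.strip().lower()

alerts_db = {"northwest": ["Wind advisory until 6 PM"]}

# A bare .lower() keeps the padding and misses the key.
print(alerts_db.get("  Northwest ".lower(), []))            # → []
# Stripping first makes the lookup hit.
print(alerts_db.get(normalize_region("  Northwest "), []))  # → ['Wind advisory until 6 PM']
```

Each such fix should land together with the parameterized case that would have caught it.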
For complex return structures, the `inline-snapshot` library eliminates manual assertion writing. Install it with `pip install inline-snapshot`, then write:
```python
from inline_snapshot import snapshot

async def test_forecast_structure(client):
    result = await client.call_tool("get_forecast", {"city": "Seattle", "days": 2})
    assert result.data == snapshot()
```
Run `pytest --inline-snapshot=fix,create` once. The library fills in the expected value automatically from the actual output. On subsequent runs, it asserts against the stored snapshot. When your output changes intentionally, run with `--inline-snapshot=fix` to update.
## Pattern 4: Interactive Testing With MCP Inspector
Unit tests verify code paths. But sometimes you need to see what the LLM sees — the exact schemas, the raw responses, the protocol messages. MCP Inspector is the official visual testing tool for this.
Launch it against your server:
```shell
# For a Python MCP server
npx @modelcontextprotocol/inspector uv --directory ./my-server run my-server

# For a published PyPI package
npx @modelcontextprotocol/inspector uvx mcp-server-git --repository ~/code/repo.git
```
This opens a web UI at http://localhost:6274 that connects to your MCP server through a local proxy. From the Inspector, you can:
- Browse all tools — see the JSON schema exactly as an LLM receives it
- Call any tool — fill in parameters through a form, see the raw response
- Inspect resources — view static context your server exposes
- Test prompts — verify prompt templates render correctly
MCP Inspector serves a different purpose than pytest. Use it for:
- Exploratory testing during development — try edge cases manually
- Schema review — verify descriptions are clear enough for LLM tool selection
- Debugging failures — reproduce exact inputs that caused production issues
- Demo and documentation — show stakeholders what your server exposes
The Inspector does not replace automated tests. It complements them. Write pytest tests for regression coverage. Use Inspector for exploration and debugging.
## A Practical Test Strategy
Combine all four patterns into a layered strategy:
Layer 1: In-memory unit tests (Pattern 1)
→ Run on every save. Sub-second feedback. Catch logic bugs.
Layer 2: Schema validation tests (Pattern 2)
→ Run in CI on every commit. Catch contract drift.
Layer 3: Parameterized edge cases (Pattern 3)
→ Run in CI. Catch boundary failures and type handling.
Layer 4: MCP Inspector (Pattern 4)
→ Use during development. Manual exploration and debugging.
Your `pyproject.toml` test configuration:
```toml
[tool.pytest.ini_options]
asyncio_mode = "auto"
markers = [
    "schema: schema validation tests",
    "edge: edge case and boundary tests",
]
```
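To make those markers useful, the tests have to carry them. pytest's module-level `pytestmark` applies a marker to every test in a file, which is the least repetitive way to tag a whole layer (the file name and test body here are illustrative):

```python
# test_schema.py
import pytest

# Every test in this module gets the "schema" marker,
# so `-m "not schema"` skips the whole file.
pytestmark = pytest.mark.schema

def test_tools_are_registered():
    # Placeholder body; your real list_tools() assertions go here.
    assert True
```

Individual tests can be tagged the same way with a `@pytest.mark.edge` decorator.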
Run fast tests during development:

```shell
pytest -v -m "not schema and not edge"
```

Run everything in CI:

```shell
pytest -v --tb=short
```
## What This Costs You
Setting up in-memory MCP tests takes about 30 minutes for an existing server. The fixture is five lines. Each test is three to six lines. You get sub-second feedback on every change.
Compare that to the alternative: a user discovers your tool returns the wrong type, the LLM hallucinates a workaround, and you spend two hours debugging a production trace.
Thirty minutes of test setup prevents hours of production debugging. That trade is worth it every time.
The MCP ecosystem is growing fast. As of March 2026, thousands of MCP servers exist on npm and PyPI, and the number is accelerating. Most of them have zero automated tests. If you ship yours with a proper test suite, you are already ahead of 90% of the ecosystem. More importantly, your users will trust your server because it actually works when they upgrade.
Follow @klement_gunndu for more MCP and AI engineering content. We're building in public.