Tapesh Chandra Das
I built a replay testing tool for MCP servers — here's why and how it works

When your AI agent does something unexpected, where do you look?

For most teams right now: stderr noise, missing logs, or vendor black boxes. The execution path disappears, you have no idea what the agent actually sent to the tool, and there's no way to reproduce the failure in a test.

I kept hitting this wall while building MCP agents, so I built mcpscope — an open source observability and replay testing layer for MCP servers.

The problem

MCP (Model Context Protocol) is becoming the standard way AI agents call external tools. But the tooling around it is still catching up. When something goes wrong in production:

  • There's no standard trace format for MCP traffic
  • Tool call failures vanish into stderr with no context
  • Schema changes on upstream servers break your agent silently
  • There's no way to reproduce a production failure in a test environment

This is the gap mcpscope fills.

How it works

mcpscope is a transparent proxy. You point it at your MCP server and it intercepts every JSON-RPC message — recording requests, responses, latency, and errors — without changing a single line in your server.

```bash
go install github.com/td-02/mcp-observer@latest
mcpscope proxy --server ./your-mcp-server --db traces.db
```

Open http://localhost:4444 and you have a live dashboard showing every tool call, with P50/P95/P99 latency histograms and error timelines.
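The percentile views boil down to standard nearest-rank math over the recorded latency samples. A quick sketch of that computation (illustrative, not mcpscope's code):

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// percentile returns the nearest-rank percentile of a latency sample:
// sort the sample, then take the value at rank ceil(p/100 * n).
func percentile(latenciesMS []float64, p float64) float64 {
	s := append([]float64(nil), latenciesMS...)
	sort.Float64s(s)
	rank := int(math.Ceil(p/100*float64(len(s)))) - 1
	if rank < 0 {
		rank = 0
	}
	return s[rank]
}

func main() {
	sample := []float64{12, 15, 18, 22, 30, 45, 60, 120, 300, 900}
	// One slow outlier dominates P95/P99 while P50 stays low, which is
	// exactly the long-tail pattern the dashboard is meant to surface.
	fmt.Println(percentile(sample, 50), percentile(sample, 95), percentile(sample, 99))
}
```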

For Python servers:

```bash
mcpscope proxy -- uv run server.py
```

For HTTP MCP servers:

```bash
mcpscope proxy --transport http --upstream-url http://127.0.0.1:8080
```

The feature I'm most excited about: replay

This is the part I haven't seen in any other MCP tooling.

Once mcpscope has recorded your production traces, you can export and replay them against your server in CI:

```bash
# Export real production traces
mcpscope export --config ./mcpscope.example.json --output traces.json --limit 200

# Replay in CI — fail on errors or latency regressions
mcpscope replay --input traces.json --fail-on-error --max-latency-ms 500 -- uv run server.py
```

Record in prod. Replay in CI. Catch regressions before they reach your agent.

This unlocks a workflow that wasn't possible before: take a session where your agent behaved unexpectedly, export the exact traces, and turn them into a reproducible test case. No more "it only happens in production."
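The replay gate itself is conceptually simple. Here's a Go sketch of the check that flags like --fail-on-error and --max-latency-ms imply; the `recordedTrace` shape is an assumption for illustration, since the real traces.json schema may differ:

```go
package main

import "fmt"

// recordedTrace is an assumed export shape, not the real traces.json schema.
type recordedTrace struct {
	Tool      string
	LatencyMS int64
	IsError   bool
}

// checkReplay returns an error for the first replayed call that either
// errored or exceeded the latency budget, so CI can fail fast.
func checkReplay(traces []recordedTrace, maxLatencyMS int64) error {
	for _, t := range traces {
		if t.IsError {
			return fmt.Errorf("%s: tool call failed", t.Tool)
		}
		if t.LatencyMS > maxLatencyMS {
			return fmt.Errorf("%s: %dms exceeds %dms budget", t.Tool, t.LatencyMS, maxLatencyMS)
		}
	}
	return nil
}

func main() {
	traces := []recordedTrace{
		{Tool: "search_docs", LatencyMS: 120},
		{Tool: "get_weather", LatencyMS: 740},
	}
	if err := checkReplay(traces, 500); err != nil {
		fmt.Println("replay failed:", err)
	}
}
```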

Schema drift in CI

The other thing that kept biting me: upstream MCP servers changing their tool schemas without warning, silently breaking my agent.

```bash
# Capture baseline
mcpscope snapshot --server ./your-mcp-server --output baseline.json
git add baseline.json && git commit -m "chore: add MCP baseline snapshot"

# On every PR:
mcpscope snapshot --server ./your-mcp-server --output current.json
mcpscope diff baseline.json current.json --exit-code
```

The --exit-code flag makes it CI-friendly — exits non-zero on breaking changes so your PR check fails before the change reaches your agent. There's a GitHub Actions example in the repo.
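The part of the diff worth understanding is which schema changes actually break an agent: a tool disappearing, or an input field becoming required when it wasn't before. A simplified Go sketch of that logic (the real snapshot format is richer than this two-field struct):

```go
package main

import "fmt"

// toolSchema is a simplified snapshot entry: a tool name plus its
// required input fields. Assumed for illustration only.
type toolSchema struct {
	Name     string
	Required []string
}

// breakingChanges reports removed tools and newly-required fields,
// the changes that silently break existing agent calls.
func breakingChanges(baseline, current map[string]toolSchema) []string {
	var out []string
	for name, old := range baseline {
		cur, ok := current[name]
		if !ok {
			out = append(out, "removed tool: "+name)
			continue
		}
		was := map[string]bool{}
		for _, f := range old.Required {
			was[f] = true
		}
		for _, f := range cur.Required {
			if !was[f] {
				out = append(out, fmt.Sprintf("%s: field %q is now required", name, f))
			}
		}
	}
	return out
}

func main() {
	baseline := map[string]toolSchema{
		"get_weather": {Name: "get_weather", Required: []string{"city"}},
	}
	current := map[string]toolSchema{
		"get_weather": {Name: "get_weather", Required: []string{"city", "units"}},
	}
	for _, c := range breakingChanges(baseline, current) {
		fmt.Println(c) // prints: get_weather: field "units" is now required
	}
}
```

Note the asymmetry: a field that stops being required is harmless to existing callers, so a useful diff only fails CI on the direction that breaks them.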

Everything else in v0.1.0

  • Live web dashboard — tool call feed, latency percentile views, error timelines
  • Alerts — Slack, PagerDuty, or any webhook
  • OpenTelemetry export — plugs into Grafana or Jaeger via OTLP gRPC
  • SQLite trace store — local by default, Postgres-ready, configurable retention
  • Workspace + environment scoping — prod vs staging
  • Docker + Docker Compose included

Why open source and MIT

Tool call data can contain sensitive information. I wanted something that keeps traces local by default, plugs into the stack you already have, and can run in air-gapped environments. No telemetry, MIT licensed.

What's next

Per-team budget enforcement, audit log export (CSV and JSON), and a hosted cloud version are on the roadmap.

But right now I'm most interested in hearing from people building MCP agents — what are you running into that mcpscope doesn't solve yet?


Repo: https://github.com/td-02/mcp-observer
