Tapesh Chandra Das
I built a replay testing tool for MCP servers — here's why and how it works

When your AI agent does something unexpected, where do you look?

For most teams right now: stderr noise, missing logs, or vendor black boxes. The execution path disappears, you have no idea what the agent actually sent to the tool, and there's no way to reproduce the failure in a test.

I kept hitting this wall while building MCP agents, so I built mcpscope — an open source observability and replay testing layer for MCP servers.

The problem

MCP (Model Context Protocol) is becoming the standard way AI agents call external tools. But the tooling around it is still catching up. When something goes wrong in production:

  • There's no standard trace format for MCP traffic
  • Tool call failures vanish into stderr with no context
  • Schema changes on upstream servers break your agent silently
  • There's no way to reproduce a production failure in a test environment

This is the gap mcpscope fills.

How it works

mcpscope is a transparent proxy. You point it at your MCP server and it intercepts every JSON-RPC message — recording requests, responses, latency, and errors — without changing a single line in your server.

```bash
go install github.com/td-02/mcp-observer@latest
mcpscope proxy --server ./your-mcp-server --db traces.db
```

Open http://localhost:4444 and you have a live dashboard showing every tool call, with P50/P95/P99 latency histograms and error timelines.
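The percentile views boil down to standard nearest-rank math over the recorded latency samples. A quick sketch of that computation (illustrative, not mcpscope's code):

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// percentile returns the nearest-rank percentile of a latency sample:
// sort the sample, then take the value at rank ceil(p/100 * n).
func percentile(latenciesMS []float64, p float64) float64 {
	s := append([]float64(nil), latenciesMS...)
	sort.Float64s(s)
	rank := int(math.Ceil(p/100*float64(len(s)))) - 1
	if rank < 0 {
		rank = 0
	}
	return s[rank]
}

func main() {
	sample := []float64{12, 15, 18, 22, 30, 45, 60, 120, 300, 900}
	// One slow outlier dominates P95/P99 while P50 stays low, which is
	// exactly the long-tail pattern the dashboard is meant to surface.
	fmt.Println(percentile(sample, 50), percentile(sample, 95), percentile(sample, 99))
}
```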

For Python servers:

```bash
mcpscope proxy -- uv run server.py
```

For HTTP MCP servers:

```bash
mcpscope proxy --transport http --upstream-url http://127.0.0.1:8080
```

The feature I'm most excited about: replay

This is the part I haven't seen in any other MCP tooling.

Once mcpscope has recorded your production traces, you can export and replay them against your server in CI:

```bash
# Export real production traces
mcpscope export --config ./mcpscope.example.json --output traces.json --limit 200

# Replay in CI — fail on errors or latency regressions
mcpscope replay --input traces.json --fail-on-error --max-latency-ms 500 -- uv run server.py
```

Record in prod. Replay in CI. Catch regressions before they reach your agent.

This unlocks a workflow that wasn't possible before: take a session where your agent behaved unexpectedly, export the exact traces, and turn them into a reproducible test case. No more "it only happens in production."
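The replay gate itself is conceptually simple. Here's a Go sketch of the check that flags like --fail-on-error and --max-latency-ms imply; the `recordedTrace` shape is an assumption for illustration, since the real traces.json schema may differ:

```go
package main

import "fmt"

// recordedTrace is an assumed export shape, not the real traces.json schema.
type recordedTrace struct {
	Tool      string
	LatencyMS int64
	IsError   bool
}

// checkReplay returns an error for the first replayed call that either
// errored or exceeded the latency budget, so CI can fail fast.
func checkReplay(traces []recordedTrace, maxLatencyMS int64) error {
	for _, t := range traces {
		if t.IsError {
			return fmt.Errorf("%s: tool call failed", t.Tool)
		}
		if t.LatencyMS > maxLatencyMS {
			return fmt.Errorf("%s: %dms exceeds %dms budget", t.Tool, t.LatencyMS, maxLatencyMS)
		}
	}
	return nil
}

func main() {
	traces := []recordedTrace{
		{Tool: "search_docs", LatencyMS: 120},
		{Tool: "get_weather", LatencyMS: 740},
	}
	if err := checkReplay(traces, 500); err != nil {
		fmt.Println("replay failed:", err)
	}
}
```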

Schema drift in CI

The other thing that kept biting me: upstream MCP servers changing their tool schemas without warning, silently breaking my agent.

```bash
# Capture baseline
mcpscope snapshot --server ./your-mcp-server --output baseline.json
git add baseline.json && git commit -m "chore: add MCP baseline snapshot"

# On every PR:
mcpscope snapshot --server ./your-mcp-server --output current.json
mcpscope diff baseline.json current.json --exit-code
```

The --exit-code flag makes it CI-friendly — exits non-zero on breaking changes so your PR check fails before the change reaches your agent. There's a GitHub Actions example in the repo.
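The part of the diff worth understanding is which schema changes actually break an agent: a tool disappearing, or an input field becoming required when it wasn't before. A simplified Go sketch of that logic (the real snapshot format is richer than this two-field struct):

```go
package main

import "fmt"

// toolSchema is a simplified snapshot entry: a tool name plus its
// required input fields. Assumed for illustration only.
type toolSchema struct {
	Name     string
	Required []string
}

// breakingChanges reports removed tools and newly-required fields,
// the changes that silently break existing agent calls.
func breakingChanges(baseline, current map[string]toolSchema) []string {
	var out []string
	for name, old := range baseline {
		cur, ok := current[name]
		if !ok {
			out = append(out, "removed tool: "+name)
			continue
		}
		was := map[string]bool{}
		for _, f := range old.Required {
			was[f] = true
		}
		for _, f := range cur.Required {
			if !was[f] {
				out = append(out, fmt.Sprintf("%s: field %q is now required", name, f))
			}
		}
	}
	return out
}

func main() {
	baseline := map[string]toolSchema{
		"get_weather": {Name: "get_weather", Required: []string{"city"}},
	}
	current := map[string]toolSchema{
		"get_weather": {Name: "get_weather", Required: []string{"city", "units"}},
	}
	for _, c := range breakingChanges(baseline, current) {
		fmt.Println(c) // prints: get_weather: field "units" is now required
	}
}
```

Note the asymmetry: a field that stops being required is harmless to existing callers, so a useful diff only fails CI on the direction that breaks them.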

Everything else in v0.1.0

  • Live web dashboard — tool call feed, latency percentile views, error timelines
  • Alerts — Slack, PagerDuty, or any webhook
  • OpenTelemetry export — plugs into Grafana or Jaeger via OTLP gRPC
  • SQLite trace store — local by default, Postgres-ready, configurable retention
  • Workspace + environment scoping — prod vs staging
  • Docker + Docker Compose included

Why open source and MIT

Tool call data can contain sensitive information. I wanted something that keeps traces local by default, plugs into the stack you already have, and can run in air-gapped environments. No telemetry, MIT licensed.

What's next

Per-team budget enforcement, audit log export (CSV and JSON), and a hosted cloud version are on the roadmap.

But right now I'm most interested in hearing from people building MCP agents — what are you running into that mcpscope doesn't solve yet?


Repo: https://github.com/td-02/mcp-observer
