Your AI agent just made 47 tool calls. How much did that cost?
If you answered "uh... no idea," you're not alone. Most developers building with MCP tools are flying blind when it comes to observability. Your AI client talks to MCP servers, tools get called, tokens get consumed, and your bill quietly climbs. But where is all that spend going?
Enter benchmark-broccoli: a transparent MCP proxy that sits between your AI client and any MCP server, measuring tokens, estimated cost, and latency for every single tool call. Think of it as the missing dashboard for your AI infrastructure, showing you exactly what's happening under the hood, in real time.
What Is MCP Tool-Call Monitoring (And Why You Need It)
The Model Context Protocol (MCP) is Anthropic's standard for connecting AI applications to data sources and tools. Your AI client (Opencode, Claude, Cursor, VS Code) calls MCP servers (filesystem, database, GraphQL, custom tools) to get things done. The catch: out of the box, you get almost no signal about those tool round-trips. You see only the model's usual chat tokens, not what each tools/call request actually cost you in time, tokens, or money.
MCP tool-call monitoring solves this by intercepting the communication between client and server, logging every interaction, and giving you actionable metrics:
- Token counts — Input/output tokens per call (using tiktoken's o200k_base encoding)
- Cost estimates — Per-call and per-session pricing across 12+ AI models
- Latency tracking — How long each tool call actually takes
- Session grouping — Automatic clustering of related calls by time gap
- Schema overhead — The hidden cost of listTools() payloads
Without this layer, you're optimizing in the dark. With it, you know exactly which tools are expensive, which prompts are inefficient, and where to focus your optimization efforts.
How benchmark-broccoli Works
benchmark-broccoli is a stdio MCP proxy written in TypeScript. You configure your AI client to run it instead of your real MCP server command, with the real command after --. Tool results are not rewritten—the proxy forwards work to the upstream server and only observes JSON payloads to count tokens and time. Metrics land in an append-only log: calls.jsonl (one JSON object per line).
The Architecture in 30 Seconds
Your AI Client (Cursor, Claude Desktop, etc.)
↓
benchmark-broccoli proxy (measures & logs)
↓
Any MCP Server (filesystem, database, custom)
The proxy:
- Receives MCP requests from your client via stdio
- Spawns the real MCP server as a child process
- Forwards requests, intercepts responses
- Counts tokens, estimates cost, records latency
- Appends structured data to calls.jsonl
- Streams updates to the live dashboard via Server-Sent Events
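The measure-and-log step above can be sketched in a few lines. This is an illustrative reconstruction, not benchmark-broccoli's actual source; measureCall and CallRecord are hypothetical names, and the record is trimmed to a subset of the fields the real log contains:

```typescript
import { appendFileSync } from "node:fs";

// Hypothetical record shape, trimmed from the calls.jsonl example below.
interface CallRecord {
  timestamp: string;
  tool: string;
  latencyMs: number;
}

// Wrap a forwarded tool call: time it, let the result pass through
// untouched, and append one JSON object per line to the log.
async function measureCall<T>(
  tool: string,
  forward: () => Promise<T>,
  logPath = "calls.jsonl",
): Promise<T> {
  const start = Date.now();
  const result = await forward(); // delegate to the real MCP server
  const record: CallRecord = {
    timestamp: new Date().toISOString(),
    tool,
    latencyMs: Date.now() - start,
  };
  appendFileSync(logPath, JSON.stringify(record) + "\n"); // append-only log
  return result;
}
```

The key property is that the result is returned unchanged: the proxy observes, it never rewrites.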
The result? A live dashboard that auto-updates as your AI works, showing you every tool call, every token, and every dollar (or fraction thereof).
Key Features That Actually Matter
1. Works With Any MCP Server (Seriously, Any)
Unlike monitoring solutions tied to specific tools, benchmark-broccoli is MCP-native. If it speaks the Model Context Protocol, you can measure it:
- Official / community servers (e.g. filesystem, Git, fetch) and bridges like mcp-remote for URL-based MCP
- Domain-specific servers you or others maintain (e.g. Postgres, MongoDB, or GraphQL — whatever your MCP server implements)
- Custom servers you wrote yourself
No SDK changes. No code modifications. Just proxy it.
2. Per-Query Session Tracking
Your AI doesn't make one tool call; it makes bursts of calls per prompt. benchmark-broccoli automatically groups these into sessions using a configurable time-gap heuristic (default: 30 seconds).
Each session shows:
- Total cost across all calls
- Schema overhead (attributed once per proxy restart)
- Call sequence and timing
- User identifier (from the MCP_USER env var)
This is game-changing for understanding per-prompt efficiency. Instead of seeing "347 calls today," you see "12 sessions, the first one cost $0.47, the second cost $0.02."
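The time-gap heuristic is simple enough to sketch. This is an illustrative version, not benchmark-broccoli's actual implementation; groupIntoSessions is a hypothetical name, and call timestamps are assumed to arrive in order:

```typescript
// Consecutive calls separated by no more than the gap (default 30 s)
// belong to the same session; a larger gap starts a new one.
// Assumes timestampsMs is sorted ascending, as an append-only log would be.
function groupIntoSessions(timestampsMs: number[], gapMs = 30_000): number[][] {
  const sessions: number[][] = [];
  for (const t of timestampsMs) {
    const current = sessions[sessions.length - 1];
    if (current !== undefined && t - current[current.length - 1] <= gapMs) {
      current.push(t); // continues the current burst
    } else {
      sessions.push([t]); // gap exceeded (or first call): new session
    }
  }
  return sessions;
}
```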
3. Multi-Model Cost Comparison
Not sure whether to use Claude Sonnet or GPT-4o? Switch models in the dashboard and see costs recalculate instantly using real token counts.
Built-in pricing for 12+ models, including:
- Claude 4 (Sonnet, Opus)
- Claude 3.5 (Sonnet, Haiku)
- GPT-4o, GPT-4o-mini
- GPT-4.1 (full, mini, nano)
- Gemini 2.5 (Pro, Flash)
Example: A session that cost $0.42 on Claude Sonnet would cost $2.10 on Claude Opus or $0.11 on Haiku. Suddenly, model selection becomes data-driven instead of gut-feel.
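The recalculation itself is just arithmetic over the recorded token counts. A minimal sketch, with placeholder per-million-token prices rather than benchmark-broccoli's actual table (vendor rates change; check the current rate cards):

```typescript
// Illustrative USD prices per million tokens — placeholders, not the
// project's real pricing data.
const PRICING: Record<string, { inputPerM: number; outputPerM: number }> = {
  "claude-sonnet-4": { inputPerM: 3.0, outputPerM: 15.0 },
  "gpt-4o-mini": { inputPerM: 0.15, outputPerM: 0.6 },
};

// Same token counts, different model: cost = tokens * rate / 1M.
function estimateCost(
  model: string,
  inputTokens: number,
  outputTokens: number,
): number {
  const p = PRICING[model];
  if (!p) throw new Error(`no pricing entry for ${model}`);
  return (inputTokens * p.inputPerM + outputTokens * p.outputPerM) / 1_000_000;
}
```

Because the token counts are already logged, switching models in the dashboard only re-runs this arithmetic; nothing is re-measured.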
4. Live Dashboard (Dark Theme, Obviously)
The dashboard at http://127.0.0.1:3000 shows:
- Session cards — Collapsible, ordered by recency, with aggregated metrics
- Per-tool breakdown — Which tools are getting called, how often, at what cost
- Real-time updates — SSE-powered, no refresh needed
- Export to CSV — Download call history for deeper analysis
It's built with vanilla JS and a brutally clean dark UI. No framework bloat, just fast, functional observability.
5. JSONL Export for Downstream Analysis
Every call gets appended to calls.jsonl in structured format:
{
"timestamp": "2026-04-15T18:32:41.123Z",
"tool": "read_file",
"inputTokens": 1247,
"outputTokens": 523,
"cost": 0.0119,
"latencyMs": 342,
"sessionId": "session_abc123",
"user": "dev-team"
}
Feed this into your data warehouse, plot it in Grafana, or build custom alerts. The data is yours.
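Because the log is one JSON object per line, downstream analysis is a few lines of code. A sketch, assuming the record shape shown above (costByTool is a hypothetical helper, not part of the project):

```typescript
// Sum estimated cost per tool from the lines of a calls.jsonl file.
function costByTool(jsonlLines: string[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const line of jsonlLines) {
    if (line.trim() === "") continue; // tolerate trailing blank lines
    const { tool, cost } = JSON.parse(line) as { tool: string; cost: number };
    totals.set(tool, (totals.get(tool) ?? 0) + cost);
  }
  return totals;
}
```

In practice you would feed it something like fs.readFileSync("calls.jsonl", "utf8").split("\n").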
Getting Started in 3 Minutes
Step 1: Install
git clone https://github.com/Shriya-Chauhan/benchmark-broccoli.git
cd benchmark-broccoli
npm install
Step 2: Configure Your AI Client
Point your MCP config at the proxy. Example for Cursor (~/.cursor/mcp.json):
{
"mcpServers": {
"my-server": {
"command": "npx",
"args": [
"tsx", "/absolute/path/to/benchmark-broccoli/src/index.ts",
"--",
"npx", "-y", "mcp-remote", "https://my-mcp-server.example.com/mcp"
],
"env": {
"COST_MODEL": "claude-sonnet-4-20250514"
}
}
}
}
Everything after -- is your real server command. The proxy sits in front.
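The -- convention is easy to picture: everything before the separator configures the proxy, everything after it is the command to spawn. An illustrative sketch (splitAtDashDash and the example flag are hypothetical, not the proxy's real CLI):

```typescript
// Split argv at the "--" separator: proxy options on the left,
// the real MCP server command on the right.
function splitAtDashDash(argv: string[]): {
  proxyArgs: string[];
  serverCommand: string[];
} {
  const i = argv.indexOf("--");
  if (i === -1 || i === argv.length - 1) {
    throw new Error("expected the real MCP server command after --");
  }
  return { proxyArgs: argv.slice(0, i), serverCommand: argv.slice(i + 1) };
}
```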
Step 3: Launch the Dashboard
npm start
# → [dashboard] http://127.0.0.1:3000
Open it in your browser. Use your AI client normally. Watch the metrics roll in.
Who Is This For?
You should use benchmark-broccoli if you:
- Build AI agents or assistants with MCP servers
- Want to optimize prompt efficiency based on data
- Need to justify model choices with actual cost numbers
- Are tired of surprise AI bills
You probably don't need it if:
- You make <10 MCP calls per day (not enough signal)
- You don't care about costs (lucky you)
Performance & Overhead
Latency impact: ~2-15ms per tool call (token counting + JSONL append)
Memory footprint: <50MB for the proxy process
Dashboard overhead: Negligible — SSE pushes updates, no polling
For 1,000 calls/day, you're looking at ~10-15 seconds of added latency total. The observability is worth it.
Roadmap & Contributing
benchmark-broccoli is Apache 2.0 licensed and actively maintained by Shriya Chauhan.
Current priorities:
- Accounts & identity — Sign-up / sign-in, user profiles, and per-user (or per-team) dashboards instead of only MCP_USER in .env
- Alerts — Notify when estimated cost or latency crosses thresholds (per session, per tool, or daily cap)
- Comparative session analysis — Side-by-side runs for A/B prompt or workflow testing (same tools, different instructions)
- Support for streaming tool responses
Want to contribute?
- Fork the repo
- Create a feature branch (git checkout -b feature/my-change)
- Run npm test and npm run typecheck
- Push and open a PR
FAQ
Q: Does it support streaming responses?
A: Currently, it measures full request/response pairs. Streaming support is on the roadmap.
Q: What if my MCP server uses custom authentication?
A: The proxy is transparent — it forwards everything. Pass auth credentials via env vars in your MCP config, and the proxy will pass them through.
Q: Can I track multiple MCP servers at once?
A: Yes! Run one proxy instance per server, each writing to a different JSONL file. Point the dashboard at whichever file you want to visualize (or aggregate them externally).
Q: Is the token count 100% accurate?
A: It's ~95-98% accurate for Claude 3+, GPT-4+, and Gemini models using tiktoken. Edge cases (special tokens, legacy encodings) may vary slightly, but it's accurate enough for cost estimation and optimization.
Conclusion: Stop Flying Blind
If you're building with MCP and you don't have observability, you're leaving money and performance on the table. benchmark-broccoli gives you the metrics you need in under 5 minutes of setup.
Start now:
git clone https://github.com/Shriya-Chauhan/benchmark-broccoli.git
cd benchmark-broccoli
npm install && npm start
Your future self (and your finance team) will thank you.
Written with ❤️ for the MCP community. Star on GitHub · Report Issues