Pedro Santos

Posted on May 20

Testing and Debugging MCP

#java #ai #mcp #testing

Testing and Debugging MCP: The Curl-First Approach

In the previous post, I connected an AI agent to 4 MCP servers with 12+ tools. It works. Until it doesn't.

When an agent gives a wrong answer, the question is always the same: is it the LLM, the prompt, or the tool? This post covers how I debug MCP integrations, starting with the simplest approach that catches 90% of issues.

Test the Tool Before You Test the Agent

The biggest mistake I made early on was debugging the agent when the problem was in the tool. I'd tweak the system prompt for hours, then realize the MCP server was returning malformed data.

Now I always test tools with curl first. The MCP servers here use the HTTP+SSE transport, and every message is plain JSON-RPC 2.0, so you can call any tool without an AI agent or LangChain4j.

Step 1: Open an SSE Session

curl -N http://localhost:8092/sse

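This opens a Server-Sent Events connection. The first event (named endpoint) carries the URL to post messages to, including a sessionId. Copy it. You'll need it for every subsequent request. Keep this terminal open: with the HTTP+SSE transport, the JSON-RPC responses to your POSTs arrive on this stream, not in the body of the POST response.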
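To script Step 1 instead of copying the sessionId by hand, you could parse the stream. The sketch below assumes the server sends an SSE event named endpoint whose data is a URL with a sessionId query parameter, such as /mcp/message?sessionId=..., which is the path this server uses. The class name is hypothetical.

```java
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: extract the sessionId from the "endpoint" event of an HTTP+SSE MCP server.
// Assumption: the event data is a URL carrying a sessionId query parameter.
class SseEndpointParser {

    private static final Pattern SESSION_ID = Pattern.compile("[?&]sessionId=([^&\\s]+)");

    /** Returns the data of the first "endpoint" event, if one is present. */
    static Optional<String> endpointData(List<String> sseLines) {
        boolean inEndpointEvent = false;
        for (String line : sseLines) {
            if (line.isEmpty()) {                       // a blank line ends an SSE event
                inEndpointEvent = false;
            } else if (line.startsWith("event:")) {
                inEndpointEvent = line.substring(6).trim().equals("endpoint");
            } else if (inEndpointEvent && line.startsWith("data:")) {
                return Optional.of(line.substring(5).trim());
            }
        }
        return Optional.empty();
    }

    /** Pulls the sessionId query parameter out of the endpoint URL. */
    static Optional<String> sessionId(List<String> sseLines) {
        Optional<String> data = endpointData(sseLines);
        if (data.isEmpty()) {
            return Optional.empty();
        }
        Matcher m = SESSION_ID.matcher(data.get());
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }
}
```

Feed it the lines that curl -N prints. Comment lines such as keep-alives are skipped because they are neither event: nor data: fields.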

Step 2: Initialize the Connection

curl -X POST "http://localhost:8092/mcp/message?sessionId=YOUR_SESSION_ID" \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
      "protocolVersion": "2024-11-05",
      "clientInfo": { "name": "debug-client", "version": "1.0.0" },
      "capabilities": {}
    }
  }'
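The same handshake can be sent from plain Java. One detail the curl above leaves out: after the initialize response, the 2024-11-05 MCP spec expects the client to send a notifications/initialized notification before making other requests. It is a JSON-RPC message with no id, so the server sends no reply. The sketch below assumes the message URL from Step 1 and uses only the JDK HTTP client. The class and method names are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Sketch of the Step 2 handshake with the JDK HTTP client.
// Replies arrive on the SSE stream, so post() only reports the HTTP status.
class McpHandshake {

    // Assumes clientName and clientVersion contain no quotes or backslashes.
    static String initializeRequest(int id, String clientName, String clientVersion) {
        return "{\"jsonrpc\":\"2.0\",\"id\":" + id
            + ",\"method\":\"initialize\",\"params\":{"
            + "\"protocolVersion\":\"2024-11-05\","
            + "\"clientInfo\":{\"name\":\"" + clientName + "\",\"version\":\"" + clientVersion + "\"},"
            + "\"capabilities\":{}}}";
    }

    // A notification has no "id", so the server must not answer it.
    static String initializedNotification() {
        return "{\"jsonrpc\":\"2.0\",\"method\":\"notifications/initialized\"}";
    }

    static int post(HttpClient http, String messageUrl, String body) throws Exception {
        HttpRequest req = HttpRequest.newBuilder(URI.create(messageUrl))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build();
        return http.send(req, HttpResponse.BodyHandlers.ofString()).statusCode();
    }
}
```

Usage: post initializeRequest(1, "debug-client", "1.0.0"), wait for its result on the SSE terminal, then post initializedNotification() to the same URL.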

Step 3: List Available Tools

curl -X POST "http://localhost:8092/mcp/message?sessionId=YOUR_SESSION_ID" \
  -H "Content-Type: application/json" \
  -d '{ "jsonrpc": "2.0", "id": 2, "method": "tools/list", "params": {} }'

This returns all registered tools with their descriptions and schemas. I check three things here: are all expected tools listed? Are the descriptions accurate? Are the parameter schemas correct?

I once spent an hour debugging a failing agent. The tools/list response showed 2 tools instead of 3. I had forgotten to register getFraudRiskScore in the .tools(...) call. The LLM was trying to use a tool that didn't exist.
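The three tools/list checks can also be automated. This sketch assumes you have already parsed each entry of result.tools into a small record holding the name, the description, and inputSchema.required. You compare those against a hand-written map from tool name to required parameters. ToolInfo and ToolsListCheck are hypothetical names, and only a blank description is flagged, since "accurate" still needs a human.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.function.Function;
import java.util.stream.Collectors;

// Sketch: compare a parsed tools/list result against expected tools and required params.
class ToolsListCheck {

    // Hypothetical parsed view of one tools/list entry.
    record ToolInfo(String name, String description, Set<String> required) {}

    static List<String> problems(List<ToolInfo> listed, Map<String, Set<String>> expected) {
        Map<String, ToolInfo> byName = listed.stream()
            .collect(Collectors.toMap(ToolInfo::name, Function.identity(), (a, b) -> a));
        List<String> out = new ArrayList<>();
        // Sorted iteration keeps the report order stable between runs
        for (Map.Entry<String, Set<String>> e : new TreeMap<>(expected).entrySet()) {
            ToolInfo tool = byName.get(e.getKey());
            if (tool == null) {
                out.add("missing tool: " + e.getKey());   // e.g. never registered
                continue;
            }
            if (tool.description() == null || tool.description().isBlank()) {
                out.add(e.getKey() + ": empty description");
            }
            if (!tool.required().equals(e.getValue())) {
                out.add(e.getKey() + ": required " + new TreeSet<>(tool.required())
                    + ", expected " + new TreeSet<>(e.getValue()));
            }
        }
        return out;
    }
}
```

A check like this would have reported "missing tool: getFraudRiskScore" for the registration bug above.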

Step 4: Call a Tool Directly

curl -X POST "http://localhost:8092/mcp/message?sessionId=YOUR_SESSION_ID" \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "id": 3,
    "method": "tools/call",
    "params": {
      "name": "getStockByProduct",
      "arguments": { "productCode": "COMIC_BOOKS" }
    }
  }'

This runs the exact same code path the agent would trigger. If the response is wrong here, the problem is in your service code, not the LLM.
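If you call tools from a Java test harness instead of curl, build the tools/call body with proper string escaping. Otherwise an argument containing a quote produces invalid JSON, and you end up debugging your harness instead of the tool. This is a JDK-only sketch that handles string-valued arguments only; the class name is hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Sketch: build a tools/call JSON-RPC body for string-valued arguments.
class ToolCallRequest {

    // Escapes quotes, backslashes and control characters for a JSON string literal.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                case '\n' -> sb.append("\\n");
                default -> {
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));  // other control chars
                    } else {
                        sb.append(c);
                    }
                }
            }
        }
        return sb.append('"').toString();
    }

    static String build(int id, String toolName, Map<String, String> arguments) {
        // TreeMap gives a stable argument order, which makes logs easy to diff
        String args = new TreeMap<>(arguments).entrySet().stream()
            .map(e -> quote(e.getKey()) + ":" + quote(e.getValue()))
            .collect(Collectors.joining(",", "{", "}"));
        return "{\"jsonrpc\":\"2.0\",\"id\":" + id
            + ",\"method\":\"tools/call\",\"params\":{\"name\":" + quote(toolName)
            + ",\"arguments\":" + args + "}}";
    }
}
```

build(3, "getStockByProduct", Map.of("productCode", "COMIC_BOOKS")) yields the same body as the curl call above.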

Common MCP Bugs (and How to Find Them)

1. Tool Name Mismatch

The system prompt says getTransactionStatus. The MCP server exposes getPaymentStatus. The LLM tries to call a tool that doesn't exist. The agent gives a vague answer with no real data.

How to catch it: Run tools/list and compare every tool name against your system prompt. They must match exactly.
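The comparison can be partly mechanized. This heuristic sketch scans the system prompt for identifiers that look like tool names and reports any that tools/list does not expose. It assumes tool names are camelCase and start with a verb such as get, list, find or check. That prefix list is my own assumption, so adjust it to your naming. Parameter names like transactionId don't match the pattern, but other identifiers with those prefixes may produce false positives.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Heuristic sketch: find tool-like names in the system prompt that the server doesn't expose.
// Assumption: tool names look like getXxx / listXxx / findXxx / checkXxx.
class PromptToolNames {

    private static final Pattern TOOL_LIKE =
        Pattern.compile("\\b(?:get|list|find|check)[A-Z][A-Za-z0-9]*\\b");

    static Set<String> unknownTools(String systemPrompt, Set<String> listedTools) {
        Set<String> unknown = new TreeSet<>();
        Matcher m = TOOL_LIKE.matcher(systemPrompt);
        while (m.find()) {
            if (!listedTools.contains(m.group())) {
                unknown.add(m.group());
            }
        }
        return unknown;
    }
}
```

For a prompt mentioning getTransactionStatus against a server exposing getPaymentStatus, it returns getTransactionStatus.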

2. Missing Required Parameters

The tool schema says transactionId is required. The LLM calls it with only orderId. The tool fails with a NullPointerException or returns "not found."

How to catch it: Look at the schema in tools/list. If a parameter is required, the description should make clear where to get it. I add notes like "extract from the event's transactionId field" in the tool description.
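On the handler side, a guard can turn the null into a message the LLM can act on. This sketch assumes the handler receives its arguments as a Map<String, Object>. getPaymentStatus here is a hypothetical handler body, and its return value is a placeholder for the real service call.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch: validate required arguments before touching the service layer.
class RequiredArgs {

    /** Names of required arguments that are absent or blank. */
    static List<String> missing(Map<String, Object> args, List<String> required) {
        return required.stream()
            .filter(name -> {
                Object v = args.get(name);
                return v == null || v.toString().isBlank();
            })
            .collect(Collectors.toList());
    }

    // Hypothetical handler body for a getPaymentStatus tool.
    static String getPaymentStatus(Map<String, Object> args) {
        List<String> absent = missing(args, List.of("transactionId"));
        if (!absent.isEmpty()) {
            // An explicit error tells the LLM what to fix on its next call
            return "Error: missing required argument(s) " + absent
                + ". Use the transactionId field from the event, not orderId.";
        }
        String txId = args.get("transactionId").toString();
        return "payment lookup for " + txId;  // placeholder for the real service call
    }
}
```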

3. Wrong Parameter Types

The schema says threshold is an integer. The LLM sends it as a string "3". The handler casts (Integer) args.get("threshold") and throws a ClassCastException.

How to catch it: Test with curl using both correct and incorrect types. The MCP SDK does some type coercion, but not in all cases. I add explicit type checks in the handler for critical tools.
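One way to write such a check is to accept an integer whether it arrives as 3, 3.0 or "3", and reject anything with a fractional part. This is a sketch assuming arguments arrive as a Map<String, Object>; IntArg is a hypothetical helper name.

```java
import java.math.BigDecimal;
import java.util.Map;

// Sketch: read an integer argument without a blind (Integer) cast.
class IntArg {

    static int require(Map<String, Object> args, String name) {
        Object v = args.get(name);
        if (v == null) {
            throw new IllegalArgumentException(name + " is required");
        }
        try {
            if (v instanceof Number n) {
                // Covers Integer, Long and Double; 3.0 is accepted, 3.5 is not
                return new BigDecimal(n.toString()).intValueExact();
            }
            if (v instanceof String s) {
                // The LLM sent "3" instead of 3
                return new BigDecimal(s.trim()).intValueExact();
            }
        } catch (ArithmeticException | NumberFormatException e) {
            throw new IllegalArgumentException(name + " must be a whole number, got: " + v, e);
        }
        throw new IllegalArgumentException(name + " must be a number, got " + v.getClass().getSimpleName());
    }
}
```

The IllegalArgumentException message is meant to be returned to the LLM as the tool result, so it can retry with a valid value.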

4. Overly Broad Tool Descriptions

The description says "gets data." The LLM calls it for everything, even when another tool would be better.

How to catch it: Ask the agent a question that should use a different tool. If it picks the wrong one, the description is too vague. I rewrite descriptions to include "use this when..." and "do not use for..." clauses.
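To keep vague descriptions from slipping back in, a crude lint can run over every tool description. The 8-word minimum and the two required phrases are my own assumptions, not an MCP or LangChain4j rule. Treat this as a starting point.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Sketch: flag "gets data"-style tool descriptions.
// Assumptions: at least 8 words, plus "use this when" and "do not use" clauses.
class DescriptionLint {

    static final int MIN_WORDS = 8;

    static List<String> check(String toolName, String description) {
        List<String> issues = new ArrayList<>();
        String d = description == null ? "" : description.trim().toLowerCase(Locale.ROOT);
        int words = d.isEmpty() ? 0 : d.split("\\s+").length;
        if (words < MIN_WORDS) {
            issues.add(toolName + ": description has " + words + " words, want at least " + MIN_WORDS);
        }
        if (!d.contains("use this when")) {
            issues.add(toolName + ": no \"use this when...\" clause");
        }
        if (!d.contains("do not use")) {
            issues.add(toolName + ": no \"do not use for...\" clause");
        }
        return issues;
    }
}
```

"gets data" fails all three checks. A description like "Returns the current payment status for one transaction. Use this when the user asks whether a payment succeeded. Do not use for fraud checks." passes.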

Enabling Request/Response Logging

LangChain4j's MCP transport supports logging. I enable it during development:

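import dev.langchain4j.mcp.client.DefaultMcpClient;
import dev.langchain4j.mcp.client.McpClient;
import dev.langchain4j.mcp.client.transport.http.HttpMcpTransport;

// Development only: logs every JSON-RPC message exchanged with the MCP server
private McpClient buildClient(String sseUrl) {
    return new DefaultMcpClient.Builder()
        .transport(new HttpMcpTransport.Builder()
            .sseUrl(sseUrl)           // e.g. http://localhost:8092/sse
            .logRequests(true)        // outgoing initialize, tools/list, tools/call
            .logResponses(true)       // what the server sent back
            .build())
        .build();
}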

This prints every JSON-RPC request and response to the console. You see exactly which tools the agent calls, with which arguments, and what comes back. Noisy in production, invaluable during development.

On the LLM side, the chat model supports the same logging flags:

GoogleAiGeminiChatModel.builder()
    .logRequests(true)
    .logResponses(true)
    .build();

This shows the functionDeclarations sent to Gemini and the functionCall it returns. If the LLM is picking the wrong tool, the log shows exactly which tool declarations it was offered and which call it made, with which arguments.

Testing Tool Responses

The format of the tool response matters for the LLM. I switched from key=value strings to JSON early on.

Before:

status=SUCCESS | totalAmount=150.00 | totalItems=3

After:

return jsonUtil.toJson(paymentService.findByTransactionId(txId))
    .orElse("No payment found");

ObjectMapper.writeValueAsString() produces clean JSON that the LLM parses reliably. The key=value format caused parsing errors in about 10% of cases, where the LLM would treat the pipe character as part of the value.
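The underlying problem is that a delimiter format has no way to mark where a value ends. A value containing " | " is indistinguishable from a field boundary, whereas a JSON string is quoted and escaped. The JDK-only sketch below demonstrates this with flat string fields. In the real service, ObjectMapper.writeValueAsString() does the JSON side; the hand-rolled quote() here is a simplified stand-in that assumes values contain no control characters.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch: key=value with " | " separators vs. JSON, for flat string fields.
class ResponseFormat {

    static String keyValue(Map<String, String> fields) {
        return fields.entrySet().stream()
            .map(e -> e.getKey() + "=" + e.getValue())
            .collect(Collectors.joining(" | "));
    }

    // A naive reader: splits on the separator, so a value containing " | " gets cut
    static Map<String, String> parseKeyValue(String line) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String part : line.split(" \\| ")) {
            int eq = part.indexOf('=');
            if (eq > 0) {
                out.put(part.substring(0, eq), part.substring(eq + 1));
            }
        }
        return out;
    }

    static String json(Map<String, String> fields) {
        return fields.entrySet().stream()
            .map(e -> quote(e.getKey()) + ":" + quote(e.getValue()))
            .collect(Collectors.joining(",", "{", "}"));
    }

    // Simplified stand-in for Jackson: escapes backslashes and quotes only
    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }
}
```

With a field note=split | refund, the key=value round trip gives back note=split, while the JSON keeps "split | refund" intact.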

The Debugging Checklist

When an agent gives a wrong or empty answer, I run through this checklist:

1. Test the tool with curl. Does it return the expected data?
2. Check tools/list. Are all tools registered? Do names match the system prompt?
3. Check tool descriptions. Are they specific enough for the LLM to pick the right one?
4. Check parameter schemas. Do required fields match what the LLM can extract from context?
5. Check maxSequentialToolsInvocations. Is it high enough for the workflow? A 5-saga analysis needs at least 11 tool calls.
6. Check maxOutputTokens. Is the response being truncated?
7. Enable logging. Look at the actual functionCall the LLM generates. Wrong tool? Wrong params?

Most bugs show up in the first three checklist items: the tool itself is broken, the tool name doesn't match, or the description is wrong.

What I'd Add Next

I don't have automated integration tests for MCP yet. Each tool is tested in isolation via the service's unit tests. But the full chain (agent → MCP client → HTTP → MCP server → database) is only tested manually.

A proper integration test would: start the service with Testcontainers, connect an MCP client, call each tool, and verify the response. That's on the roadmap.
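The "call each tool, verify the response" part of that plan could look roughly like the sketch below. The invoker is a stand-in function: in a real test it would wrap an MCP client connected to the service that Testcontainers started, which is not shown here. The Case record and the expected fragments are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

// Sketch: run every tool once and check that its response contains an expected fragment.
// The invoker is a placeholder for a real MCP client call.
class ToolSmokeTest {

    record Case(String tool, Map<String, Object> args, String expectedFragment) {}

    static List<String> run(List<Case> cases, BiFunction<String, Map<String, Object>, String> invoke) {
        List<String> failures = new ArrayList<>();
        for (Case c : cases) {
            try {
                String response = invoke.apply(c.tool(), c.args());
                if (response == null || !response.contains(c.expectedFragment())) {
                    failures.add(c.tool() + ": expected '" + c.expectedFragment() + "' in " + response);
                }
            } catch (RuntimeException e) {
                failures.add(c.tool() + ": threw " + e);  // a crash counts as a failure
            }
        }
        return failures;
    }
}
```

Collecting failures instead of stopping at the first one gives a per-tool report, similar to running the curl calls one after another.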

For now, the curl approach catches most issues and takes 30 seconds per tool.

The repo: github.com/pedrop3/saga-orchestration