Testing and Debugging MCP: The Curl-First Approach
In the previous post, I connected an AI agent to 4 MCP servers with 12+ tools. It works. Until it doesn't.
When an agent gives a wrong answer, the question is always the same: is it the LLM, the prompt, or the tool? This post covers how I debug MCP integrations, starting with the simplest approach that catches 90% of issues.
Test the Tool Before You Test the Agent
The biggest mistake I made early on was debugging the agent when the problem was in the tool. I'd tweak the system prompt for hours, then realize the MCP server was returning malformed data.
Now I always test tools with curl first. With the HTTP/SSE transport, MCP is just JSON-RPC 2.0 over HTTP, so you can call any tool without an AI agent or LangChain4j.
Step 1: Open an SSE Session
curl -N http://localhost:8092/sse
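This opens a Server-Sent Events connection. The server returns a sessionId in the first event, typically as part of the message endpoint URL. Copy it. You'll need it for every subsequent request. Keep this terminal open: with this transport the JSON-RPC responses to the POST requests below usually arrive on this stream, not in the output of the POST itself.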
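If you script this step instead of copying the id by hand, it can be pulled out of the first event. The sketch below assumes the server announces its endpoint as an "endpoint" event whose data line carries sessionId as a query parameter. That layout and the SseSessionId helper name are assumptions, so check what your server actually sends first.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pulls the sessionId out of the first SSE event.
// Assumes the event looks like:
//   event: endpoint
//   data: /mcp/message?sessionId=<id>
class SseSessionId {
    private static final Pattern SESSION = Pattern.compile("[?&]sessionId=([^&\\s]+)");

    static Optional<String> extract(String sseEvent) {
        // Only "data:" lines can carry the endpoint URL.
        for (String line : sseEvent.split("\\R")) {
            if (line.startsWith("data:")) {
                Matcher m = SESSION.matcher(line);
                if (m.find()) {
                    return Optional.of(m.group(1));
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String event = "event: endpoint\ndata: /mcp/message?sessionId=abc-123\n";
        System.out.println(extract(event).orElse("no sessionId found"));
    }
}
```

Returning an Optional keeps the "server sent something else first" case explicit instead of failing later with a malformed URL.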
Step 2: Initialize the Connection
curl -X POST "http://localhost:8092/mcp/message?sessionId=YOUR_SESSION_ID" \
-H "Content-Type: application/json" \
-d '{
"jsonrpc": "2.0",
"id": 1,
"method": "initialize",
"params": {
"protocolVersion": "2024-11-05",
"clientInfo": { "name": "debug-client", "version": "1.0.0" },
"capabilities": {}
}
}'
The MCP lifecycle also expects the client to send a notifications/initialized notification (a JSON-RPC message with that method and no id) to the same endpoint once initialize succeeds. Some servers answer tools/list without it. If yours returns an error at Step 3, send it first.
Step 3: List Available Tools
curl -X POST "http://localhost:8092/mcp/message?sessionId=YOUR_SESSION_ID" \
-H "Content-Type: application/json" \
-d '{ "jsonrpc": "2.0", "id": 2, "method": "tools/list", "params": {} }'
This returns all registered tools with their descriptions and schemas. I check three things here: are all expected tools listed? Are the descriptions accurate? Are the parameter schemas correct?
I once spent an hour debugging a failing agent. The tools/list response showed 2 tools instead of 3. I had forgotten to register getFraudRiskScore in the .tools(...) call. The LLM was trying to use a tool that didn't exist.
Step 4: Call a Tool Directly
curl -X POST "http://localhost:8092/mcp/message?sessionId=YOUR_SESSION_ID" \
-H "Content-Type: application/json" \
-d '{
"jsonrpc": "2.0",
"id": 3,
"method": "tools/call",
"params": {
"name": "getStockByProduct",
"arguments": { "productCode": "COMIC_BOOKS" }
}
}'
This runs the exact same code path the agent would trigger. If the response is wrong here, the problem is in your service code, not the LLM.
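Steps 2 to 4 all post the same JSON-RPC envelope to the same URL, so they are easy to script once curl has confirmed the server behaves. Here is a minimal sketch using the JDK's java.net.http. The McpRequests class and its method names are hypothetical, and the /mcp/message path is simply the one from the curl commands above.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper for scripting the curl steps; not part of any MCP SDK.
class McpRequests {
    // Builds a JSON-RPC 2.0 envelope. paramsJson must already be valid JSON.
    static String envelope(int id, String method, String paramsJson) {
        return String.format(
            "{\"jsonrpc\":\"2.0\",\"id\":%d,\"method\":\"%s\",\"params\":%s}",
            id, method, paramsJson);
    }

    // Builds the POST request; the path mirrors the curl examples.
    static HttpRequest post(String baseUrl, String sessionId, String body) {
        return HttpRequest.newBuilder()
            .uri(URI.create(baseUrl + "/mcp/message?sessionId=" + sessionId))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build();
    }

    public static void main(String[] args) {
        String body = envelope(2, "tools/list", "{}");
        HttpRequest request = post("http://localhost:8092", "YOUR_SESSION_ID", body);
        // Only builds and prints the request; nothing is sent.
        System.out.println(request.method() + " " + request.uri());
        System.out.println(body);
    }
}
```

To actually send it, pass the request to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()) while the SSE connection from Step 1 is still open.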
Common MCP Bugs (and How to Find Them)
1. Tool Name Mismatch
The system prompt says getTransactionStatus. The MCP server exposes getPaymentStatus. The LLM tries to call a tool that doesn't exist. The agent gives a vague answer with no real data.
How to catch it: Run tools/list and compare every tool name against your system prompt. They must match exactly.
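The comparison is a plain set difference, which makes it easy to automate. This sketch assumes you have already collected the two name lists by hand or by script. The ToolNameCheck class is hypothetical, and the tool names are the ones from this post.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical check: which tool names does the prompt mention
// that the server's tools/list response does not contain?
class ToolNameCheck {
    static Set<String> missingFromServer(Set<String> promptTools, Set<String> serverTools) {
        Set<String> missing = new TreeSet<>(promptTools); // sorted for stable output
        missing.removeAll(serverTools);
        return missing;
    }

    public static void main(String[] args) {
        Set<String> promptTools = Set.of("getTransactionStatus", "getStockByProduct");
        Set<String> serverTools = Set.of("getPaymentStatus", "getStockByProduct");
        System.out.println("Not exposed by the server: "
            + missingFromServer(promptTools, serverTools));
    }
}
```

The match is case-sensitive on purpose, because the LLM has to produce the name exactly as the server registered it.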
2. Missing Required Parameters
The tool schema says transactionId is required. The LLM calls it with only orderId. The tool fails with a null pointer or returns "not found."
How to catch it: Look at the schema in tools/list. If a parameter is required, the description should make clear where to get it. I add notes like "extract from the event's transactionId field" in the tool description.
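A guard inside the handler turns this failure into a readable error instead of a null pointer. This is a sketch, assuming the handler receives its arguments as a Map<String, Object>. The RequiredArgs class and the ORD-1 value are placeholders for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical guard: lists required parameters that the call did not supply.
class RequiredArgs {
    static List<String> missing(List<String> required, Map<String, Object> args) {
        List<String> absent = new ArrayList<>();
        for (String name : required) {
            if (args.get(name) == null) { // absent key or explicit null
                absent.add(name);
            }
        }
        return absent;
    }

    public static void main(String[] args) {
        List<String> required = List.of("transactionId");
        Map<String, Object> callArgs = Map.of("orderId", "ORD-1");
        List<String> absent = missing(required, callArgs);
        if (!absent.isEmpty()) {
            System.out.println("Missing required parameters: " + absent);
        }
    }
}
```

Returning that message as the tool result gives the LLM something it can act on, for example by retrying with the right field.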
3. Wrong Parameter Types
The schema says threshold is an integer. The LLM sends it as a string "3". The handler casts (Integer) args.get("threshold") and throws a ClassCastException.
How to catch it: Test with curl using both correct and incorrect types. The MCP SDK does some type coercion, but not in all cases. I add explicit type checks in the handler for critical tools.
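An explicit check can replace the bare cast. The sketch below accepts both a JSON number and a numeric string and rejects everything else with a clear message. ArgCoercion and intArg are hypothetical names, and the handler is assumed to receive a Map<String, Object>.

```java
import java.util.Map;

// Hypothetical replacement for (Integer) args.get("threshold").
class ArgCoercion {
    static int intArg(Map<String, Object> args, String name) {
        Object value = args.get(name);
        if (value instanceof Number) {
            // Note: a fractional number such as 3.7 is truncated to 3.
            return ((Number) value).intValue();
        }
        if (value instanceof String) {
            try {
                return Integer.parseInt(((String) value).trim());
            } catch (NumberFormatException e) {
                throw new IllegalArgumentException(
                    name + " must be an integer, got \"" + value + "\"");
            }
        }
        // Covers a missing argument (null) and any other type.
        throw new IllegalArgumentException(name + " must be an integer, got " + value);
    }

    public static void main(String[] args) {
        System.out.println(intArg(Map.of("threshold", 3), "threshold"));
        System.out.println(intArg(Map.of("threshold", "3"), "threshold"));
    }
}
```

Whether to accept the string form at all is a design choice. Being lenient keeps the agent working, while being strict surfaces schema problems earlier.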
4. Overly Broad Tool Descriptions
The description says "gets data." The LLM calls it for everything, even when another tool would be better.
How to catch it: Ask the agent a question that should use a different tool. If it picks the wrong one, the description is too vague. I rewrite descriptions to include "use this when..." and "do not use for..." clauses.
Enabling Request/Response Logging
LangChain4j's MCP transport supports logging. I enable it during development:
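import dev.langchain4j.mcp.client.DefaultMcpClient;
import dev.langchain4j.mcp.client.McpClient;
import dev.langchain4j.mcp.client.transport.http.HttpMcpTransport;

private McpClient buildClient(String sseUrl) {
    return new DefaultMcpClient.Builder()
        .transport(new HttpMcpTransport.Builder()
            .sseUrl(sseUrl)
            .logRequests(true)   // log every JSON-RPC request sent to the server
            .logResponses(true)  // log every JSON-RPC response received
            .build())
        .build();
}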
This prints every JSON-RPC request and response to the console. You see exactly which tools the agent calls, with which arguments, and what comes back. Noisy in production, invaluable during development.
The chat model supports the same kind of logging:
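GoogleAiGeminiChatModel.builder()
    // apiKey, modelName, and other settings omitted
    .logRequests(true)   // log the payload sent to Gemini, including tool declarations
    .logResponses(true)  // log what comes back, including any functionCall
    .build();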
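This shows the functionDeclarations sent to Gemini and the functionCall it generates. If the LLM is picking the wrong tool, the log shows which tools it was offered and which one it chose, with which arguments. It does not show the model's reasoning, but that is usually enough to trace the choice back to a description.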
Testing Tool Responses
The format of the tool response matters for the LLM. I switched from key=value strings to JSON early on.
Before:
status=SUCCESS | totalAmount=150.00 | totalItems=3
After:
// Assumes findByTransactionId returns an Optional
return paymentService.findByTransactionId(txId)
    .map(jsonUtil::toJson)
    .orElse("No payment found");
ObjectMapper.writeValueAsString() produces clean JSON that the LLM parses reliably. The key=value format caused parsing errors in about 10% of cases, where the LLM would treat the pipe character as part of the value.
The Debugging Checklist
When an agent gives a wrong or empty answer, I run through this checklist:
1. Test the tool with curl. Does it return the expected data?
2. Check tools/list. Are all tools registered? Do names match the system prompt?
3. Check tool descriptions. Are they specific enough for the LLM to pick the right one?
4. Check parameter schemas. Do required fields match what the LLM can extract from context?
5. Check maxSequentialToolsInvocations. Is it high enough for the workflow? A 5-saga analysis needs at least 11 tool calls.
6. Check maxOutputTokens. Is the response being truncated?
7. Enable logging. Look at the actual functionCall the LLM generates. Wrong tool? Wrong params?
Most bugs fall into categories 1-3. The tool itself is broken, the tool name doesn't match, or the description is wrong.
What I'd Add Next
I don't have automated integration tests for MCP yet. Each tool is tested in isolation via the service's unit tests. But the full chain (agent → MCP client → HTTP → MCP server → database) is only tested manually.
A proper integration test would: start the service with Testcontainers, connect an MCP client, call each tool, and verify the response. That's on the roadmap.
For now, the curl approach catches most issues and takes 30 seconds per tool.
The repo: github.com/pedrop3/saga-orchestration