Henry Li

Posted on • Originally published at exesolution.com

MCP Server & Client in Spring AI: Stop Coupling Tools to Your AI Host

If you've built an LLM feature in Spring Boot, you've probably done something like this: created a @Bean with @Tool-annotated methods, wired it into your ChatClient, and shipped it. That works fine — until your tool set grows, multiple AI applications want to reuse the same tools, or you need to update a tool without redeploying the entire AI service.

That's the problem MCP (Model Context Protocol) solves. This post walks through a two-service setup I built and verified: a standalone MCP Tool Server and an AI Chat Service that discovers tools dynamically over Streamable HTTP — no restart required when tools change.

The full solution with runnable code, Docker Compose, and execution evidence is at exesolution.com. This post covers the core problem and how to get it running locally.


The Problem with In-Process Tool Registration

When you register tools inside the same Spring Boot app that handles LLM interactions, you get:

  • Deployment coupling — every new tool means a new deployment of the AI service, even though the AI logic didn't change.
  • No sharing — if three different AI applications need the same "get order status" tool, you copy-paste the implementation into each.
  • No trust boundary — a bug in a tool method can crash the process that's serving your users.
  • Static inventory — tools are fixed at startup. Adding one at runtime? Not without a restart.
  • Zero visibility — tool invocations vanish inside the ChatClient execution loop with no structured logs or traces.

The naive fix is "just put everything in one service." But once you have 20 tools across 5 domains, that service becomes the new monolith.


The Solution: Two Services, One Protocol

The setup has two independently deployable Spring Boot apps:

User
  └─→ AI Chat Service (:8081)
          └─→ ChatClient (Spring AI)
                  └─→ LLM (gpt-4o-mini)
                  └─→ MCP Client
                          └─→ MCP Tool Server (:8080)  ← POST /mcp
                                └─→ @Tool-annotated service methods

MCP Tool Server — owns the tool implementations. Exposes them as MCP tools (via @Tool-annotated methods) over Streamable HTTP. Deployed and versioned independently.

AI Chat Service — user-facing REST API. Knows nothing about specific tools. Uses SyncMcpToolCallbackProvider to auto-discover whatever tools the server exposes, on every request.

The key insight: the ToolCallbackProvider re-fetches the tool list from the server on each getToolCallbacks() call. Add a new @Tool bean, hit the refresh endpoint, and the next conversation picks it up — no restart of either service.
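The mechanism can be modeled in a few lines of plain Java (no Spring): a hypothetical `DynamicToolRegistry` that consults a supplier on every lookup instead of caching a snapshot at construction time. This is the same idea behind re-fetching the tool list on each call; class and method names here are illustrative, not Spring AI API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Conceptual model only: the registry holds a *supplier* of tool names
// rather than a fixed list, so every lookup reflects the server's
// current inventory.
public class DynamicToolRegistry {

    private final Supplier<List<String>> toolSource;

    public DynamicToolRegistry(Supplier<List<String>> toolSource) {
        this.toolSource = toolSource;
    }

    // Re-queries the source on every call -- no cached snapshot.
    public List<String> currentTools() {
        return toolSource.get();
    }

    public static void main(String[] args) {
        List<String> server = new ArrayList<>(List.of("getOrderStatus"));
        DynamicToolRegistry registry =
                new DynamicToolRegistry(() -> List.copyOf(server));

        System.out.println(registry.currentTools()); // [getOrderStatus]
        server.add("searchProducts");                // "deploy" a new tool
        System.out.println(registry.currentTools()); // picked up, no restart
    }
}
```

A provider that cached the list in its constructor would behave like the in-process registration we started with: fixed at startup.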


Defining a Tool: One Annotation

On the server side, any Spring bean method can become an MCP tool with @Tool (Spring AI's annotation):

import java.util.Map;

import org.springframework.ai.tool.annotation.Tool;
import org.springframework.ai.tool.annotation.ToolParam;
import org.springframework.stereotype.Service;

@Service
public class OrderTool {

    private final OrderRepository orderRepository;

    public OrderTool(OrderRepository orderRepository) {
        this.orderRepository = orderRepository;
    }

    @Tool(description = "Get the current status and details of an order by its ID")
    public Map<String, Object> getOrderStatus(
            @ToolParam(description = "The unique order identifier, e.g. ORD-12345")
            String orderId) {

        return orderRepository.findById(orderId)
                .map(order -> Map.of(
                        "orderId",           order.getId(),
                        "status",            order.getStatus(),
                        "estimatedDelivery", order.getEstimatedDelivery().toString(),
                        "items",             order.getItems().size()
                ))
                .orElseThrow(() ->
                        new IllegalArgumentException("Order not found: " + orderId));
    }
}

Spring AI reads the annotation at startup and generates a JSON Schema for the parameters automatically. The LLM receives this schema and knows exactly how to call the tool.
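To get a feel for what that schema looks like, here is a plain-Java sketch (not Spring AI's actual generator, which also folds in @ToolParam descriptions and required-field lists) that derives a minimal JSON-Schema-style map from a method's parameters via reflection:

```java
import java.lang.reflect.Method;
import java.lang.reflect.Parameter;
import java.util.LinkedHashMap;
import java.util.Map;

// Conceptual sketch only: map a method signature onto a
// {"type":"object","properties":{...}} structure.
public class SchemaSketch {

    // Map a handful of Java types onto JSON Schema primitive names.
    static String jsonType(Class<?> type) {
        if (type == String.class) return "string";
        if (type == int.class || type == long.class
                || type == Integer.class || type == Long.class) return "integer";
        if (type == boolean.class || type == Boolean.class) return "boolean";
        return "object";
    }

    static Map<String, Object> schemaFor(Method m) {
        Map<String, Object> properties = new LinkedHashMap<>();
        for (Parameter p : m.getParameters()) {
            // Note: real parameter names need compilation with -parameters;
            // otherwise reflection reports arg0, arg1, ...
            properties.put(p.getName(), Map.of("type", jsonType(p.getType())));
        }
        return Map.of("type", "object", "properties", properties);
    }

    // Example target mirroring getOrderStatus(String orderId).
    public String getOrderStatus(String orderId) { return "SHIPPED"; }

    public static void main(String[] args) throws Exception {
        Method m = SchemaSketch.class.getMethod("getOrderStatus", String.class);
        System.out.println(schemaFor(m));
    }
}
```

The real generator produces proper JSON, but the shape is the same: one `properties` entry per tool parameter, which is exactly what the LLM uses to construct a valid call.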


Wiring the Client: One Line

On the AI Host side, wiring all server tools into ChatClient takes one method call:

@Configuration
public class ChatConfig {

    @Bean
    ChatClient chatClient(ChatModel chatModel,
                          SyncMcpToolCallbackProvider toolCallbackProvider) {
        return ChatClient.builder(chatModel)
                .defaultTools(toolCallbackProvider) // ← entire server tool registry
                .build();
    }
}

From here, when a user asks "What's the status of order ORD-12345?", the LLM decides to call getOrderStatus, Spring AI dispatches it over MCP, the tool runs on the server, the result comes back, and the LLM incorporates it into the reply — entirely transparent to the controller layer.
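That round trip can be caricatured in plain Java (hypothetical names, no real LLM or MCP transport): a dispatcher maps the tool name the model chose onto a registered function and folds the result back into the reply.

```java
import java.util.Map;
import java.util.function.Function;

// Toy model of the ChatClient tool loop: the "model" picks a tool,
// the dispatcher runs it, and the result feeds the final answer.
// All names are illustrative, not Spring AI API.
public class ToolLoopSketch {

    static final Map<String, Function<String, String>> TOOLS = Map.of(
            "getOrderStatus", orderId -> "Order " + orderId + " is SHIPPED");

    static String handle(String toolName, String argument) {
        Function<String, String> tool = TOOLS.get(toolName);
        if (tool == null) {
            throw new IllegalArgumentException("Unknown tool: " + toolName);
        }
        String toolResult = tool.apply(argument);        // runs on the MCP server
        return "LLM reply incorporating: " + toolResult; // model's final answer
    }

    public static void main(String[] args) {
        System.out.println(handle("getOrderStatus", "ORD-12345"));
    }
}
```

In the real setup, the "dispatch" step crosses a process boundary over MCP, which is what makes the controller layer oblivious to which tools exist.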


Configuration

MCP Tool Server (application.properties):

spring.ai.mcp.server.name=tool-server
spring.ai.mcp.server.version=1.0.0
spring.ai.mcp.server.protocol=STREAMABLE
server.port=8080

AI Chat Service (application.properties):

spring.ai.mcp.client.toolcallback.enabled=true
spring.ai.mcp.client.connections.tool-server.url=${MCP_SERVER_URL}/mcp
spring.ai.mcp.client.connections.tool-server.transport=STREAMABLE_HTTP
spring.ai.openai.api-key=${OPENAI_API_KEY}
spring.ai.openai.chat.options.model=gpt-4o-mini
server.port=8081

Dependencies — MCP Server (pom.xml):

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-mcp-server-webmvc</artifactId>
</dependency>

Dependencies — AI Host (pom.xml):

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-mcp-client</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-openai</artifactId>
</dependency>

Running It Locally

Prerequisites: Docker Desktop, JDK 17, an OpenAI-compatible API key.

# 1. Clone and configure
cp .env.template .env
# add OPENAI_API_KEY=sk-...

# 2. Start both services
docker compose up -d --build

Verify both services are up:

curl -s http://localhost:8080/actuator/health | jq .status
# → "UP"

curl -s http://localhost:8081/actuator/health | jq .status
# → "UP"

Confirm the tool registry (admin endpoint):

curl -s http://localhost:8080/admin/tools | jq .
# → list of @Tool-annotated methods with name, description, inputSchema

Trigger a tool call through the chat API:

curl -s -X POST http://localhost:8081/api/chat \
  -H "Authorization: Bearer <TOKEN>" \
  -H "Content-Type: application/json" \
  -d '{"sessionId":"sess-001","message":"What is the status of order ORD-12345?"}' \
  | jq .
# → {"reply":"Order ORD-12345 is currently SHIPPED...","toolsUsed":["getOrderStatus"]}

Verify the tool call hit the server:

docker compose logs mcp-tool-server | grep "tools/call"
# → log lines showing getOrderStatus invoked with orderId=ORD-12345

Dynamic tool discovery — no restart needed:

# Add a new tool bean to the server, then:
curl -s -X POST http://localhost:8080/admin/tools/refresh \
  -H "Authorization: Bearer <ADMIN_TOKEN>"
# → {"registered":["getOrderStatus","searchProducts",...]}

# Next chat request immediately picks up the new tool
curl -s -X POST http://localhost:8081/api/chat \
  -H "Authorization: Bearer <TOKEN>" \
  -H "Content-Type: application/json" \
  -d '{"sessionId":"sess-001","message":"Search for electronics products"}' \
  | jq .reply
# → uses the newly registered searchProducts tool

What the Stateless Transport Mode Gives You

By default the server runs in stateful STREAMABLE mode (sessions via Mcp-Session-Id headers). For horizontally scaled deployments behind a load balancer, switch to stateless:

# on mcp-tool-server
spring.ai.mcp.server.protocol=STATELESS

In stateless mode the server returns application/json per request. No session affinity required. The same chat requests work identically — the difference is purely at the transport layer.
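The contrast can be sketched as a tiny server-side model (illustrative only, not the MCP wire protocol): a stateful handler can only serve requests for sessions created on the same node, which is why it needs load-balancer affinity, while a stateless handler treats every request as complete in itself.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative contrast between stateful and stateless request handling.
public class TransportSketch {

    // Per-node session store: the source of the affinity requirement.
    private final Map<String, String> sessions = new ConcurrentHashMap<>();

    // Stateful: first call mints a session id; later calls must present it.
    String statefulCall(String sessionId, String request) {
        if (sessionId == null) {
            String id = UUID.randomUUID().toString();
            sessions.put(id, "open");
            return "Mcp-Session-Id: " + id;
        }
        if (!sessions.containsKey(sessionId)) {
            return "404: unknown session (request landed on the wrong node?)";
        }
        return "handled " + request;
    }

    // Stateless: each request carries everything; any replica can serve it.
    static String statelessCall(String request) {
        return "handled " + request;
    }

    public static void main(String[] args) {
        TransportSketch node = new TransportSketch();
        System.out.println(node.statefulCall(null, "tools/list")); // handshake
        System.out.println(statelessCall("tools/list"));           // handled tools/list
    }
}
```

In the stateful model a second replica has an empty `sessions` map and rejects the request; in the stateless model every replica is interchangeable.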


What's in the Full Solution

This post covers the core problem and the minimal working setup. The complete verified solution at exesolution.com includes:

  • Full source code for both Spring Boot modules (pom.xml, all Java classes, Docker Compose)
  • Three tool implementations: OrderTool, ProductTool, and WeatherTool (the last one calls open-meteo.com in real time — verifiable live data)
  • Security configuration: /mcp endpoint internal-only, /api/chat JWT-protected, /admin/** role-gated
  • Architecture diagram and request flow diagram
  • Evidence Pack: 10 verification screenshots from actual execution — health checks, tool registry, chat responses, server-side logs, dynamic refresh

👉 Full solution + runnable code + evidence at exesolution.com

Free registration required to access the code bundle and evidence images.


Key Takeaways

The pattern here — separate MCP server, auto-discovering client — pays off when:

  • Multiple AI applications need the same tools (deploy once, use everywhere)
  • Tool implementations need independent scaling or deployment cadence
  • You want a trust boundary between the LLM execution context and the actual side-effecting code
  • You're connecting to Claude Desktop, VS Code Copilot, or any other MCP-compatible client — the same server JAR works for all of them without code changes

If you're already using Spring AI for chat and RAG, adding an MCP server is one dependency and a few annotations. The split into two services pays for itself the first time you update a tool without touching the AI host.


Have questions about the setup or ran into something unexpected? Drop a comment below.
