DEV Community: Pedro Santos

Testing and Debugging MCP

Pedro Santos — Wed, 20 May 2026 23:00:00 +0000

Testing and Debugging MCP: The Curl-First Approach

In the previous post, I connected an AI agent to 4 MCP servers with 12+ tools. It works. Until it doesn't.

When an agent gives a wrong answer, the question is always the same: is it the LLM, the prompt, or the tool? This post covers how I debug MCP integrations, starting with the simplest approach that catches 90% of issues.

Test the Tool Before You Test the Agent

The biggest mistake I made early on was debugging the agent when the problem was in the tool. I'd tweak the system prompt for hours, then realize the MCP server was returning malformed data.

Now I always test tools with curl first. With the HTTP/SSE transport, MCP is just JSON-RPC over HTTP, so you can call any tool without an AI agent or LangChain4j.

Step 1: Open an SSE Session

curl -N http://localhost:8092/sse

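This opens a Server-Sent Events connection. The server returns a sessionId in the first event. Copy it. You'll need it for every subsequent request. Leave this terminal open: with the SSE transport, the responses to the requests below generally arrive on this stream rather than in the output of the curl POST.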
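If you script this step instead of copying the id by hand, the session id can be pulled out of the first event. This is a minimal sketch, assuming the server announces its message endpoint in a data: line shaped like /mcp/message?sessionId=... (other MCP servers may name the parameter differently). SseSessionId and extractSessionId are hypothetical names.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SseSessionId {
    // Matches "sessionId=<value>" up to the next '&' or whitespace.
    private static final Pattern SESSION_ID = Pattern.compile("sessionId=([^&\\s]+)");

    /** Returns the session id found in one SSE data line, or empty if there is none. */
    static Optional<String> extractSessionId(String sseLine) {
        if (sseLine == null || !sseLine.startsWith("data:")) {
            return Optional.empty();
        }
        Matcher m = SESSION_ID.matcher(sseLine);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Assumed shape of the first event's data line; the id is a placeholder.
        String line = "data: /mcp/message?sessionId=3f2a9c";
        System.out.println(extractSessionId(line).orElse("<none>"));
    }
}
```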

Step 2: Initialize the Connection

curl -X POST "http://localhost:8092/mcp/message?sessionId=YOUR_SESSION_ID" \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
      "protocolVersion": "2024-11-05",
      "clientInfo": { "name": "debug-client", "version": "1.0.0" },
      "capabilities": {}
    }
  }'

Step 3: List Available Tools

curl -X POST "http://localhost:8092/mcp/message?sessionId=YOUR_SESSION_ID" \
  -H "Content-Type: application/json" \
  -d '{ "jsonrpc": "2.0", "id": 2, "method": "tools/list", "params": {} }'

This returns all registered tools with their descriptions and schemas. I check three things here: are all expected tools listed? Are the descriptions accurate? Are the parameter schemas correct?

I once spent an hour debugging a failing agent. The tools/list response showed 2 tools instead of 3. I had forgotten to register getFraudRiskScore in the .tools(...) call. The LLM was trying to use a tool that didn't exist.

Step 4: Call a Tool Directly

curl -X POST "http://localhost:8092/mcp/message?sessionId=YOUR_SESSION_ID" \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "id": 3,
    "method": "tools/call",
    "params": {
      "name": "getStockByProduct",
      "arguments": { "productCode": "COMIC_BOOKS" }
    }
  }'

This runs the exact same code path the agent would trigger. If the response is wrong here, the problem is in your service code, not the LLM.
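When curl quoting gets tedious, the same request can be built with the JDK's HTTP client. This is a rough sketch that reuses the URL, tool name, and argument from the curl command above. ToolCallRequest is a hypothetical helper, and the body builder does no JSON escaping, so it only suits simple tool names.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class ToolCallRequest {
    /** Builds the JSON-RPC body for tools/call; argumentsJson must already be valid JSON. */
    static String body(int id, String toolName, String argumentsJson) {
        return "{\"jsonrpc\":\"2.0\",\"id\":" + id
            + ",\"method\":\"tools/call\""
            + ",\"params\":{\"name\":\"" + toolName + "\",\"arguments\":" + argumentsJson + "}}";
    }

    /** Builds (but does not send) the POST that mirrors the curl command. */
    static HttpRequest build(String baseUrl, String sessionId, String jsonBody) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/mcp/message?sessionId=" + sessionId))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
            .build();
    }

    public static void main(String[] args) {
        String json = body(3, "getStockByProduct", "{\"productCode\":\"COMIC_BOOKS\"}");
        HttpRequest request = build("http://localhost:8092", "YOUR_SESSION_ID", json);
        System.out.println(request.method() + " " + request.uri());
        System.out.println(json);
        // To send it: HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())
    }
}
```

As with curl, expect the tool result on the open SSE stream rather than in the POST response.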

Common MCP Bugs (and How to Find Them)

1. Tool Name Mismatch

The system prompt says getTransactionStatus. The MCP server exposes getPaymentStatus. The LLM tries to call a tool that doesn't exist. The agent gives a vague answer with no real data.

How to catch it: Run tools/list and compare every tool name against your system prompt. They must match exactly.
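The comparison is easy to automate once you have both lists of names. This is a minimal sketch, assuming you collected the prompt's tool names and the tools/list names by hand or from the JSON. ToolNameCheck and missingFromServer are hypothetical names, not part of LangChain4j or any MCP SDK.

```java
import java.util.Set;
import java.util.TreeSet;

class ToolNameCheck {
    /** Names the prompt mentions that the server does not expose (exact, case-sensitive). */
    static Set<String> missingFromServer(Set<String> promptNames, Set<String> serverNames) {
        Set<String> missing = new TreeSet<>(promptNames);
        missing.removeAll(serverNames);
        return missing;
    }

    public static void main(String[] args) {
        Set<String> prompt = Set.of("getTransactionStatus", "getFraudRiskScore");
        Set<String> server = Set.of("getPaymentStatus", "getFraudRiskScore");
        System.out.println(missingFromServer(prompt, server)); // [getTransactionStatus]
    }
}
```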

2. Missing Required Parameters

The tool schema says transactionId is required. The LLM calls it with only orderId. The tool fails with a null pointer or returns "not found."

How to catch it: Look at the schema in tools/list. If a parameter is required, the description should make clear where to get it. I add notes like "extract from the event's transactionId field" in the tool description.

3. Wrong Parameter Types

The schema says threshold is an integer. The LLM sends it as a string "3". The handler casts (Integer) args.get("threshold") and throws a ClassCastException.

How to catch it: Test with curl using both correct and incorrect types. The MCP SDK does some type coercion, but not in all cases. I add explicit type checks in the handler for critical tools.
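An explicit check can replace the raw cast. This sketch assumes the handler receives its arguments as a Map<String, Object>, as in the cast shown above. requireInt is a hypothetical helper, not an MCP SDK method.

```java
import java.util.Map;

class ToolArgs {
    /** Reads an integer argument, accepting JSON numbers and numeric strings such as "3". */
    static int requireInt(Map<String, Object> args, String key) {
        Object value = args.get(key);
        if (value instanceof Number n) {
            return n.intValue(); // note: a fractional number is truncated
        }
        if (value instanceof String s) {
            try {
                return Integer.parseInt(s.trim());
            } catch (NumberFormatException e) {
                throw new IllegalArgumentException(key + " must be an integer, got \"" + s + "\"");
            }
        }
        throw new IllegalArgumentException(key + " is required and must be an integer");
    }

    public static void main(String[] args) {
        System.out.println(requireInt(Map.of("threshold", "3"), "threshold")); // 3
        System.out.println(requireInt(Map.of("threshold", 3), "threshold"));   // 3
    }
}
```

Throwing IllegalArgumentException with the parameter name gives the LLM an error message it can act on, instead of an opaque ClassCastException.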

4. Overly Broad Tool Descriptions

The description says "gets data." The LLM calls it for everything, even when another tool would be better.

How to catch it: Ask the agent a question that should use a different tool. If it picks the wrong one, the description is too vague. I rewrite descriptions to include "use this when..." and "do not use for..." clauses.

Enabling Request/Response Logging

LangChain4j's MCP transport supports logging. I enable it during development:

private McpClient buildClient(String sseUrl) {
    return new DefaultMcpClient.Builder()
        .transport(new HttpMcpTransport.Builder()
            .sseUrl(sseUrl)
            .logResponses(true)
            .logRequests(true)
            .build())
        .build();
}

This prints every JSON-RPC request and response to the console. You see exactly which tools the agent calls, with which arguments, and what comes back. Noisy in production, invaluable during development.

On the LLM side, the chat model supports the same logging flags:

GoogleAiGeminiChatModel.builder()
    .logRequests(true)
    .logResponses(true)
    .build();

This shows the functionDeclaration sent to Gemini and the functionCall it generates. If the LLM is picking the wrong tool, the request/response log shows exactly which declarations it was given and which call, with which arguments, it produced.

Testing Tool Responses

The format of the tool response matters for the LLM. I switched from key=value strings to JSON early on.

Before:

status=SUCCESS | totalAmount=150.00 | totalItems=3

After:

// Assumes findByTransactionId returns an Optional; serialize only when a payment exists.
return paymentService.findByTransactionId(txId)
    .map(jsonUtil::toJson)
    .orElse("No payment found");

ObjectMapper.writeValueAsString() produces clean JSON that the LLM parses reliably. The key=value format caused parsing errors in about 10% of cases, where the LLM would treat the pipe character as part of the value.

The Debugging Checklist

When an agent gives a wrong or empty answer, I run through this checklist:

1. Test the tool with curl. Does it return the expected data?
2. Check tools/list. Are all tools registered? Do names match the system prompt?
3. Check tool descriptions. Are they specific enough for the LLM to pick the right one?
4. Check parameter schemas. Do required fields match what the LLM can extract from context?
5. Check maxSequentialToolsInvocations. Is it high enough for the workflow? A 5-saga analysis needs at least 11 tool calls.
6. Check maxOutputTokens. Is the response being truncated?
7. Enable logging. Look at the actual functionCall the LLM generates. Wrong tool? Wrong params?

Most bugs fall into categories 1-3. The tool itself is broken, the tool name doesn't match, or the description is wrong.

What I'd Add Next

I don't have automated integration tests for MCP yet. Each tool is tested in isolation via the service's unit tests. But the full chain (agent → MCP client → HTTP → MCP server → database) is only tested manually.

A proper integration test would: start the service with Testcontainers, connect an MCP client, call each tool, and verify the response. That's on the roadmap.

For now, the curl approach catches most issues and takes 30 seconds per tool.

The repo: github.com/pedrop3/saga-orchestration

MCP Client with LangChain4j

Pedro Santos — Mon, 11 May 2026 23:00:00 +0000

MCP Client with LangChain4j: Connecting an Agent to Multiple Services

In the previous post, I turned each microservice into an MCP server. Now let's connect an AI agent to all of them. The agent will have access to 12+ tools across 4 services and the LLM will decide which ones to call at runtime.

The Client Configuration

LangChain4j provides McpToolProvider, a tool provider that connects to one or more MCP servers and exposes their tools to the agent. Here's my config:

@Configuration
@RequiredArgsConstructor
public class McpClientConfig {

    @Value("${mcp.order-service-url}")
    private String orderServiceUrl;
    @Value("${mcp.payment-service-url}")
    private String paymentServiceUrl;
    @Value("${mcp.inventory-service-url}")
    private String inventoryServiceUrl;
    @Value("${mcp.product-validation-url}")
    private String productValidationUrl;

    @Bean
    public McpToolProvider mcpToolProvider() {
        return McpToolProvider.builder()
            .mcpClients(List.of(
                buildClient(orderServiceUrl),
                buildClient(paymentServiceUrl),
                buildClient(inventoryServiceUrl),
                buildClient(productValidationUrl)
            ))
            .build();
    }

    private McpClient buildClient(String sseUrl) {
        return new DefaultMcpClient.Builder()
            .transport(new HttpMcpTransport.Builder()
                .sseUrl(sseUrl)
                .logResponses(true)
                .logRequests(true)
                .build())
            .build();
    }
}

The URLs come from application.yml:

mcp:
  order-service-url:      ${ORDER_MCP_URL:http://localhost:3000/sse}
  payment-service-url:    ${PAYMENT_MCP_URL:http://localhost:8091/sse}
  inventory-service-url:  ${INVENTORY_MCP_URL:http://localhost:8092/sse}
  product-validation-url: ${PRODUCT_VALIDATION_MCP_URL:http://localhost:8090/sse}

Environment variables for production, localhost defaults for development. Standard Spring Boot pattern.

Building an Agent with MCP Tools

Once the McpToolProvider bean exists, wiring it into an agent is one line:

DataAnalystAgent agent = AiServices.builder(DataAnalystAgent.class)
    .chatModel(primaryChatModel)
    .toolProvider(mcpToolProvider)      // all 12+ tools from 4 services
    .maxSequentialToolsInvocations(5)   // safety limit
    .build();

The toolProvider replaces .tools(...). Instead of passing specific tool instances, you pass a provider that dynamically resolves tools from the MCP servers. The agent sees all tools from all connected servers.

maxSequentialToolsInvocations caps how many tool calls the agent can make in a single turn. Without this, a confused LLM could loop forever calling tools. I set it to 5 for the DataAnalystAgent. The OperationsAgent uses 3 because it only needs RAG context, no MCP calls.
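The cap behaves roughly like a bounded loop. The sketch below is illustrative only and is not LangChain4j's implementation. CappedToolLoop and modelRequestsTool are made-up names, and the predicate stands in for "the LLM asked for another tool invocation".

```java
import java.util.function.IntPredicate;

class CappedToolLoop {
    /**
     * Keeps going while the model asks for tools; fails once the cap is exceeded.
     * Returns how many invocations were used.
     */
    static int run(int maxInvocations, IntPredicate modelRequestsTool) {
        int used = 0;
        while (modelRequestsTool.test(used)) {
            if (used == maxInvocations) {
                throw new IllegalStateException(
                    "Exceeded " + maxInvocations + " sequential tool invocations");
            }
            used++; // a real agent would execute the tool here and feed the result back
        }
        return used;
    }

    public static void main(String[] args) {
        System.out.println(run(5, i -> i < 3)); // a well-behaved model stops after 3
        try {
            run(5, i -> true);                  // a confused model never stops
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```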

What the LLM Sees

When the agent starts, LangChain4j calls tools/list on each MCP server. It collects all tool schemas and sends them to the LLM as functionDeclaration objects. The LLM sees something like:

Available tools:
- getOrderById(orderId: string) - Returns order details
- listRecentEvents(limit: integer) - Returns recent saga events
- getPaymentStatus(transactionId: string) - Returns payment status
- getFraudRiskScore(totalAmount: number, clientType: string, hourOfDay: integer) - Calculates fraud risk
- getStockByProduct(productCode: string) - Returns available stock
- getLowStockAlert(threshold: integer) - Returns low-stock products
... (12 total)

The LLM reads the descriptions and decides which tool to call based on the user's question. Ask "what's the stock for COMIC_BOOKS?" and the LLM picks getStockByProduct. Ask "list recent failed sagas" and it picks listRecentEvents.

The Agent Loop

Here's what happens when a user asks a question:

User: "Is there enough stock for COMIC_BOOKS?"
  ↓
LLM reads the question + all 12 tool descriptions
  ↓
LLM generates: functionCall(getStockByProduct, {productCode: "COMIC_BOOKS"})
  ↓
LangChain4j intercepts the functionCall
  ↓
McpToolProvider routes it to inventory-service's MCP server
  ↓
HTTP POST to http://localhost:8092/mcp/message → tools/call
  ↓
inventory-service runs inventoryService.findByProductCode("COMIC_BOOKS")
  ↓
Returns: "available=600"
  ↓
LangChain4j sends the result back to the LLM as functionResponse
  ↓
LLM generates: "Yes, COMIC_BOOKS has 600 units available."

The agent doesn't know which service owns which tool. It doesn't know the URLs. It just calls tools by name and the McpToolProvider handles the routing.
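Conceptually, the routing is a lookup from tool name to the server that listed it. This is a simplified sketch of the idea, not how McpToolProvider is implemented. ToolRouter is a hypothetical class, and the URLs are the localhost defaults from application.yml.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ToolRouter {
    private final Map<String, String> serverByTool = new HashMap<>();

    /** Records which server exposes which tools, as learned from that server's tools/list. */
    void register(String serverUrl, List<String> toolNames) {
        for (String tool : toolNames) {
            String previous = serverByTool.putIfAbsent(tool, serverUrl);
            if (previous != null && !previous.equals(serverUrl)) {
                throw new IllegalStateException("Tool " + tool + " is exposed by two servers");
            }
        }
    }

    /** Returns the server that owns the tool, or fails if no server listed it. */
    String resolve(String toolName) {
        String url = serverByTool.get(toolName);
        if (url == null) {
            throw new IllegalArgumentException("Unknown tool: " + toolName);
        }
        return url;
    }

    public static void main(String[] args) {
        ToolRouter router = new ToolRouter();
        router.register("http://localhost:3000/sse", List.of("getOrderById", "listRecentEvents"));
        router.register("http://localhost:8092/sse", List.of("getStockByProduct"));
        System.out.println(router.resolve("getStockByProduct")); // http://localhost:8092/sse
    }
}
```

The duplicate check matters in practice: if two servers expose a tool with the same name, a name-only lookup cannot tell them apart.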

Multi-Tool Chains

The interesting cases involve multiple tools. When you ask "list the 5 most recent failed sagas and assess their fraud risk," the agent needs to:

Call listRecentEvents(15) on order-service
Filter for FAIL status
For each failed saga, call getOrderById() on order-service
For each order, call getFraudRiskScore() on payment-service

That's 11 tool calls across 2 services in a single question. The LLM chains them automatically. Each tool call returns data that informs the next one.

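This only works when maxSequentialToolsInvocations is high enough for the workflow. As far as I can tell, the limit counts LLM responses that request tools, not individual calls, so with the limit at 5 the 11 calls only fit if the LLM requests independent calls (for example, the five getOrderById lookups) together in one response. If your model issues them one at a time, raise the limit. For simpler agents that only need one or two lookups, I keep it at 3.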

Virtual Threads Matter

Each MCP tool call is an HTTP request. Without virtual threads, 5 sequential tool calls take 5x the latency. With virtual threads, LangChain4j can parallelize independent calls.

spring:
  threads:
    virtual:
      enabled: true

One property in application.yml. In my tests, a 5-tool chain dropped from ~8 seconds to ~3 seconds. The calls that don't depend on each other's results run in parallel.

Error Handling

MCP servers can fail. Network timeouts, service restarts, tool exceptions. The McpToolProvider handles most of this transparently. If a tool call fails, the result sent back to the LLM is an error message. The LLM usually adapts by trying a different tool or reporting that the data is unavailable.

For critical failures (MCP server completely down), the agent fails when trying to initialize the tool list. I handle this at the service level:

public String runAgent(String userQuestion) {
    try {
        DataAnalystAgent agent = createAgent();
        return agent.analyze(userQuestion);
    } catch (Exception e) {
        e.printStackTrace();
        return "Agent failed: " + e.getMessage();
    }
}

Not elegant, but functional. The agent never crashes the application. The worst case is a failed query with an error message.

@Tool vs McpToolProvider: When I Use Each

In my project, MCP handles everything that crosses service boundaries. But I still use @Tool in one place: the SagaComposerAgent doesn't need MCP tools. It only needs the DataAnalystAgent as a sub-tool (agent-calling-agent). For that, I register the sub-agent as a local @Tool:

var sagaComposerAgent = AiServices.builder(SagaComposerAgent.class)
    .chatModel(primaryChatModel)
    .tools(dataAnalystTool)             // hypothetical name: local @Tool wrapper around DataAnalystAgent
    .maxSequentialToolsInvocations(3)   // no MCP tools needed
    .build();

Rule of thumb: same JVM, use @Tool. Different service, use MCP.

What's Next

The client and servers are connected. But how do you debug when the agent calls the wrong tool or gets unexpected results? In the next post, I'll cover testing and debugging MCP: manual curl testing, log analysis, and the mistakes I made with tool descriptions that caused silent failures.

The repo: github.com/pedrop3/saga-orchestration

Vectorizing Real-Time Kafka Events

Pedro Santos — Mon, 11 May 2026 23:00:00 +0000

Vectorizing Real-Time Kafka Events for RAG

In the previous post, I set up pgvector and Ollama for embedding and vector search. Now I need to fill the database with data. Not documents or PDFs. Real-time saga events flowing through Kafka.

Every time a saga finishes (success or failure), the system publishes to a notify-ending topic. My AI agent listens to that topic and vectorizes every event. Over time, this builds a searchable history of all saga executions.

The Kafka Consumer

The entry point is a standard Spring Kafka listener:

@KafkaListener(
    groupId = "${spring.kafka.consumer.group-id}",
    topics = "${spring.kafka.topic.notify-ending}")
public void onSagaEnded(String payload) {
    var event = parseEvent(payload).orElse(null);
    if (event == null) {
        log.warn("[OperationsService] Failed to parse event payload");
        return;
    }

    // Vectorize ALL events for learning
    String historyText = buildHistoryText(event);
    vectorize(event, historyText);

    // Diagnose only failures
    if (event.getStatus() == SagaStatusEnum.FAIL) {
        diagnose(event, historyText);
    }
}

Two key decisions here. First, I vectorize all events, not just failures. Success events are valuable too. They establish what "normal" looks like, which helps the vector search distinguish between common patterns and anomalies.

Second, diagnosis only runs on failures. There's no point asking the LLM to analyze a successful saga. But the successful saga still gets stored as context for future failure analysis.

Building the Text Representation

The event object carries structured data: source, status, message, timestamps. I need to convert this into a text string that embeds well.

private String buildHistoryText(Event event) {
    return event.getEventHistory().stream()
        .map(h -> h.getSource() + " [" + h.getStatus() + "]: " + h.getMessage())
        .collect(Collectors.joining("\n"));
}

A typical output looks like:

ORCHESTRATOR [SUCCESS]: Saga started!
PRODUCT_VALIDATION_SERVICE [SUCCESS]: Products are validated successfully!
PAYMENT_SERVICE [ROLLBACK]: Fail to realize payment: New customer limit exceeded: R$500.00 > R$450.00
PAYMENT_SERVICE [FAIL]: Rollback executed for payment!
PRODUCT_VALIDATION_SERVICE [FAIL]: Rollback executed on product validation!
ORCHESTRATOR [FAIL]: Saga finished with errors!

This format works well for embeddings because it preserves the sequence of events, the service names, and the error messages. When two failures have similar histories, their vector representations will be close together.

Vectorizing with Metadata

The vectorization step creates an embedding and stores it with metadata for later filtering:

private void vectorize(Event event, String historyText) {
    String profileKey = classifyProfile(event);

    var metadata = new Metadata()
        .put("orderId",    event.getOrderId())
        .put("status",     event.getStatus().toString())
        .put("profileKey", profileKey)
        .put("createdAt",  LocalDateTime.now().toString());

    var segment = TextSegment.from(historyText, metadata);
    embeddingStore.add(embeddingModel.embed(segment).content(), segment);
}

The profileKey classifies the customer into categories like new:high-value, vip:any, or returning:low-value:

public String classify(Order order) {
    if (order == null || order.getOrderId() == null) return "default";
    String clientType   = resolveClientType(order);
    String valueSegment = resolveValueSegment(order, clientType);
    return clientType + ":" + valueSegment;
}

private String resolveClientType(Order order) {
    if (order.getClientType() == null) return "new";
    return switch (order.getClientType().toLowerCase()) {
        case "vip"       -> "vip";
        case "returning" -> "returning";
        default          -> "new";
    };
}

private String resolveValueSegment(Order order, String clientType) {
    if ("vip".equals(clientType)) return "any";
    return order.getTotalAmount() >= 200.0 ? "high-value" : "low-value";
}

This metadata serves two purposes. It lets me filter searches by profile (only find incidents from similar customers). And it lets me query the embedding store for analytics (how many failures per profile, what's the most common failure pattern for VIPs).
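A few worked inputs make the resulting keys concrete. The sketch below restates the same rules with plain values instead of the project's Order type. ProfileKeyDemo is a hypothetical stand-in, and the sample amounts are made up.

```java
class ProfileKeyDemo {
    /** Same rules as the classifier above, taking a client type and an order total. */
    static String classify(String clientType, double totalAmount) {
        String type = resolveClientType(clientType);
        String segment = "vip".equals(type)
            ? "any"
            : (totalAmount >= 200.0 ? "high-value" : "low-value");
        return type + ":" + segment;
    }

    static String resolveClientType(String clientType) {
        if (clientType == null) return "new";
        return switch (clientType.toLowerCase()) {
            case "vip"       -> "vip";
            case "returning" -> "returning";
            default          -> "new";   // unknown types are treated as new customers
        };
    }

    public static void main(String[] args) {
        System.out.println(classify("VIP", 50.0));          // vip:any
        System.out.println(classify(null, 250.0));          // new:high-value
        System.out.println(classify("returning", 199.99));  // returning:low-value
        System.out.println(classify("gold", 200.0));        // new:high-value
    }
}
```

Note the boundary: exactly 200.0 counts as high-value, and VIPs are never split by amount.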

Processing Speed

Each vectorization involves two operations: calling Ollama to generate the embedding and writing to pgvector.

The Ollama call takes 5-15ms for nomic-embed-text on a modern laptop. The pgvector write takes 1-2ms. Total overhead per event: under 20ms.

At the volumes my system handles (hundreds of sagas per day), this is negligible. The Kafka consumer processes events as fast as they arrive. No backpressure, no batching needed.

For higher volumes (thousands per second), you'd want to batch the embeddings and use bulk inserts. LangChain4j's EmbeddingStoreIngestor supports this, but I haven't needed it.

Virtual Threads

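The vectorization runs on a Kafka consumer thread. With Spring Boot's virtual threads enabled, the listener container runs its consumer threads as virtual threads, so a blocking Ollama HTTP call or pgvector write parks a cheap virtual thread instead of tying up a platform thread. Records within a partition are still processed one after another.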

spring:
  threads:
    virtual:
      enabled: true

Without virtual threads, every slow Ollama response holds a platform thread for the duration of the call. With virtual threads, that wait costs almost nothing, though a consistently slow Ollama would still show up as consumer lag.

What Gets Stored in pgvector

After running for a while, the saga_history_embeddings table looks like this:

embedding (vector)	text	metadata
[0.023, -0.114, 0.089, ...]	ORCHESTRATOR [SUCCESS]: Saga started...	{orderId: "abc", status: "SUCCESS", profileKey: "vip:any"}
[0.045, -0.098, 0.102, ...]	ORCHESTRATOR [SUCCESS]: Saga started... PAYMENT [ROLLBACK]: blocked...	{orderId: "def", status: "FAIL", profileKey: "new:high-value"}

Each row is one saga execution. The embedding captures the semantic meaning of the entire history. Two payment failures for the same reason will have similar embeddings even if the order IDs, amounts, and timestamps are different.

What's Next

The data is flowing in. In the next post, I'll show how to search it: finding similar past incidents when a new saga fails, tuning the similarity threshold, and injecting the results into the LLM prompt for diagnosis.

The repo: github.com/pedrop3/saga-orchestration

pgvector + Ollama Setup

Pedro Santos — Wed, 06 May 2026 23:00:00 +0000

RAG Without the Chatbot: pgvector + Ollama for Operational Data

Most RAG tutorials start with "upload a PDF and ask questions about it." That's fine for document search. But I needed RAG for something different: diagnosing failures in a distributed system by searching through historical saga events.

No PDFs. No chatbot. Just a Kafka consumer that vectorizes every saga event into pgvector and an agent that searches similar past incidents to diagnose new failures.

This series covers how I built it. The stack is Ollama for local embeddings, pgvector on PostgreSQL for storage, and LangChain4j to tie it together.

Why RAG (and Not Just Logs)

My saga orchestrator processes orders across 5 microservices. When a saga fails, the event carries a full history: which services ran, what status each returned, what error messages were generated. This data lives in Kafka and MongoDB.

I could search logs. But logs are text. Searching "payment failed" gives you exact matches. It doesn't find incidents where the payment was blocked for a different reason but with a similar pattern: same customer type, similar amount, same time of day.

RAG with vector search finds similar incidents, not exact matches. You convert the failure description to a vector (a list of numbers). You store it alongside thousands of past incidents. When a new failure arrives, you search for the closest vectors. The results are incidents that look like the current one, even if the words are different.
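"Closest" is usually measured with cosine similarity. This is a small sketch with made-up 3-dimensional vectors (real embeddings here have 768 dimensions), assuming cosine is the metric in use. CosineDemo is a hypothetical class.

```java
class CosineDemo {
    /** Cosine similarity: 1.0 means same direction, 0.0 means unrelated (orthogonal). */
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same length");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot   += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        // A zero vector would yield NaN; real embeddings are not all zeros.
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        double[] paymentBlocked  = {0.9, 0.1, 0.0};  // toy stand-ins for embeddings
        double[] cardDeclined    = {0.8, 0.2, 0.1};
        double[] stockUnavailable = {0.0, 0.1, 0.9};
        System.out.println(cosine(paymentBlocked, cardDeclined));     // close to 1
        System.out.println(cosine(paymentBlocked, stockUnavailable)); // close to 0
    }
}
```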

The Stack

Component	Tool	Role
Embedding model	Ollama + `nomic-embed-text`	Converts text to 768-dimensional vectors
Vector store	pgvector on PostgreSQL	Stores and searches vectors
Application	LangChain4j	Connects embedding model to vector store

I chose this stack because it runs locally with no cloud dependencies. Ollama is free. pgvector is a PostgreSQL extension, so it uses the same database infrastructure I already have. No separate vector database to manage.

Setting Up pgvector

pgvector is a PostgreSQL extension. The easiest way to run it is with the official Docker image:

# docker-compose.yml
vectors-db:
  image: pgvector/pgvector:pg16
  environment:
    POSTGRES_DB: vectors-db
    POSTGRES_USER: postgres
    POSTGRES_PASSWORD: postgres
  ports:
    - "5435:5432"
  volumes:
    - ./ai-saga-agent/src/main/resources/init-vectors.sql:/docker-entrypoint-initdb.d/init.sql

The init script enables the extension:

CREATE EXTENSION IF NOT EXISTS vector;

That's it. PostgreSQL now supports vector columns and similarity search.

Setting Up Ollama

Install Ollama and pull the embedding model:

brew install ollama             # macOS (Homebrew)
ollama serve                    # API at http://localhost:11434; leave it running in its own terminal
ollama pull nomic-embed-text    # 274MB, 768-dimensional output (needs the server running)

nomic-embed-text is a good default for production. It's small (274MB vs 4GB+ for chat models), fast (roughly 5-15ms per embedding on my laptop), and the quality is solid for operational data like log messages and event histories.

LangChain4j Configuration

Three beans connect everything:

@Configuration
public class EmbeddingConfig {

    @Value("${ai.ollama.base-url}")
    private String ollamaUrl;

    @Bean
    public EmbeddingModel embeddingModel() {
        return OllamaEmbeddingModel.builder()
            .baseUrl(ollamaUrl)
            .modelName("nomic-embed-text")
            .build();
    }

    @Bean
    public EmbeddingStore<TextSegment> embeddingStore(DataSource dataSource) {
        return PgVectorEmbeddingStore.datasourceBuilder()
            .datasource(dataSource)
            .table("saga_history_embeddings")
            .dimension(768)       // nomic-embed-text output size
            .createTable(true)    // auto-creates if not exists
            .build();
    }

    @Bean
    public EmbeddingStoreIngestor ingestor(
            EmbeddingModel model, EmbeddingStore<TextSegment> store) {
        return EmbeddingStoreIngestor.builder()
            .embeddingModel(model)
            .embeddingStore(store)
            .build();
    }
}

The EmbeddingModel connects to Ollama. The EmbeddingStore connects to pgvector. The EmbeddingStoreIngestor is a convenience that handles the embed-and-store pipeline.

The dimension(768) parameter must match the output size of your embedding model. nomic-embed-text produces 768-dimensional vectors. If you switch models, update this number.

createTable(true) auto-creates the saga_history_embeddings table with the right schema on first run. In production, you'd manage this with migrations.

How Embedding Works

When you embed a piece of text, the model converts it to a list of 768 numbers. Similar texts produce similar numbers. "Payment failed due to insufficient funds" and "Card declined for low balance" will have vectors that are close together in 768-dimensional space, even though they share few words.

In code:

// Text in
String text = "PAYMENT_SERVICE [ROLLBACK]: Fail to realize payment: " +
              "New customer limit exceeded: R$500.00 > R$450.00";

// Vector out (768 floats)
Embedding embedding = embeddingModel.embed(text).content();

// Store with metadata
var segment = TextSegment.from(text, new Metadata()
    .put("orderId", "abc123")
    .put("status", "FAIL")
    .put("profileKey", "new:high-value"));

embeddingStore.add(embedding, segment);

The Metadata lets you filter results later. You could search only for incidents with status=FAIL or profileKey=new:high-value.

Searching for Similar Incidents

To find past incidents similar to a new failure:

String newFailure = "PAYMENT_SERVICE [ROLLBACK]: Transaction blocked by fraud prevention";

var queryEmbedding = embeddingModel.embed(newFailure).content();
var results = embeddingStore.search(
    EmbeddingSearchRequest.builder()
        .queryEmbedding(queryEmbedding)
        .maxResults(3)
        .minScore(0.75)
        .build());

for (var match : results.matches()) {
    System.out.println("Score: " + match.score());
    System.out.println("Text: " + match.embedded().text());
}

minScore(0.75) filters out weak matches. I found that scores below 0.7 tend to be noise in my data. Scores above 0.85 are usually the same type of failure.

maxResults(3) keeps the RAG context manageable. The LLM performs better with 3 highly relevant examples than 10 mediocre ones.
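The two parameters combine as "drop weak matches, then keep the best few". This is a plain-Java sketch of that selection with made-up scores. TopMatches and Match are hypothetical types, not LangChain4j classes, and the real store applies these limits inside the query.

```java
import java.util.Comparator;
import java.util.List;

class TopMatches {
    record Match(String text, double score) {}

    /** Keeps matches scoring at least minScore, best first, at most maxResults of them. */
    static List<Match> select(List<Match> candidates, double minScore, int maxResults) {
        return candidates.stream()
            .filter(m -> m.score() >= minScore)
            .sorted(Comparator.comparingDouble(Match::score).reversed())
            .limit(maxResults)
            .toList();
    }

    public static void main(String[] args) {
        List<Match> candidates = List.of(
            new Match("fraud block, new customer", 0.91),
            new Match("limit exceeded", 0.80),
            new Match("inventory shortage", 0.74),   // below 0.75, dropped
            new Match("fraud block, vip", 0.88),
            new Match("gateway timeout", 0.76));     // passes the filter, cut by maxResults
        select(candidates, 0.75, 3).forEach(m -> System.out.println(m.score() + " " + m.text()));
    }
}
```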

What's Next

The embedding and search infrastructure is ready. In the next post, I'll show how I feed it with real-time Kafka events: every saga completion gets vectorized, building up a knowledge base that improves the agent's diagnoses over time.

The repo: github.com/pedrop3/saga-orchestration

Testing Sagas with Real Failure Scenarios

Pedro Santos — Mon, 04 May 2026 23:00:00 +0000

In the previous post, I walked through the compensation logic in each service. The code looks clean on paper. But sagas have a lot of moving parts, and bugs tend to hide in the transitions between services, not inside a single service.

This post covers how I test the saga system: unit tests for each service, orchestrator routing tests, and the edge cases that caught me off guard.

Testing the Orchestrator Routing

The orchestrator's state transition table is the most critical piece. If it routes to the wrong topic, the entire saga breaks. I test every (source, status) combination:

@Test
void shouldReturnNextTopicGivenValidSourceAndSuccessStatus() {
    setEvent(PAYMENT_SERVICE.toString(), SUCCESS);

    TopicsEnum topic = sagaExecutionController.getNextTopic(event);

    assertEquals(INVENTORY_SUCCESS, topic);
}

@Test
void shouldReturnFailTopicGivenValidSourceAndFailStatus() {
    setEvent(PAYMENT_SERVICE.toString(), FAIL);

    TopicsEnum topic = sagaExecutionController.getNextTopic(event);

    assertEquals(PRODUCT_VALIDATION_FAIL, topic);
}

@Test
void shouldReturnRollbackTopic() {
    setEvent(PRODUCT_VALIDATION_SERVICE.toString(), ROLLBACK);

    TopicsEnum topic = sagaExecutionController.getNextTopic(event);

    assertEquals(PRODUCT_VALIDATION_FAIL, topic);
}

These tests are fast and deterministic. No Kafka, no databases. Just the lookup logic. If someone adds a new service to the saga and forgets to update the table, the test for that (source, status) pair will fail with "Topic not found!"

Edge Cases in Routing

Two cases that caught me early on:

@Test
void shouldThrowValidationExceptionWhenSourceIsNull() {
    setEvent(null, SUCCESS);

    ValidationException ex = assertThrows(ValidationException.class, () -> {
        sagaExecutionController.getNextTopic(event);
    });

    assertEquals("Source and status must be informed.", ex.getMessage());
}

@Test
void shouldThrowValidationExceptionWhenTopicNotFound() {
    setEvent(PAYMENT_SERVICE.toString(), TIMEOUT);

    ValidationException ex = assertThrows(ValidationException.class, () -> {
        sagaExecutionController.getNextTopic(event);
    });

    assertEquals("Topic not found!", ex.getMessage());
}

The TIMEOUT status exists in the enum but has no mapping in the saga table. Without this test, a timeout event would silently disappear. The exception makes it visible immediately.
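For readers without the repo open, the lookup these tests exercise can be pictured as a map keyed by (source, status). This is a sketch under assumptions, not the project's code: it contains only the three transitions shown in the tests above, uses IllegalArgumentException in place of the project's ValidationException, and SagaRoutingSketch is a hypothetical name.

```java
import java.util.Map;

class SagaRoutingSketch {
    enum Source { PAYMENT_SERVICE, PRODUCT_VALIDATION_SERVICE }
    enum Status { SUCCESS, ROLLBACK, FAIL, TIMEOUT }
    enum Topic  { INVENTORY_SUCCESS, PRODUCT_VALIDATION_FAIL }

    record Key(Source source, Status status) {}

    // Only the transitions exercised by the tests; the real table has more rows.
    private static final Map<Key, Topic> TABLE = Map.of(
        new Key(Source.PAYMENT_SERVICE, Status.SUCCESS), Topic.INVENTORY_SUCCESS,
        new Key(Source.PAYMENT_SERVICE, Status.FAIL), Topic.PRODUCT_VALIDATION_FAIL,
        new Key(Source.PRODUCT_VALIDATION_SERVICE, Status.ROLLBACK), Topic.PRODUCT_VALIDATION_FAIL);

    static Topic getNextTopic(Source source, Status status) {
        if (source == null || status == null) {
            throw new IllegalArgumentException("Source and status must be informed.");
        }
        Topic topic = TABLE.get(new Key(source, status));
        if (topic == null) {
            // An unmapped pair fails loudly instead of being dropped silently.
            throw new IllegalArgumentException("Topic not found!");
        }
        return topic;
    }

    public static void main(String[] args) {
        System.out.println(getNextTopic(Source.PAYMENT_SERVICE, Status.SUCCESS)); // INVENTORY_SUCCESS
    }
}
```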

Testing the OrchestrationService

The orchestration layer adds history entries and publishes to Kafka. I mock the producer and verify the correct topic:

@Test
void shouldStartSagaSuccessfully() {
    when(sagaExecutionController.getNextTopic(event))
        .thenReturn(TopicsEnum.PRODUCT_VALIDATION_SUCCESS);

    orchestrationService.startSaga(event);

    verify(producer).sendEvent(eq("product-validation-success"), eq("{json}"));
    assertEquals("ORCHESTRATOR", event.getSource());
    assertEquals(SUCCESS, event.getStatus());
    assertTrue(event.getEventHistory().stream()
        .anyMatch(h -> h.getMessage().contains("Saga started")));
}

@Test
void shouldFinishSagaWithFailure() {
    orchestrationService.finishSagaFail(event);

    verify(producer).sendEvent(eq("notify-ending"), eq("{json}"));
    assertEquals(FAIL, event.getStatus());
    assertTrue(event.getEventHistory().stream()
        .anyMatch(h -> h.getMessage().contains("with errors")));
}

The history assertion is important. It verifies that each step leaves a trace. If a saga fails and the history is empty, debugging becomes guesswork.

Testing Payment: The Happy and Sad Paths

The payment-service has the most complex logic. It validates amounts, checks fraud scores, simulates gateway responses, and handles refunds. Here's how I test the main scenarios:

Payment Success

@Test
void shouldRealizePaymentSuccessfully_givenValidOrderAndAmount() {
    givenNoExistingPayment();
    givenPaymentFound();
    givenJsonSerialization();

    paymentService.realizePayment(event);

    assertEquals(SUCCESS, event.getStatus());
    assertEquals("PAYMENT_SERVICE", event.getSource());
    assertEquals(20.0, event.getOrder().getTotalAmount());
    assertHistoryContains("Payment realized successfully");
    verify(producer).sendEvent("{json}");
}

Amount Below Minimum

@Test
void shouldRollback_givenAmountIsLessThanMinimum() {
    event = buildEvent(0.0, 1);     // unit value = 0.0
    payment = buildPayment(0.0, 1);
    givenNoExistingPayment();
    givenPaymentFound();
    givenJsonSerialization();

    paymentService.realizePayment(event);

    assertEquals(ROLLBACK, event.getStatus());
    assertHistoryContains("minimal amount");
}

Duplicate Transaction

@Test
void shouldRollback_givenTransactionAlreadyExists() {
    when(paymentRepository.existsByOrderIdAndTransactionId(any(), any()))
        .thenReturn(true);
    givenJsonSerialization();

    paymentService.realizePayment(event);

    assertEquals(ROLLBACK, event.getStatus());
    assertHistoryContains("transactionId");
}

Refund (Compensation)

@Test
void shouldRealizeRefund_whenPaymentExists() {
    when(paymentRepository.findByOrderIdAndTransactionId(any(), any()))
        .thenReturn(Optional.of(payment));
    givenJsonSerialization();

    paymentService.realizeRefund(event);

    assertEquals(FAIL, event.getStatus());
    assertEquals(PaymentStatus.REFUND, payment.getStatus());
    assertHistoryContains("Rollback executed for payment");
    verify(paymentRepository).save(payment);
}

Refund Failure (Compensation of the Compensation)

This is the tricky one. What if the refund itself fails? The payment-service still publishes FAIL so the saga can continue rolling back. It just logs that the refund didn't execute:

@Test
void shouldHandleRefundFailureGracefully_whenPaymentNotFound() {
    when(paymentRepository.findByOrderIdAndTransactionId(any(), any()))
        .thenThrow(new RuntimeException("DB error"));
    givenJsonSerialization();

    paymentService.realizeRefund(event);

    assertEquals(FAIL, event.getStatus());
    assertHistoryContains("Rollback not executed for payment");
    verify(producer).sendEvent("{json}");
}

The saga doesn't get stuck. The refund failure is recorded in the history for manual intervention later.
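The shape of that compensation can be sketched in plain Java, with no Kafka and no database. This is an illustration rather than the repo's code: RefundFlowSketch, RefundEvent, and the Runnable standing in for the refund are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, reduced stand-in for the saga Event: just a status and a history.
class RefundEvent {
    String status = "SUCCESS";
    final List<String> history = new ArrayList<>();
}

class RefundFlowSketch {

    // Runs the undo step. The status is FAIL either way, so the chain keeps moving.
    static RefundEvent compensate(RefundEvent event, String what, Runnable undo) {
        event.status = "FAIL";
        try {
            undo.run();
            event.history.add("Rollback executed for " + what + "!");
        } catch (RuntimeException ex) {
            // Recorded, not rethrown: the history keeps the trace for manual follow-up.
            event.history.add("Rollback not executed for " + what + ": " + ex.getMessage());
        }
        return event; // the real service would publish the event at this point
    }

    public static void main(String[] args) {
        RefundEvent ok = compensate(new RefundEvent(), "payment", () -> { });
        RefundEvent broken = compensate(new RefundEvent(), "payment",
            () -> { throw new RuntimeException("DB error"); });
        System.out.println(ok.status + " " + ok.history);
        System.out.println(broken.status + " " + broken.history);
    }
}
```

Both calls end with status FAIL. Only the history entry differs, which is exactly what the two refund tests above assert.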

Testing Inventory Rollback

The inventory tests follow the same pattern. The interesting case is restoring stock to its previous value:

@Test
void shouldRollbackInventorySuccessfully() {
    OrderInventory orderInventory = OrderInventory.builder()
        .inventory(inventory)
        .oldQuantity(10)
        .newQuantity(5)
        .orderId("order-1")
        .transactionId("tx-123")
        .build();

    when(orderInventoryRepository.findByOrderIdAndTransactionId("order-1", "tx-123"))
        .thenReturn(List.of(orderInventory));

    inventoryService.rollbackInventory(event);

    assertEquals(FAIL, event.getStatus());
    assertEquals(10, inventory.getAvailable());  // restored to old value
    assertHistoryContains("Rollback executed for inventory");
}

The oldQuantity was 10, the forward action reduced it to 5, and the rollback restores it to 10. Without the OrderInventory record that saves both values, this rollback would be impossible.

A Helper That Saves Time

I use the same assertion helper across all service tests:

private void assertHistoryContains(String expectedMessage) {
    assertTrue(event.getEventHistory().stream()
        .anyMatch(h -> h.getMessage().toLowerCase()
            .contains(expectedMessage.toLowerCase())),
        "Expected message not found in history: " + expectedMessage);
}

This checks that the service added the right message to the event history. Every test verifies both the status AND the history. The status controls the saga flow. The history tells you why.

What I'd Do Differently

Looking back, there are a few things I'd add:

Integration tests with embedded Kafka. The unit tests mock the producer, so they don't catch serialization bugs or topic misconfiguration. An embedded Kafka setup would let me publish a real event and verify the full chain.

Testcontainers for the databases. The unit tests mock the repositories. A Testcontainers setup with real PostgreSQL and MongoDB would catch schema issues and migration bugs.

Chaos testing. Kill a service mid-saga and verify recovery. Introduce network delays between services. These are the scenarios that break sagas in production, and they're hard to test with mocks alone.

These are in the roadmap. For now, the unit tests cover the routing logic and compensation flows well enough to catch regressions.

Wrapping Up

The saga orchestrator pattern works because each piece is testable in isolation. The state transition lookup is a pure function of (source, status). Each service's forward and compensation logic can be tested with mocked dependencies. The event history gives you a built-in audit trail.

The full test suite runs in seconds because nothing touches real infrastructure. That's the payoff of keeping the orchestrator stateless and the services decoupled.

The repo (with all tests): github.com/pedrop3/saga-orchestration

Building an MCP Server in Spring Boot

Pedro Santos — Wed, 29 Apr 2026 23:00:00 +0000

Building an MCP Server in Spring Boot (Step by Step)

In the previous post, I explained what MCP is and why it matters. Now let's build one. I'll take my payment-service, a regular Spring Boot app with PostgreSQL and Kafka, and add an MCP server to it in about 30 minutes.

By the end, the service will expose three tools that any AI agent can discover and call: getPaymentStatus, getRefundRate, and getFraudRiskScore.

Starting Point

My payment-service already has these Spring beans:

@Service
public class PaymentService {
    public Optional<Payment> findByTransactionId(String txId) { ... }
    public long count() { ... }
    public long countByStatus(PaymentStatus status) { ... }
    public List<Payment> findByStatusOrderByCreatedAtDesc(PaymentStatus status) { ... }
}

@Service
public class FraudValidationService {
    public String calculateFraudScore(double amount, String clientType, int hour,
                                      int orderCount, double successRate, int totalItems) { ... }
}

These are existing business logic methods with real database queries. The goal is to expose them via MCP without changing their implementation.

Step 1: Add the Dependency

implementation 'io.modelcontextprotocol.sdk:mcp:0.9.0'

That's the only new dependency. The MCP SDK is lightweight and has no transitive dependencies that conflict with Spring Boot.

Step 2: Set Up the SSE Transport

MCP needs an HTTP transport for communication. The SDK provides an SSE-based transport that works as a servlet:

@Configuration
public class PaymentMcpConfig {

    @Bean
    public HttpServletSseServerTransportProvider mcpTransport() {
        return HttpServletSseServerTransportProvider.builder()
            .objectMapper(new ObjectMapper())
            .messageEndpoint("/mcp/message")
            .build();
    }

    @Bean
    public ServletRegistrationBean<HttpServletSseServerTransportProvider> mcpServlet(
            HttpServletSseServerTransportProvider transport) {
        return new ServletRegistrationBean<>(transport, "/sse", "/mcp/message");
    }
}

Two endpoints are registered. /sse is the SSE connection endpoint where clients connect. /mcp/message is where JSON-RPC messages are sent. The objectMapper handles serialization.

Step 3: Define Your Tools

Each tool has four components: a name, a description (this is what the LLM reads to decide when to use it), a JSON schema for parameters, and a handler function.

Here's the payment status tool:

private SyncToolSpecification getPaymentStatus(PaymentService paymentService) {
    return tool(
        "getPaymentStatus",
        "Returns the current payment status for a given transaction. " +
        "Use to verify whether a payment was processed, pending, or refunded.",
        """
        {
          "type": "object",
          "properties": {
            "transactionId": {
              "type": "string",
              "description": "Transaction ID associated with the saga"
            }
          },
          "required": ["transactionId"]
        }
        """,
        args -> {
            String txId = (String) args.get("transactionId");
            return paymentService.findByTransactionId(txId)
                .map(p -> "status=" + p.getStatus()
                    + " | totalAmount=" + p.getTotalAmount()
                    + " | totalItems=" + p.getTotalItems())
                .orElse("No payment found for transactionId=" + txId);
        }
    );
}

The description matters more than you'd expect. The LLM uses it to decide when to call this tool. A vague description like "gets payment info" leads to the agent calling it at wrong times. A precise description like "returns payment status for a given transaction, use to verify whether processed, pending, or refunded" gives the LLM the context it needs.

The handler is a Function<Map<String, Object>, String>. It receives the arguments as a map, calls your existing business logic, and returns a string. The return value goes back to the LLM as context for generating its response.
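Because the handler is just a function, it can be exercised without any MCP machinery. A minimal sketch, assuming an in-memory lookup in place of PaymentService; HandlerSketch and findStatus are hypothetical names, and the argument check is an addition that the handler above does not have:

```java
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

class HandlerSketch {

    // Hypothetical stand-in for PaymentService.findByTransactionId.
    static Optional<String> findStatus(String txId) {
        return "tx-123".equals(txId) ? Optional.of("SUCCESS") : Optional.empty();
    }

    // Same shape as the tool handler: a map of arguments in, a string out.
    static final Function<Map<String, Object>, String> GET_PAYMENT_STATUS = args -> {
        Object raw = args.get("transactionId");
        if (!(raw instanceof String) || ((String) raw).isBlank()) {
            // Added guard: answer with text instead of failing on a bad argument.
            return "Missing required argument: transactionId";
        }
        String txId = (String) raw;
        return findStatus(txId)
            .map(status -> "status=" + status)
            .orElse("No payment found for transactionId=" + txId);
    };

    public static void main(String[] args) {
        System.out.println(GET_PAYMENT_STATUS.apply(Map.of("transactionId", "tx-123")));
        System.out.println(GET_PAYMENT_STATUS.apply(Map.of("transactionId", "tx-999")));
        System.out.println(GET_PAYMENT_STATUS.apply(Map.of()));
    }
}
```

Calling the function directly like this is a quick way to check the text the LLM will receive before any transport is involved.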

Step 4: Build the MCP Server

Wire everything together:

@Bean
public McpSyncServer mcpServer(
        HttpServletSseServerTransportProvider transport,
        PaymentService paymentService,
        FraudValidationService fraudService) {

    return McpServer.sync(transport)
        .serverInfo("payment-mcp", "1.0.0")
        .capabilities(ServerCapabilities.builder().tools(true).build())
        .tools(
            getPaymentStatus(paymentService),
            getRefundRate(paymentService),
            getFraudRiskScore(fraudService)
        )
        .build();
}

The serverInfo is what the client sees when it connects. The capabilities declaration tells the client this server supports tools. The .tools(...) call registers all your tool specifications.

A Helper to Reduce Boilerplate

I use a small helper method to avoid repeating the SyncToolSpecification construction:

private SyncToolSpecification tool(String name,
                                   String description,
                                   String schema,
                                   Function<Map<String, Object>, String> handler) {
    return new SyncToolSpecification(
        new McpSchema.Tool(name, description, schema),
        (exchange, args) -> success(handler.apply(args))
    );
}

private CallToolResult success(String text) {
    return CallToolResult.builder()
        .content(List.of(new TextContent(text)))
        .isError(false)
        .build();
}

Every tool follows the same pattern: receive args, call business logic, return text. The helper keeps each tool definition focused on its own logic.

The Full Config for All 4 Services

I repeated this pattern for each microservice. The tools map directly to existing service methods:

order-service (MongoDB):

.tools(
    getOrderById(orderRepository),
    getLastEventByOrder(eventRepository),
    listRecentEvents(eventRepository)
)

inventory-service (PostgreSQL):

.tools(
    getStockByProduct(inventoryService),
    getLowStockAlert(inventoryService),
    checkReservationExists(orderInventoryRepository)
)

product-validation-service (PostgreSQL):

.tools(
    checkProductExists(productValidationService),
    checkValidationExists(productValidationService),
    listCatalog(productValidationService)
)

Each config class follows the same structure. Transport bean, servlet registration, server bean with tools. Copy the pattern, change the tools.

Writing Good Tool Descriptions

After building all 12 tools, I noticed a pattern in which descriptions work well with LLMs and which ones cause problems.

Bad description: "Gets stock." The LLM doesn't know when to call this vs the payment tool.

Good description: "Returns the current available stock quantity for a given product code. Use this tool to check whether a product has sufficient inventory before processing a saga order."

The trick is to include two things: what the tool returns and when to use it. The "use this tool to..." part gives the LLM decision criteria.

For parameters, be explicit about valid values:

{
  "productCode": {
    "type": "string",
    "description": "Product code: COMIC_BOOKS, BOOKS, MOVIES, MUSIC"
  }
}

Listing the valid values in the description prevents the LLM from inventing codes like "COMICS" or "BOOK".
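The description only nudges the LLM, so a handler may still want to check the value it receives. A minimal sketch of such a guard, assuming the four codes from the description above; ProductCodeSketch and validationError are hypothetical names and not part of the repo:

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Optional;
import java.util.stream.Collectors;

class ProductCodeSketch {

    // The four codes listed in the parameter description.
    enum ProductCode { COMIC_BOOKS, BOOKS, MOVIES, MUSIC }

    // Empty when the code is valid, otherwise a message the LLM can act on.
    static Optional<String> validationError(String rawCode) {
        String valid = Arrays.stream(ProductCode.values())
            .map(Enum::name)
            .collect(Collectors.joining(", "));
        if (rawCode == null) {
            return Optional.of("Missing productCode. Valid values: " + valid);
        }
        try {
            // Case and surrounding whitespace are tolerated before the lookup.
            ProductCode.valueOf(rawCode.trim().toUpperCase(Locale.ROOT));
            return Optional.empty();
        } catch (IllegalArgumentException ex) {
            return Optional.of("Unknown productCode '" + rawCode + "'. Valid values: " + valid);
        }
    }

    public static void main(String[] args) {
        System.out.println(validationError("COMIC_BOOKS"));
        System.out.println(validationError("COMICS"));
    }
}
```

Returning the list of valid values in the error text gives the agent a chance to correct itself on the next call.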

What's Next

The servers are running. In the next post, I'll show the client side: how to connect to multiple MCP servers from a single LangChain4j agent and let the LLM pick the right tools at runtime.

The repo: github.com/pedrop3/saga-orchestration

Rollback Chains: When Payment Fails, What Actually Happens

Pedro Santos — Tue, 28 Apr 2026 08:00:00 +0000

In the previous post, I showed the orchestrator's state transition table. It knows which topic to publish on failure. But what happens on the receiving end? What does "rollback" actually look like in code?

This post walks through three real failure scenarios in my saga system. Each one triggers a different rollback chain, and each service handles compensation differently.

How Compensation Works

Every service implements two operations: the forward action and its compensation.

Service	Forward	Compensation
product-validation	`validateExistingProducts()`	`rollbackEvent()`
payment-service	`realizePayment()`	`realizeRefund()`
inventory-service	`updateInventory()`	`rollbackInventory()`

The forward action does the work and publishes SUCCESS or ROLLBACK to the orchestrator. The compensation undoes the work and publishes FAIL.

There's an important distinction between ROLLBACK and FAIL. ROLLBACK means "I failed, and I need my own compensation first." FAIL means "I already rolled back, now the previous service needs to compensate." The orchestrator uses this to chain rollbacks in the correct order.

Scenario 1: Inventory Fails (Full Rollback Chain)

This is the worst case. Product validation passed. Payment was charged. Then inventory fails because the product is out of stock.

The rollback chain goes: Inventory → Payment → Product Validation → Finish Fail.
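The chain can be traced mechanically from the orchestrator's (source, status) rows. A minimal sketch in plain Java, assuming the failure rows of the transition table and the topic names used in this series; RollbackChainSketch, consumerOf, and chainFrom are hypothetical helpers:

```java
import java.util.ArrayList;
import java.util.List;

class RollbackChainSketch {

    // (source, status, next topic): the failure rows of the transition table.
    static final String[][] TABLE = {
        {"PRODUCT_VALIDATION_SERVICE", "ROLLBACK", "product-validation-fail"},
        {"PRODUCT_VALIDATION_SERVICE", "FAIL",     "finish-fail"},
        {"PAYMENT_SERVICE",            "ROLLBACK", "payment-fail"},
        {"PAYMENT_SERVICE",            "FAIL",     "product-validation-fail"},
        {"INVENTORY_SERVICE",          "ROLLBACK", "inventory-fail"},
        {"INVENTORY_SERVICE",          "FAIL",     "payment-fail"},
    };

    static String nextTopic(String source, String status) {
        for (String[] row : TABLE) {
            if (row[0].equals(source) && row[1].equals(status)) {
                return row[2];
            }
        }
        throw new IllegalStateException("Topic not found for " + source + " + " + status);
    }

    // Which service consumes a given *-fail topic.
    static String consumerOf(String topic) {
        switch (topic) {
            case "inventory-fail":          return "INVENTORY_SERVICE";
            case "payment-fail":            return "PAYMENT_SERVICE";
            case "product-validation-fail": return "PRODUCT_VALIDATION_SERVICE";
            default: throw new IllegalStateException("No consumer for " + topic);
        }
    }

    // Follows the topics from a failing service until the saga reaches finish-fail.
    static List<String> chainFrom(String failingService) {
        List<String> topics = new ArrayList<>();
        String source = failingService;
        String status = "ROLLBACK";           // the forward action failed
        while (true) {
            String topic = nextTopic(source, status);
            topics.add(topic);
            if (topic.equals("finish-fail")) {
                return topics;
            }
            source = consumerOf(topic);       // that service compensates...
            status = "FAIL";                  // ...and reports FAIL afterwards
        }
    }

    public static void main(String[] args) {
        System.out.println(chainFrom("INVENTORY_SERVICE"));
        // prints [inventory-fail, payment-fail, product-validation-fail, finish-fail]
    }
}
```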

Step 1: Inventory Detects the Problem

public void updateInventory(Event event) {
    try {
        checkCurrentValidation(event);
        createOrderInventory(event);
        updateInventory(event.getOrder());
        handleSuccess(event);
    } catch (Exception ex) {
        log.error("Error trying to update inventory: ", ex);
        handleFailCurrentNotExecuted(event, ex.getMessage());
    }
    producer.sendEvent(jsonUtil.toJson(event).orElseThrow());
}

private void checkInventory(int available, int orderQuantity) {
    if (orderQuantity > available) {
        throw new ValidationException("Product is out of stock!");
    }
}

When stock is insufficient, checkInventory throws. The catch block calls handleFailCurrentNotExecuted, which sets status to ROLLBACK. The event goes back to the orchestrator with source=INVENTORY_SERVICE, status=ROLLBACK.

private void handleFailCurrentNotExecuted(Event event, String message) {
    event.setStatus(ROLLBACK);
    event.setSource(CURRENT_SOURCE);
    addHistory(event, "Fail to update inventory: ".concat(message));
}

Step 2: Orchestrator Routes the Rollback

The orchestrator looks up the table: INVENTORY_SERVICE + ROLLBACK → INVENTORY_FAIL. It publishes to inventory-fail.

Step 3: Inventory Compensates Itself

The inventory-service consumes from inventory-fail and restores the previous quantities:

public void rollbackInventory(Event event) {
    event.setStatus(FAIL);
    event.setSource(CURRENT_SOURCE);
    try {
        returnInventoryToPreviousValues(event);
        addHistory(event, "Rollback executed for inventory!");
    } catch (Exception ex) {
        addHistory(event, "Rollback not executed for inventory: ".concat(ex.getMessage()));
    }
    producer.sendEvent(jsonUtil.toJson(event).orElseThrow());
}

private void returnInventoryToPreviousValues(Event event) {
    orderInventoryRepository
        .findByOrderIdAndTransactionId(event.getOrder().getOrderId(), event.getTransactionId())
        .forEach(orderInventory -> {
            var inventory = orderInventory.getInventory();
            inventory.setAvailable(orderInventory.getOldQuantity());
            inventoryRepository.save(inventory);
        });
}

Notice that it uses oldQuantity from the OrderInventory record. When the forward action runs, it saves both old and new quantities. The rollback reads the old value and restores it. No guessing.

After rollback, inventory publishes FAIL back to the orchestrator. Now the orchestrator sees INVENTORY_SERVICE + FAIL → PAYMENT_FAIL and publishes to payment-fail.

Step 4: Payment Refunds

public void realizeRefund(Event event) {
    event.setStatus(FAIL);
    event.setSource(CURRENT_SOURCE);
    try {
        changePaymentStatusToRefund(event);
        addHistory(event, "Rollback executed for payment!");
    } catch (Exception ex) {
        addHistory(event, "Rollback not executed for payment: ".concat(ex.getMessage()));
    }
    producer.sendEvent(jsonUtil.toJson(event).orElseThrow());
}

private void changePaymentStatusToRefund(Event event) {
    var payment = findByOrderIdAndTransactionId(event);
    payment.setStatus(PaymentStatus.REFUND);
    setEventAmountItems(event, payment);
    save(payment);
}

Payment changes the status to REFUND and publishes FAIL. The orchestrator sees PAYMENT_SERVICE + FAIL → PRODUCT_VALIDATION_FAIL and continues the chain.

Step 5: Product Validation Rolls Back

public void rollbackEvent(Event event) {
    changeValidationToFail(event);
    event.setStatus(FAIL);
    event.setSource(CURRENT_SOURCE);
    addHistory(event, "Rollback executed on product validation!");
    producer.sendEvent(jsonUtil.toJson(event).orElseThrow());
}

private void changeValidationToFail(Event event) {
    validationRepository
        .findByOrderIdAndTransactionId(event.getOrderId(), event.getTransactionId())
        .ifPresentOrElse(
            validation -> {
                validation.setSuccess(false);
                validationRepository.save(validation);
            },
            () -> createValidation(event, false));
}

Finally, the orchestrator sees PRODUCT_VALIDATION_SERVICE + FAIL → FINISH_FAIL and the saga ends.

Scenario 2: Payment Fails (Partial Rollback)

Payment fails (card declined, fraud blocked). Inventory was never touched, so there's nothing to roll back there. Only product validation needs compensation.

Payment ROLLBACK → payment-fail → Payment refunds → FAIL
→ product-validation-fail → Validation marks as failed → FAIL
→ finish-fail

Shorter chain. Same mechanism.

Scenario 3: Product Validation Fails (No Rollback Needed)

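Product validation is the first step. If the product doesn't exist in the catalog, nothing was charged and nothing was reserved, so no other service has to compensate. The service publishes ROLLBACK, the orchestrator routes it to product-validation-fail, the validation is marked as failed, and the resulting PRODUCT_VALIDATION_SERVICE + FAIL → FINISH_FAIL ends the saga.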

The Pattern: Save Before You Change

Every service follows the same pattern for safe rollbacks. Before changing any data, save the original values.

The inventory-service saves oldQuantity and newQuantity in OrderInventory:

private OrderInventory createOrderInventory(Event event,
                                            OrderProducts product,
                                            Inventory inventory) {
    return OrderInventory.builder()
        .inventory(inventory)
        .oldQuantity(inventory.getAvailable())    // save current
        .orderQuantity(product.getQuantity())
        .newQuantity(inventory.getAvailable() - product.getQuantity())  // save new
        .orderId(event.getOrder().getOrderId())
        .transactionId(event.getTransactionId())
        .build();
}

The payment-service saves the payment with status PENDING before processing:

private void createPendingPayment(Event event) {
    var payment = Payment.builder()
        .orderId(event.getOrder().getOrderId())
        .transactionId(event.getTransactionId())
        .totalAmount(totalAmount)
        .totalItems(totalItems)
        .build();
    save(payment);  // status defaults to PENDING via @PrePersist
}

This way, the compensation logic always has the data it needs to undo the change.
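Stripped of JPA and Kafka, the save-before-change idea fits in a few lines. A minimal sketch, assuming simplified stand-ins for Inventory and OrderInventory; SaveBeforeChangeSketch, Stock, and StockChange are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;

class SaveBeforeChangeSketch {

    // Simplified stand-in for the Inventory entity.
    static final class Stock {
        int available;
        Stock(int available) { this.available = available; }
    }

    // Simplified stand-in for OrderInventory: remembers both sides of the change.
    static final class StockChange {
        final Stock stock;
        final int oldQuantity;
        final int newQuantity;
        StockChange(Stock stock, int oldQuantity, int newQuantity) {
            this.stock = stock;
            this.oldQuantity = oldQuantity;
            this.newQuantity = newQuantity;
        }
    }

    private final List<StockChange> changes = new ArrayList<>();

    // Forward action: record the old and new values first, then mutate.
    void reserve(Stock stock, int quantity) {
        if (quantity > stock.available) {
            throw new IllegalStateException("Product is out of stock!");
        }
        StockChange change = new StockChange(stock, stock.available, stock.available - quantity);
        changes.add(change);
        stock.available = change.newQuantity;
    }

    // Compensation: restore recorded values, newest first, so the oldest one wins.
    void rollback() {
        for (int i = changes.size() - 1; i >= 0; i--) {
            StockChange change = changes.get(i);
            change.stock.available = change.oldQuantity;
        }
        changes.clear();
    }

    public static void main(String[] args) {
        Stock comics = new Stock(10);
        SaveBeforeChangeSketch ledger = new SaveBeforeChangeSketch();
        ledger.reserve(comics, 5);
        System.out.println("after reserve: " + comics.available);   // 5
        ledger.rollback();
        System.out.println("after rollback: " + comics.available);  // 10
    }
}
```

The rollback never recomputes anything. It only copies back a value that was written down before the change.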

Idempotency Guards

Each service checks for duplicate transactions before processing:

private void checkCurrentValidation(Event event) {
    if (orderInventoryRepository.existsByOrderIdAndTransactionId(
            event.getOrder().getOrderId(), event.getTransactionId())) {
        throw new ValidationException("There's another transactionId for this validation.");
    }
}

If Kafka delivers the same message twice (at-least-once delivery), the guard throws before the forward action runs, and the catch block reports ROLLBACK instead of repeating the work. Without this check, you'd charge the customer twice or double-deduct inventory.
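The effect of the guard is easy to see in isolation. A minimal sketch, assuming an in-memory set in place of the repository query; IdempotencyGuardSketch and handle are hypothetical names, and the returned strings mirror the statuses used in this series:

```java
import java.util.HashSet;
import java.util.Set;

class IdempotencyGuardSketch {

    private final Set<String> processed = new HashSet<>();
    private int available;

    IdempotencyGuardSketch(int available) {
        this.available = available;
    }

    // Returns the status the service would publish for this delivery.
    String handle(String orderId, String transactionId, int quantity) {
        String key = orderId + "|" + transactionId;
        if (!processed.add(key)) {
            // Set.add returns false for a key it already holds: a repeated delivery.
            return "ROLLBACK";
        }
        available -= quantity;
        return "SUCCESS";
    }

    int available() {
        return available;
    }

    public static void main(String[] args) {
        IdempotencyGuardSketch service = new IdempotencyGuardSketch(10);
        System.out.println(service.handle("order-1", "tx-123", 5)); // SUCCESS
        System.out.println(service.handle("order-1", "tx-123", 5)); // ROLLBACK
        System.out.println(service.available());                    // 5
    }
}
```

An in-memory set only protects a single process. The services in this series ask the database instead, which also survives restarts.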

What's Next

The rollback logic is clean when you read the code, but things get tricky in practice. What if the refund fails? What if Kafka drops a message? In the next post, I'll cover how I test these scenarios with unit tests and real failure injection.

The repo: github.com/pedrop3/saga-orchestration

What is MCP and Why Java Developers Should Care

Pedro Santos — Wed, 22 Apr 2026 23:00:00 +0000

Every AI tutorial shows you how to build a chatbot. You give it a system prompt, connect it to an LLM, and it talks back. But the moment you need the AI to do something real, like check inventory or query a payment status, you're writing custom HTTP clients and glue code.

MCP (Model Context Protocol) fixes this. It's a standard protocol that lets AI agents discover and call external tools over HTTP. Think of it as USB for AI: one plug, any device. Any agent can connect to any MCP server and use its tools without custom integration code.

This series covers MCP from a Java developer's perspective. Not theory. I'm running 4 MCP servers in production across a microservices system, and I'll show the actual code.

The Problem MCP Solves

I have 5 microservices coordinated via Kafka in a saga pattern. I built an AI agent that needs to query data across all of them: order details from MongoDB, payment status from PostgreSQL, inventory levels from another PostgreSQL, product catalog from a fourth database.

Without MCP, the options are bad. I could write an HTTP client for each service. Four clients, four sets of DTOs, four error handling strategies. Every time a service adds a new query, I update the client. Tight coupling between the agent and every service it talks to.

Or I could use LangChain4j's @Tool annotation. But @Tool only works for methods in the same JVM. My agent lives in a separate service. The inventory data lives in another.

How MCP Works

MCP uses JSON-RPC over HTTP with Server-Sent Events (SSE) for the connection lifecycle. The flow has three phases.

Discovery. The client connects to the server's SSE endpoint. It sends an initialize request and a tools/list request. The server responds with all available tools, including their names, descriptions, and parameter schemas.

Invocation. When the agent decides it needs data, the LLM generates a tool call. The MCP client sends a tools/call request with the tool name and arguments. The server executes the tool and returns the result.

Statelessness. Each tool call is independent. The server doesn't maintain state between calls. The session exists only for the SSE connection lifecycle.

In concrete terms, it looks like this:

Agent → SSE connect to http://localhost:8092/sse → gets sessionId
Agent → POST /mcp/message?sessionId=xxx → {"method": "initialize", ...}
Agent → POST /mcp/message?sessionId=xxx → {"method": "tools/list", ...}
Server responds → ["getStockByProduct", "getLowStockAlert", "checkReservationExists"]
Agent → POST /mcp/message?sessionId=xxx → {"method": "tools/call", "params": {"name": "getStockByProduct", "arguments": {"productCode": "COMIC_BOOKS"}}}
Server responds → {"available": 600}

No SDK needed on the client side. It's HTTP requests with JSON bodies. You could test it with curl.

MCP vs REST APIs

If you squint, MCP looks like a REST API with extra steps. Both expose functionality over HTTP. Both use JSON. So why not just call the REST endpoint directly?

Three reasons.

Tool discovery. REST APIs require documentation. Someone reads the Swagger spec and writes a client. MCP servers advertise their capabilities at runtime. The agent connects and instantly knows what tools are available, what parameters they take, and when to use them (via the description field).

LLM integration. LangChain4j converts MCP tool schemas into functionDeclaration objects that the LLM understands. The LLM reads the tool description, decides when to use it, and generates the correct arguments. You don't write routing logic.

Decoupling. Adding a new tool to a REST API means updating the client. Adding a new tool to an MCP server means the agent sees it on next connection. Zero changes on the agent side.

The Protocol in Detail

For tools, three JSON-RPC methods matter.

initialize establishes the connection and negotiates capabilities:

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "initialize",
  "params": {
    "protocolVersion": "2024-11-05",
    "clientInfo": { "name": "my-agent", "version": "1.0.0" },
    "capabilities": {}
  }
}

tools/list returns all available tools:

{
  "jsonrpc": "2.0",
  "id": 2,
  "method": "tools/list",
  "params": {}
}

tools/call invokes a specific tool:

{
  "jsonrpc": "2.0",
  "id": 3,
  "method": "tools/call",
  "params": {
    "name": "getStockByProduct",
    "arguments": { "productCode": "COMIC_BOOKS" }
  }
}

The response always comes back as a CallToolResult with a content array of TextContent objects. Simple and predictable.
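The same tools/call request can be prepared from Java with nothing but the JDK's HTTP client. A minimal sketch, assuming the endpoint path and port used earlier in this post; McpRequestSketch and its methods are hypothetical names, the hand-rolled escaping covers only quotes and backslashes, and the request is built but not sent because sending needs a live session:

```java
import java.net.URI;
import java.net.http.HttpRequest;

class McpRequestSketch {

    // Escapes only backslashes and double quotes. A real client would use a JSON library.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Builds a tools/call body for a tool that takes one string argument.
    static String toolsCallBody(int id, String toolName, String argName, String argValue) {
        return "{\"jsonrpc\":\"2.0\",\"id\":" + id
            + ",\"method\":\"tools/call\",\"params\":{\"name\":\"" + escape(toolName)
            + "\",\"arguments\":{\"" + escape(argName) + "\":\"" + escape(argValue) + "\"}}}";
    }

    // Prepares (does not send) the POST against the message endpoint of an open session.
    static HttpRequest toolsCallRequest(String baseUrl, String sessionId, String body) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/mcp/message?sessionId=" + sessionId))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build();
    }

    public static void main(String[] args) {
        String body = toolsCallBody(3, "getStockByProduct", "productCode", "COMIC_BOOKS");
        HttpRequest request = toolsCallRequest("http://localhost:8092", "YOUR_SESSION_ID", body);
        System.out.println(request.method() + " " + request.uri());
        System.out.println(body);
    }
}
```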

The Java Ecosystem

Two libraries handle MCP in Java.

Server side: io.modelcontextprotocol.sdk:mcp:1.1.0. This is the official MCP Java SDK. You create an McpSyncServer with transport, tools, and server info. It handles the JSON-RPC protocol and SSE lifecycle.

Client side: LangChain4j's langchain4j-mcp module. You create McpClient instances pointing to each server's SSE URL. Then you wrap them in an McpToolProvider and pass it to your agent builder.

Both are stable and production-ready in my experience. The server SDK is lightweight (one dependency). The client side is handled by LangChain4j, so if you're already using it for agents, there's nothing extra to install.

What's Next

In the next post, I'll show step by step how to turn a Spring Boot microservice into an MCP server. Real code from my payment-service: defining tools, setting up the transport, and testing it with curl before connecting any AI agent.

The repo: github.com/pedrop3/saga-orchestration

The Orchestrator: State Transitions and Kafka Routing

Pedro Santos — Mon, 20 Apr 2026 23:00:00 +0000

In the previous post, I explained why I chose the Saga Pattern over distributed transactions. Now let's look at the central piece: the orchestrator.

The orchestrator is the brain of the system. It receives events from all services and decides what happens next. It doesn't hold any business logic. It doesn't talk to databases. It just routes messages based on a state transition table.

The State Transition Table

The entire saga flow is defined in a single static array. Each row maps a (source, status) pair to the next Kafka topic:

public final class SagaHandler {

    public static final Object[][] SAGA_HANDLER = {
        { ORCHESTRATOR,                SUCCESS,  PRODUCT_VALIDATION_SUCCESS },
        { ORCHESTRATOR,                FAIL,     FINISH_FAIL },

        { PRODUCT_VALIDATION_SERVICE,  ROLLBACK, PRODUCT_VALIDATION_FAIL },
        { PRODUCT_VALIDATION_SERVICE,  FAIL,     FINISH_FAIL },
        { PRODUCT_VALIDATION_SERVICE,  SUCCESS,  PAYMENT_SUCCESS },

        { PAYMENT_SERVICE,             ROLLBACK, PAYMENT_FAIL },
        { PAYMENT_SERVICE,             FAIL,     PRODUCT_VALIDATION_FAIL },
        { PAYMENT_SERVICE,             SUCCESS,  INVENTORY_SUCCESS },

        { INVENTORY_SERVICE,           ROLLBACK, INVENTORY_FAIL },
        { INVENTORY_SERVICE,           FAIL,     PAYMENT_FAIL },
        { INVENTORY_SERVICE,           SUCCESS,  FINISH_SUCCESS }
    };

    public static final int EVENT_SOURCE_INDEX = 0;
    public static final int SAGA_STATUS_INDEX  = 1;
    public static final int TOPIC_INDEX        = 2;
}

Read it like this: when PAYMENT_SERVICE sends SUCCESS, publish to inventory-success. When INVENTORY_SERVICE sends FAIL, publish to payment-fail (rollback the payment). When INVENTORY_SERVICE sends SUCCESS, publish to finish-success (saga complete).

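This table is the entire orchestration logic. Adding a new step means adding rows. Changing the order means pointing the affected rows at different topics; the position of a row in the array doesn't matter, because the lookup matches on (source, status). No if/else chains. No complex routing code.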

Finding the Next Topic

The SagaExecutionController looks up the table on every event:

@Component
public class SagaExecutionController {

    public TopicsEnum getNextTopic(Event event) {
        if (isEmpty(event.getSource()) || isEmpty(event.getStatus())) {
            throw new ValidationException("Source and status must be informed.");
        }
        return findTopicBySourceAndStatus(event);
    }

    private TopicsEnum findTopicBySourceAndStatus(Event event) {
        return (TopicsEnum) Arrays.stream(SAGA_HANDLER)
            .filter(row -> isEventSourceAndStatusValid(event, row))
            .map(i -> i[TOPIC_INDEX])
            .findFirst()
            .orElseThrow(() -> new ValidationException("Topic not found!"));
    }

    private boolean isEventSourceAndStatusValid(Event event, Object[] row) {
        var source = row[EVENT_SOURCE_INDEX];
        var status = row[SAGA_STATUS_INDEX];
        return source.toString().equals(event.getSource())
            && status.equals(event.getStatus());
    }
}

It streams through the table, finds the matching row, and returns the topic. No switch statements. No service-specific logic. Just a lookup.
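One consequence of findFirst is that a duplicated (source, status) pair would silently shadow the later row. A small check can catch that before it reaches Kafka. This is a sketch with the table copied as plain strings; SagaTableCheckSketch and duplicateKeys are hypothetical names, not part of the repo:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SagaTableCheckSketch {

    // The transition table from above, with enum constants written as strings.
    static final String[][] SAGA_ROWS = {
        {"ORCHESTRATOR",               "SUCCESS",  "PRODUCT_VALIDATION_SUCCESS"},
        {"ORCHESTRATOR",               "FAIL",     "FINISH_FAIL"},
        {"PRODUCT_VALIDATION_SERVICE", "ROLLBACK", "PRODUCT_VALIDATION_FAIL"},
        {"PRODUCT_VALIDATION_SERVICE", "FAIL",     "FINISH_FAIL"},
        {"PRODUCT_VALIDATION_SERVICE", "SUCCESS",  "PAYMENT_SUCCESS"},
        {"PAYMENT_SERVICE",            "ROLLBACK", "PAYMENT_FAIL"},
        {"PAYMENT_SERVICE",            "FAIL",     "PRODUCT_VALIDATION_FAIL"},
        {"PAYMENT_SERVICE",            "SUCCESS",  "INVENTORY_SUCCESS"},
        {"INVENTORY_SERVICE",          "ROLLBACK", "INVENTORY_FAIL"},
        {"INVENTORY_SERVICE",          "FAIL",     "PAYMENT_FAIL"},
        {"INVENTORY_SERVICE",          "SUCCESS",  "FINISH_SUCCESS"},
    };

    // Returns every (source, status) pair that appears more than once.
    static List<String> duplicateKeys(Object[][] table) {
        Set<String> seen = new HashSet<>();
        List<String> duplicates = new ArrayList<>();
        for (Object[] row : table) {
            String key = row[0] + " + " + row[1];
            if (!seen.add(key)) {
                duplicates.add(key);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        System.out.println("duplicates: " + duplicateKeys(SAGA_ROWS)); // duplicates: []
    }
}
```

Run as a unit test, a check like this fails as soon as someone adds a second row for a pair that already has a target.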

The Kafka Consumer

The orchestrator listens on multiple topics. Each one triggers a different action:

@KafkaListener(
    groupId = "${spring.kafka.consumer.group-id}",
    topics = "${spring.kafka.topic.start-saga}")
public void consumeStartSagaEvent(String payload) {
    var event = jsonUtil.toEvent(payload).orElseThrow();
    orchestrationService.startSaga(event);
}

@KafkaListener(
    groupId = "${spring.kafka.consumer.group-id}",
    topics = "${spring.kafka.topic.orchestrator}")
public void consumeOrchestratorEvent(String payload) {
    var event = jsonUtil.toEvent(payload).orElseThrow();

    switch (event.getStatus()) {
        case SUCCESS  -> orchestrationService.continueSaga(event);
        case ROLLBACK -> orchestrationService.rollbackSaga(event);
        case FAIL     -> orchestrationService.handleFail(event);
    }
}

Every service publishes back to the orchestrator topic. The orchestrator reads the status and decides the next action. SUCCESS means continue to the next step. ROLLBACK means the current service failed and needs its own compensation first. FAIL means the service already rolled back and now the previous service needs to compensate.

The Orchestration Service

The OrchestrationService ties it all together:

public void startSaga(Event event) {
    event.setSource(ORCHESTRATOR.toString());
    event.setStatus(SUCCESS);
    var topic = getTopic(event);
    addHistory(event, "Saga started!");
    sendToProducerWithTopic(event, topic);
}

public void continueSaga(Event event) {
    var topic = getTopic(event);
    sendToProducerWithTopic(event, topic);
}

public void finishSagaSuccess(Event event) {
    event.setSource(ORCHESTRATOR.toString());
    event.setStatus(SUCCESS);
    addHistory(event, "Saga finished successfully!");
    notifyFinishedSaga(event);
}

public void finishSagaFail(Event event) {
    event.setSource(ORCHESTRATOR.toString());
    event.setStatus(FAIL);
    addHistory(event, "Saga finished with errors!");
    notifyFinishedSaga(event);
}

startSaga looks up the first topic in the table (product-validation-success) and publishes. continueSaga does the same lookup based on whoever just completed their step. finishSagaSuccess and finishSagaFail publish to the notify-ending topic so the order-service can update the final status.

Kafka Topics: Who Publishes What

Every service has clear boundaries. Each one consumes from its own success/fail topics and produces back to the orchestrator topic:

Service	Consumes	Produces
order-service	`notify-ending`	`start-saga`
orchestrator	`start-saga`, `orchestrator`, `finish-success`, `finish-fail`	All service topics + `notify-ending`
product-validation	`product-validation-success`, `product-validation-fail`	`orchestrator`
payment-service	`payment-success`, `payment-fail`	`orchestrator`
inventory-service	`inventory-success`, `inventory-fail`	`orchestrator`

The orchestrator is the only service that publishes to multiple topics. The three worker services publish to exactly one, orchestrator, and the order-service publishes only to start-saga.

Adding History at Every Step

Every time the orchestrator processes an event, it appends a History entry:

private void addHistory(Event event, String message) {
    var history = History.builder()
        .source(event.getSource())
        .status(event.getStatus().toString())
        .message(message)
        .createdAt(LocalDateTime.now())
        .build();
    event.addToHistory(history);
}

By the time a saga ends, the event carries a complete timeline. Every service, every status change, every message. This is what makes debugging possible. You don't need to search through logs. The event itself tells you exactly what happened.
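To make the "event as timeline" idea concrete, here is a minimal sketch that renders a history list as readable lines. The History record and the SagaTimeline class are hypothetical stand-ins I wrote for illustration, not the project's actual classes.

```java
import java.time.LocalDateTime;
import java.util.List;
import java.util.stream.Collectors;

final class SagaTimeline {
    // Hypothetical stand-in for the project's History class.
    record History(String source, String status, String message, LocalDateTime createdAt) {}

    // One line per history entry, in the order the entries were appended.
    static String render(List<History> history) {
        return history.stream()
            .map(h -> h.createdAt() + " [" + h.source() + "/" + h.status() + "] " + h.message())
            .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        var start = LocalDateTime.of(2026, 4, 16, 7, 0, 1);
        System.out.println(render(List.of(
            new History("ORCHESTRATOR", "SUCCESS", "Saga started!", start),
            new History("PAYMENT", "ROLLBACK", "Payment failed", start.plusSeconds(2)))));
    }
}
```

Because the entries are appended in processing order, no sorting is needed to read the saga from start to finish.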

The Stateless Design

Notice that the orchestrator has no database. It doesn't persist saga state between messages. The entire state travels inside the Event object through Kafka.

This is a deliberate choice. The orchestrator can restart at any time without losing state. Kafka retains the messages. The event carries all context. If the orchestrator crashes mid-saga, the unprocessed message is still on the topic and gets picked up after restart.

The downside: you can't query "what sagas are currently in progress" from the orchestrator. That's the order-service's job (it stores all events in MongoDB and the notify-ending consumer updates the final status).

What's Next

The state machine handles the happy path and knows which topics to publish on failure. But what actually happens inside each service when a rollback is triggered? In the next post, I'll walk through the compensation logic: how payment-service refunds a charge and how inventory-service restores stock.

The repo: github.com/pedrop3/saga-orchestration

Why Sagas (and Why Not Distributed Transactions)

Pedro Santos — Thu, 16 Apr 2026 07:00:00 +0000

You have 5 microservices. An order comes in. You need to validate the product, charge the customer, and reserve inventory. If any of those steps fails, you need to undo the ones that already succeeded.

The textbook answer is a distributed transaction with two-phase commit (2PC). Lock all resources across all services, do the work, then commit everything at once. The problem: 2PC doesn't scale. It requires all services to be available simultaneously. One slow database and everything blocks. In a microservices world with Kafka and independent deployments, 2PC is a non-starter.

The alternative is the Saga Pattern. Instead of one big transaction, you run a chain of local transactions. Each service does its work and publishes an event. If a step fails, you run compensating transactions to undo the previous steps. No distributed locks. No two-phase commit. Each service owns its own data and its own rollback logic.

This series walks through how I built a saga orchestrator from scratch with Spring Boot and Kafka. Real code, real failure scenarios, real rollback chains.

Choreography vs Orchestration

There are two ways to implement sagas.

Choreography means each service listens for events and decides what to do next. Order service publishes "order created." Payment service picks it up and charges the card. Inventory service picks up "payment completed" and reserves stock. No central coordinator.

The problem with choreography is that nobody owns the flow. When you have 5 services and 3 failure modes each, the event chain becomes hard to follow. Debugging a failed saga means reading logs across all services and reconstructing the sequence yourself.

Orchestration means a central service controls the flow. It tells each service what to do and when. It knows which step comes next and which service to call for rollback. The saga logic lives in one place.

I went with orchestration. The tradeoff is that you get a single point of coordination (the orchestrator), but in return you get a clear state machine that's easy to debug and easy to extend.

The Architecture

My system has 5 services, each with its own database:

Service	Port	Database	Role
order-service	3000	MongoDB	Creates orders, stores saga events
orchestrator	8050	(stateless)	Controls the saga flow
product-validation	8090	PostgreSQL	Validates product catalog
payment-service	8091	PostgreSQL	Processes payments
inventory-service	8092	PostgreSQL	Manages stock

All communication goes through Kafka. The orchestrator publishes to service-specific topics. Each service does its work and publishes back to the orchestrator topic.

The Happy Path

When everything works, the flow looks like this:

Order Service → Orchestrator → Product Validation ✅ → Payment ✅ → Inventory ✅ → Finish

1. A user creates an order via REST API on the order-service
2. Order-service saves the order to MongoDB and publishes to start-saga
3. Orchestrator picks it up and publishes to product-validation-success
4. Product validation checks the catalog, publishes SUCCESS back to orchestrator
5. Orchestrator publishes to payment-success
6. Payment processes the charge, publishes SUCCESS back to orchestrator
7. Orchestrator publishes to inventory-success
8. Inventory reserves stock, publishes SUCCESS back to orchestrator
9. Orchestrator publishes to finish-success and then notify-ending

Every step is a Kafka message. Every transition is logged. The order-service listens on notify-ending to update the final status.

The Sad Path: Compensating Transactions

When payment fails (card declined, fraud blocked, amount too high), the orchestrator needs to undo product validation. When inventory fails (out of stock), it needs to undo both payment and product validation.

The rule is simple: on failure, roll back in reverse order. If step 3 fails, compensate steps 2 and 1.

Payment FAIL → publish to payment-fail → Payment refunds
            → publish to product-validation-fail → Validation marks as failed
            → publish to finish-fail → Saga ends with FAIL status

Each service implements two operations: the forward action and the compensation. The payment-service has realizePayment() and realizeRefund(). The inventory-service has updateInventory() and rollbackInventory().
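The reverse-order rule can be sketched without Kafka at all. This is an in-memory illustration, assuming the failing step cleans up after itself first and the completed steps are then undone newest-first; the class name and the "compensate:" labels are made up for the example and are not from the repo.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Predicate;

final class ReverseCompensation {
    // Runs steps in order. On the first failure, returns the compensations
    // to execute: the failing step first, then completed steps newest-first.
    static List<String> run(List<String> steps, Predicate<String> succeeds) {
        Deque<String> completed = new ArrayDeque<>();
        for (String step : steps) {
            if (!succeeds.test(step)) {
                List<String> compensations = new ArrayList<>();
                compensations.add("compensate:" + step);
                while (!completed.isEmpty()) {
                    compensations.add("compensate:" + completed.pop());
                }
                return compensations;
            }
            completed.push(step); // stack: the last completed step is undone first
        }
        return List.of(); // happy path: nothing to compensate
    }

    public static void main(String[] args) {
        var steps = List.of("product-validation", "payment", "inventory");
        System.out.println(run(steps, step -> !step.equals("inventory")));
    }
}
```

In the real system the stack is implicit: the orchestrator's transition table decides which fail topic comes next, so no service has to remember what ran before it.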

Creating an Order (the Starting Point)

Here's the actual REST endpoint that kicks off a saga:

@PostMapping
public ResponseEntity<Order> createOrder(@Valid @RequestBody OrderRequest orderRequest) {
    Order createdOrder = orderService.createOrder(orderRequest);
    return ResponseEntity.status(HttpStatus.CREATED).body(createdOrder);
}

The OrderService saves the order, creates an event, and publishes to Kafka:

@Transactional
public OrderDocument createOrder(OrderRequest orderRequest) {
    var orderDocument = saveOrder(orderRequest);
    var eventDocument = createEventPayload(orderDocument);
    eventPublisherService.publish(eventDocument);
    return orderDocument;
}

The EventPublisherService saves the event, serializes it, and sends it to the start-saga topic:

public void publish(EventDocument eventDocument) {
    eventService.save(eventDocument);
    sagaProducer.sendEvent(serializeEvent(eventDocument));
}

From this point, the orchestrator takes over. The order-service doesn't know or care about product validation, payment, or inventory. It just publishes an event and waits for the final notification.

The Event Structure

Every message in the system follows the same Event structure:

public class Event {
    protected String eventId;
    private String transactionId;
    private String orderId;
    private Order order;
    private String source;
    private SagaStatusEnum status;        // SUCCESS, ROLLBACK, FAIL
    private List<History> eventHistory;
    private LocalDateTime createdAt;
}

The eventHistory list is key. Every service appends its result to this list. By the time the saga ends, you have a complete audit trail of what happened at each step, who did it, and when.

public void addToHistory(History history) {
    if (eventHistory == null) {
        eventHistory = new ArrayList<>();
    }
    eventHistory.add(history);
}

What's Next

In the next post, I'll show the orchestrator itself: the state transition table that maps (source, status) to the next Kafka topic, the consumer that routes events, and how the whole thing stays deterministic even with concurrent sagas.

The repo is open source: github.com/pedrop3/saga-orchestration

Part 3 - Agents That Diagnose, Plan, and Query a Distributed Saga

Pedro Santos — Mon, 13 Apr 2026 23:00:00 +0000

In the previous posts, I set up LangChain4j and connected AI agents to 4 of my 5 microservices via MCP. The plumbing was done. Now for the actual agents, the part that made me rethink how I approach operations in distributed systems.

I built 3 agents, each with a different trigger and a different job. None of them are chatbots. They’re background workers and query interfaces that use LLMs to reason over real system data.

Agent 1: OperationsAgent (Auto-Diagnosis on Failure)

Trigger: Kafka consumer on notify-ending topic (every event is vectorized; diagnosis runs only when status = FAIL)
Job: Figure out why a saga failed, find similar past incidents, write a diagnostic report
Storage: pgvector (embeddings) + PostgreSQL (diagnostics table)

This was the first agent I built, and it’s the one that surprised me the most.

How It Works

Every saga, whether it succeeds or fails, ends with a notify-ending event on Kafka. My agent listens to that topic:

@KafkaListener(
    topics = "${spring.kafka.topic.notify-ending}",
    groupId = "ai-agent-group")
public void onSagaEnded(String payload) throws JsonProcessingException {
    Event event = objectMapper.readValue(payload, Event.class);

    // Vectorize ALL events, builds the historical base
    String historyText = buildHistoryText(event);
    vectorize(event, historyText);

    // Diagnose only failures
    if (event.getStatus() == FAIL) {
        diagnose(event, historyText);
    }
}

Two things happen here. First, every event gets vectorized, converted to an embedding and stored in pgvector. This builds up a knowledge base over time. Second, failures get diagnosed.

The RAG Pipeline

The diagnosis uses RAG. Before asking the LLM anything, I search for similar past incidents:

private String findSimilarIncidents(String historyText) {
    var queryEmbedding = embeddingModel.embed(historyText).content();
    var results = embeddingStore.search(
        EmbeddingSearchRequest.builder()
            .queryEmbedding(queryEmbedding)
            .maxResults(3)
            .minScore(0.75)
            .build());

    if (results.matches().isEmpty())
        return "No similar incidents found in history.";

    return results.matches().stream()
        .map(m -> "--- Similar incident (score=" +
            String.format("%.2f", m.score()) + ") ---\n" + m.embedded().text())
        .collect(Collectors.joining("\n\n"));
}

The embedding model is Ollama’s nomic-embed-text. Runs locally and costs nothing. The vector store is pgvector on PostgreSQL. Nothing exotic.
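If the search call feels like a black box, here is a rough in-memory sketch of what a similarity search with maxResults and minScore does conceptually. It uses plain cosine similarity over a Map. That is an assumption for illustration: the real embedding store may normalise scores differently, and none of these names come from LangChain4j.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;

final class SimilaritySearchSketch {
    // Cosine similarity of two equal-length vectors (NaN if either is all zeros).
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Returns the ids of the best matches: score >= minScore, best first, at most maxResults.
    static List<String> search(double[] query, Map<String, double[]> store,
                               int maxResults, double minScore) {
        return store.entrySet().stream()
            .map(e -> Map.entry(e.getKey(), cosine(query, e.getValue())))
            .filter(e -> e.getValue() >= minScore)
            .sorted(Map.Entry.<String, Double>comparingByValue(Comparator.reverseOrder()))
            .limit(maxResults)
            .map(Map.Entry::getKey)
            .toList();
    }

    public static void main(String[] args) {
        var store = Map.of(
            "payment-declined", new double[] {1, 0},
            "out-of-stock", new double[] {0, 1},
            "payment-timeout", new double[] {1, 1});
        System.out.println(search(new double[] {1, 0}, store, 3, 0.75));
    }
}
```

The minScore cutoff is what keeps unrelated incidents out of the prompt; set it too low and the LLM gets noise presented as "similar" history.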

Then I build a prompt with the saga history + RAG context and pass it to the agent:

private void diagnose(Event event, String historyText) {
    String ragContext = findSimilarIncidents(historyText);
    // totalAmount comes from the order data; its lookup is omitted in this snippet
    String prompt = """
        SAGA FAILED, DIAGNOSE
        OrderId: %s | TransactionId: %s
        Final status: %s | Total amount: R$ %.2f

        SAGA HISTORY:
        %s

        SIMILAR INCIDENTS (RAG):
        %s
        """.formatted(
            event.getOrderId(), event.getTransactionId(),
            event.getStatus(), totalAmount, historyText, ragContext);

    String diagnosis = operationsAgent.analyze(prompt);

    diagnosticRepository.save(SagaDiagnostic.builder()
        .orderId(event.getOrderId())
        .diagnosis(diagnosis)
        .createdAt(LocalDateTime.now())
        .build());
}

The Agent Definition

The agent itself is minimal, just a system prompt defining the output format:

public interface OperationsAgent {

    @SystemMessage("""
        You are a failure diagnosis specialist for distributed sagas.
        You receive the full history of a FAIL saga and similar past incidents.

        Required format:
        ROOT CAUSE: <service and reason>
        AFFECTED SERVICES: <list>
        FINANCIAL IMPACT: <based on totalAmount>
        HISTORICAL PATTERN: <if RAG found similar cases>
        RECOMMENDATION: <corrective action>

        Rules:
        1. Only use the provided context, never invent data.
        2. If no similar incidents found, say so.
        3. Be concise, consumed by a monitoring system.
        """)
    String analyze(@UserMessage String context);
}

No tools here. The OperationsAgent doesn’t need to query anything. All the data arrives via the Kafka event + RAG. It just needs to reason over the context and produce a structured report.

What It Catches

After running this for a while, it started finding patterns I hadn’t noticed. Payment failures from new customers during late hours. Inventory rollbacks always hitting the same product. Fraud scores spiking for a specific order amount range. The RAG context gets better as more events accumulate. The agent learns from your system’s history.

Agent 2: SagaComposerAgent (Dynamic Saga Planning)

Trigger: Scheduled, every 60 seconds in dev, every 30 minutes in production
Job: Decide the optimal execution order for each customer profile
Storage: Redis with TTL (saga-plan:{profile})

This is the weird one. Instead of hardcoding the saga step order, I let the AI decide it based on actual failure data and system metrics.

The Idea

My saga has a default order: Product Validation → Payment → Inventory. If Payment is failing 40% of the time, it’d be smarter to run it first. Fail fast, avoid unnecessary validation calls.

Same logic applies to fraud. A “new customer + high value order” profile with a 30% fraud block rate probably needs a Fraud Validation step before Payment.
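The fail-fast idea on its own is just a sort. The sketch below is a deterministic simplification, assuming per-step failure rates are already known. In my system the LLM makes this decision with more context, so treat the class and the numbers as illustrative only.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;

final class FailFastOrdering {
    // Orders steps by descending failure rate. The sort is stable, so steps
    // with equal (or unknown) rates keep their position from the default order.
    static List<String> order(List<String> defaultOrder, Map<String, Double> failureRate) {
        return defaultOrder.stream()
            .sorted(Comparator
                .comparingDouble((String step) -> failureRate.getOrDefault(step, 0.0))
                .reversed())
            .toList();
    }

    public static void main(String[] args) {
        var defaults = List.of("PRODUCT_VALIDATION", "PAYMENT", "INVENTORY");
        System.out.println(order(defaults, Map.of("PAYMENT", 0.40)));
    }
}
```

With no failure data the default order comes back unchanged, which matches the "if data is insufficient, use default order" behaviour I want from the agent.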

How It Works

On every scheduled run (every minute in dev), the agent recomputes the plan for each customer profile:

@Scheduled(fixedDelayString = "${saga.composer.interval:60000}")
public void recomputePlans() {
    for (String profile : profiles) {
        String ragContext = findHistoricalPatterns(profile);
        String metrics = queryMetrics(dataAnalystAgent);
        String stockAlerts = queryStockAlerts(dataAnalystAgent);

        String prompt = buildCompositionPrompt(profile, metrics, stockAlerts, ragContext);
        String planJson = sagaComposerAgent.compose(prompt);

        redis.opsForValue().set("saga-plan:" + profile, planJson, 35, MINUTES);
    }
}

Notice something: the SagaComposerAgent uses the DataAnalystAgent to get current metrics. Agents calling agents.

The Agent Definition

The system prompt is very specific about the output format and decision rules:

public interface SagaComposerAgent {

    @SystemMessage("""
        You are a saga plan architect. Respond ONLY with raw JSON.
        First character MUST be '{', last MUST be '}'.

        Response format:
        {
          "steps": ["PRODUCT_VALIDATION", "FRAUD_VALIDATION", "PAYMENT", "INVENTORY"],
          "reasoning": "reason for the chosen order"
        }

        Decision rules:
        1. Place high-failure services earlier to fail fast.
        2. If INVENTORY failure rate > 30%, place before PAYMENT.
        3. Include FRAUD_VALIDATION for new + high-value or high fraud rate.
        4. Skip FRAUD_VALIDATION for VIP with < 5% fraud and long positive history.
        5. If data is insufficient, use default order.
        """)
    String compose(@UserMessage String profileContext);
}

The Orchestrator Reads the Plan

On the orchestrator side, when a saga starts, it checks Redis:

public String getFirstTopicForOrder(Order order) {
    String profile = classifyProfile(order);  // e.g., "new:high-value"
    String json = redis.opsForValue().get("saga-plan:" + profile);

    if (json == null) return DEFAULT_FIRST_TOPIC;  // fallback

    var steps = objectMapper.readTree(json).get("steps");
    return resolveTopicFromStep(steps.get(0).asText());
}

If Redis has a plan, use it. If not, fall back to the default order. Redis could be down. The plan could be expired. The agent might not have run yet. Doesn’t matter. The AI layer is additive. It never breaks the existing flow.
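The fallback behaviour described above can be sketched without Redis or Jackson. This is a simplified illustration, assuming the plan lookup is any function that may return null or throw; PlanFallback and its method are hypothetical names, not code from the orchestrator.

```java
import java.util.List;
import java.util.function.Function;

final class PlanFallback {
    static final List<String> DEFAULT_STEPS =
        List.of("PRODUCT_VALIDATION", "PAYMENT", "INVENTORY");

    // Returns the cached plan for a profile, or the default order when the
    // plan is missing, empty, or the lookup itself fails.
    static List<String> stepsFor(String profile, Function<String, List<String>> planLookup) {
        try {
            List<String> steps = planLookup.apply("saga-plan:" + profile);
            return (steps == null || steps.isEmpty()) ? DEFAULT_STEPS : steps;
        } catch (RuntimeException e) {
            // Cache unavailable or plan malformed: the saga must still run.
            return DEFAULT_STEPS;
        }
    }

    public static void main(String[] args) {
        System.out.println(stepsFor("vip:any", key -> null));
    }
}
```

The important design choice is that every failure mode collapses into the same safe default, so the AI layer can only change the order, never stop a saga from starting.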

Example Output

For a new:high-value profile with recent payment failures:

{
  "steps": ["PRODUCT_VALIDATION", "FRAUD_VALIDATION", "PAYMENT", "INVENTORY"],
  "reasoning": "New high-value customer profile. Historical fraud rate 18% warrants early fraud check. Payment placed after fraud validation to avoid unnecessary payment attempts on blocked orders."
}

For a vip:any profile with clean history:

{
  "steps": ["PRODUCT_VALIDATION", "PAYMENT", "INVENTORY"],
  "reasoning": "FRAUD_SKIP_REASON: VIP customer with 2% fraud rate and 98% success rate over 47 orders. No night order patterns detected."
}

Agent 3: DataAnalystAgent (Natural Language Queries)

Trigger: HTTP GET request to /api/agent/chat?question=...
Job: Answer operational questions by querying all microservices via MCP tools
Output: Human-readable analysis

This is the agent that uses MCP most heavily. It connects to all 4 microservices and has 12+ tools available.

The Agent Definition

public interface DataAnalystAgent {

    @SystemMessage("""
        You are a data analyst for distributed sagas. Answer using exclusively
        the available MCP tools. Never invent data.

        Workflow for finding failed sagas:
        1. Extract N from the question, default to 5.
        2. Call listRecentEvents(limit = N + 10) to get enough FAIL events.
        3. Filter where status=FAIL, take only the first N.
        4. For each failed saga:
           a. Call getOrderById(orderId) to get clientType, totalAmount.
           b. Extract hourOfDay from the event timestamp.
           c. Call getFraudRiskScore with the order data.
        5. Report only the N requested sagas.
        """)
    String analyze(@UserMessage String question);
}

The Critical Lesson: Workflow Instructions Beat Tool Descriptions

Look at the workflow section in the system prompt. That’s the most important thing I learned building these agents.

At first, I just described the tools and let the model figure out the workflow. It worked… sometimes. Other times it would call tools in the wrong order, forget to filter by FAIL status, or process 15 sagas when I asked for 5.

Once I wrote explicit step-by-step instructions in the system prompt, the reliability jumped. I told the agent HOW to use the tools, not just WHAT they do. The model still decides which tools to call, but it follows the prescribed workflow.
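Steps 2 and 3 of that workflow are what I'd write by hand if there were no LLM involved. Here is a minimal sketch of the filtering, assuming a simplified event shape. EventSummary is a hypothetical record for this example; the real Event carries far more fields.

```java
import java.util.List;

final class FailedSagaFilter {
    // Hypothetical minimal view of a saga event.
    record EventSummary(String orderId, String status) {}

    // Keeps only FAIL events and returns at most n of them, preserving input order.
    static List<EventSummary> firstFailed(List<EventSummary> recent, int n) {
        return recent.stream()
            .filter(e -> "FAIL".equals(e.status()))
            .limit(n)
            .toList();
    }

    public static void main(String[] args) {
        var recent = List.of(
            new EventSummary("o1", "SUCCESS"),
            new EventSummary("o2", "FAIL"),
            new EventSummary("o3", "FAIL"));
        System.out.println(firstFailed(recent, 1));
    }
}
```

Spelling the workflow out in the prompt asks the model to reproduce exactly this filter-then-limit behaviour instead of improvising its own.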

Example Interaction

Question: “List the 5 most recent failed sagas and assess their fraud risk.”

The agent:

1. Calls listRecentEvents(limit=15) on order-service
2. Filters for FAIL status, takes first 5
3. For each: calls getOrderById() then getFraudRiskScore()
4. Returns a structured report:

Order 69b6c29f → Payment rejected (R$932.80 > R$500 limit). Fraud score: 12/100 APPROVED. No compensation needed, payment was never processed.

Lessons Learned (the Hard Way)

After building all 3 agents, here are the things I wish someone had told me:

1. MCP beats @Tool for microservices. Not even close. The decoupling alone is worth it. Any agent can connect to any service without code changes.

2. SystemMessage alignment is critical. If your system prompt mentions a tool that doesn’t exist, the agent fails silently. It tries to call it, gets no result, and gives a vague answer. I spent hours debugging this before I realized the prompt referenced getTransactionStatus but the tool was actually named getPaymentStatus.

3. JSON responses from tools win over key=value. I started with "status=SUCCESS | amount=150.00" and switched to ObjectMapper.writeValueAsString(). One line of code, zero parsing bugs on the model side.

4. maxOutputTokens matters more than you think. I set it to 1024 initially. Responses to “5 sagas + fraud scores” questions were consistently truncated. Bumped it to 4096 and the problem disappeared.

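5. Virtual threads are not optional. When an agent calls 5 MCP tools, those are blocking HTTP calls. On platform threads, every in-flight call ties up an OS thread while it waits. With virtual threads, a blocked call parks cheaply and the underlying thread is free for other work. One property in application.yml:

spring:
  threads:
    virtual:
      enabled: true

Blocking MCP calls at almost no thread cost. Whether the tool calls themselves run in parallel still depends on how the agent dispatches them.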

6. The AI layer should always be additive. Every piece of AI in my system has a fallback. Redis plan not found? Use default saga order. Diagnosis fails? The saga still completes normally. The AI improves operations but never blocks them.
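On lesson 5: to show what running several blocking calls concurrently on virtual threads looks like, here is a small sketch using only the JDK (Java 21+). The Callables stand in for MCP tool calls. This is an illustration I wrote for this post, not how LangChain4j dispatches tools internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelToolCalls {
    // Runs every call on its own virtual thread and returns results in input order.
    static List<String> callAll(List<Callable<String>> calls) {
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            List<String> results = new ArrayList<>();
            // invokeAll blocks until every task has completed.
            for (Future<String> future : executor.invokeAll(calls)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    public static void main(String[] args) {
        List<Callable<String>> calls = List.of(
            () -> { Thread.sleep(200); return "stock"; },    // pretend HTTP call
            () -> { Thread.sleep(200); return "payment"; });
        System.out.println(callAll(calls)); // both sleeps overlap instead of adding up
    }
}
```

The point of the sketch is that the concurrency comes from submitting the calls together; virtual threads just make each waiting call cheap.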

The Full Picture

Here’s what the system looks like with all 3 agents running.

An order comes in. The orchestrator checks Redis for an AI-generated plan and executes the saga. If the saga fails, the OperationsAgent diagnoses the failure using RAG and saves it to the database. Every minute, the SagaComposerAgent reads metrics and failure patterns, then writes new plans to Redis. And anytime, a developer can ask the DataAnalystAgent “why are payments failing for new customers?” and get a grounded answer.

The agents feed each other. The OperationsAgent’s vectorized events improve the SagaComposerAgent’s RAG context. The DataAnalystAgent’s metrics help the SagaComposerAgent make better plans. It’s a flywheel.

Try It Yourself

The entire project is open source with setup instructions:

Repo: github.com/pedrop3/saga-orchestration
LangChain4j: langchain4j.dev

You’ll need Docker Compose for the infrastructure (Kafka, PostgreSQL, Redis, MongoDB). You’ll also need Ollama for local embeddings and a Gemini API key (free tier works fine for testing).

The README has step-by-step instructions and a pre-configured Bruno collection with all the API requests.

If you have questions or find issues, open an issue on the repo or drop a comment here. I’m still iterating on the system prompts. They’re never really “done.”

This is part 3 of a 3-part series on integrating AI into a distributed saga system:

Part 1 - Why I Picked LangChain4j Over Spring AI
Part 2 - Connecting AI Agents to Microservices with MCP
Part 3 - Agents That Diagnose, Plan, and Query a Distributed Saga ← you are here

Part 2 - Connecting AI Agents to Microservices with MCP

Pedro Santos — Tue, 07 Apr 2026 18:32:17 +0000

Connecting AI Agents to Microservices with MCP (No Custom SDKs)

In the previous post, I showed how LangChain4j lets you build agents with a Java interface and a couple of annotations. But those agents were using @Tool, methods defined in the same JVM. Fine for a monolith, but I’m running 5 microservices.

I needed the AI agent in service A to call business logic in service B, C, D, and E. Without writing bespoke HTTP clients for each one.

That’s where MCP comes in, and it changed how I think about exposing business logic.

The Problem: @Tool Doesn’t Scale Across Services

In my saga orchestration system, I have:

order-service (port 3000): MongoDB, manages orders and events
product-validation-service (port 8090): PostgreSQL, validates catalog
payment-service (port 8091): PostgreSQL, handles payments and fraud scoring
inventory-service (port 8092): PostgreSQL, manages stock
orchestrator (port 8050): coordinates the saga via Kafka

And then there’s the ai-saga-agent (port 8099), the service that hosts my AI agents. It needs to query data from ALL other services.

With @Tool, I’d have to write HTTP clients and DTOs for each service. Error handling, retry logic, the whole nine yards. Every time a service adds a new capability, I’d update the agent’s code. Tight coupling everywhere.

MCP: One Protocol for Everything

MCP (Model Context Protocol) is basically USB for AI. Instead of writing custom integrations per service, you expose tools via a standard JSON-RPC protocol over HTTP/SSE. Any agent can connect, discover available tools, and call them.

The before/after in my codebase was dramatic.

Before (without MCP): Agent needs stock data, write InventoryHttpClient. Agent needs payment status, write PaymentHttpClient. Agent needs order details, write OrderHttpClient. New tool in inventory? Update the client, update the agent.

After (with MCP): Each service exposes an MCP server. Agent connects to http://localhost:8092/sse and automatically discovers getStockByProduct, getLowStockAlert, checkReservationExists. New tool? Just add it to the MCP server. The agent sees it on next connection.

Making a Microservice an MCP Server

Let me show you the actual code from my payment-service. It already had a PaymentService and a FraudValidationService, real business logic with database queries. I just needed to expose some of those methods as MCP tools.

Add the Dependency

implementation 'io.modelcontextprotocol.sdk:mcp:0.9.0'

Set Up the Transport

@Bean
public HttpServletSseServerTransportProvider mcpTransport() {
    return HttpServletSseServerTransportProvider.builder()
        .objectMapper(new ObjectMapper())
        .messageEndpoint("/mcp/message")
        .build();
}

@Bean
public ServletRegistrationBean<HttpServletSseServerTransportProvider> mcpServlet(
        HttpServletSseServerTransportProvider transport) {
    return new ServletRegistrationBean<>(transport, "/sse", "/mcp/message");
}

Register Your Tools

Here’s the key part. I’m reusing the same PaymentService and FraudValidationService beans that already exist:

@Bean
public McpSyncServer mcpServer(
        HttpServletSseServerTransportProvider transport,
        PaymentService paymentService,
        FraudValidationService fraudService) {

    return McpServer.sync(transport)
        .serverInfo("payment-mcp", "1.0.0")
        .capabilities(ServerCapabilities.builder().tools(true).build())
        .tools(
            getPaymentStatus(paymentService),
            getRefundRate(paymentService),
            getFraudRiskScore(fraudService)  // same business logic, now via MCP
        )
        .build();
}

Each tool needs four things: a name and a description so the LLM understands what it does, a JSON schema for its parameters, and a handler function that runs your actual business logic:

private SyncToolSpecification getPaymentStatus(PaymentService paymentService) {
    return tool(
        "getPaymentStatus",
        "Returns the current payment status for a given transaction. " +
        "Use to verify whether a payment was processed, pending, or refunded.",
        """
        {
          "type": "object",
          "properties": {
            "transactionId": {
              "type": "string",
              "description": "Transaction ID associated with the saga"
            }
          },
          "required": ["transactionId"]
        }
        """,
        args -> {
            String txId = (String) args.get("transactionId");
            return paymentService.findByTransactionId(txId)
                .map(p -> "status=" + p.getStatus()
                    + " | totalAmount=" + p.getTotalAmount()
                    + " | totalItems=" + p.getTotalItems())
                .orElse("No payment found for transactionId=" + txId);
        }
    );
}

Notice: no new business logic. The paymentService.findByTransactionId() method already existed. I’m just wrapping it with a description so the LLM knows when to call it.

What Each Service Exposes

I did this for all 4 services:

Service	MCP Tools
order-service	`getOrderById`, `listRecentEvents`, `getLastEventByOrder`
payment-service	`getPaymentStatus`, `getRefundRate`, `getFraudRiskScore`
inventory-service	`getStockByProduct`, `getLowStockAlert`, `checkReservationExists`
product-validation	`checkProductExists`, `checkValidationExists`, `listCatalog`

Each service keeps full ownership of its data. The MCP layer is just a thin exposure.

The Agent Side: Connecting as an MCP Client

Now on the ai-saga-agent, I connect to all these servers:

@Bean
public McpToolProvider mcpToolProvider() {
    return McpToolProvider.builder()
        .mcpClients(List.of(
            buildClient("http://localhost:3000/sse"),     // order
            buildClient("http://localhost:8091/sse"),     // payment
            buildClient("http://localhost:8092/sse"),     // inventory
            buildClient("http://localhost:8090/sse")      // product-validation
        ))
        .build();
}

private McpClient buildClient(String sseUrl) {
    return new DefaultMcpClient.Builder()
        .transport(new HttpMcpTransport.Builder()
            .sseUrl(sseUrl)
            .build())
        .build();
}

Then when I build an agent, I just pass the mcpToolProvider:

DataAnalystAgent agent = AiServices.builder(DataAnalystAgent.class)
    .chatModel(gemini)
    .toolProvider(mcpToolProvider)   // discovers tools from all 4 services
    .build();

That’s it. The agent now has access to 12+ tools across 4 services, without a single HTTP client written by hand.

The Saga Architecture (Quick Context)

For those not familiar with the Saga Pattern: it’s how you handle distributed transactions without two-phase commit. Instead of one big transaction, you have a chain of local transactions. If any step fails, you run compensating transactions to undo the previous steps.

My flow looks like this:

Order Service → Orchestrator → Product Validation → Payment → Inventory → Success
                                    ↑                  ↑          ↑
                                    └──── Rollback ←───┴──────────┘

Everything communicates via Kafka topics. The orchestrator listens for results and decides what to publish next. There’s a state transition table that maps (source, status) to the next topic:

Source	Status	→ Next Topic
ORCHESTRATOR	SUCCESS	product-validation-success
PRODUCT_VALIDATION	SUCCESS	payment-success
PAYMENT	SUCCESS	inventory-success
INVENTORY	SUCCESS	finish-success
INVENTORY	FAIL	payment-fail (rollback)
PAYMENT	FAIL	product-validation-fail (rollback)

The beauty of this setup is that the saga flow is deterministic and auditable. Every event is stored, every transition is logged.
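The table above maps almost directly to a lookup structure. Here is a minimal sketch, assuming a string key of the form SOURCE:STATUS; the class and method names are mine, and the real orchestrator may organise this differently.

```java
import java.util.Map;
import java.util.Optional;

final class SagaTransitions {
    // Keys are "SOURCE:STATUS"; the values mirror the transition table above.
    private static final Map<String, String> NEXT_TOPIC = Map.of(
        "ORCHESTRATOR:SUCCESS", "product-validation-success",
        "PRODUCT_VALIDATION:SUCCESS", "payment-success",
        "PAYMENT:SUCCESS", "inventory-success",
        "INVENTORY:SUCCESS", "finish-success",
        "INVENTORY:FAIL", "payment-fail",
        "PAYMENT:FAIL", "product-validation-fail");

    // Empty when the table has no entry for the given (source, status) pair.
    static Optional<String> nextTopic(String source, String status) {
        return Optional.ofNullable(NEXT_TOPIC.get(source + ":" + status));
    }

    public static void main(String[] args) {
        System.out.println(nextTopic("PAYMENT", "SUCCESS").orElse("<no transition>"));
    }
}
```

Because the mapping is plain data, the same (source, status) pair always yields the same topic, which is what makes the flow deterministic.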

@Tool vs MCP Tool: When to Use Each

After building this, my rule of thumb is simple:

Use @Tool when the logic lives in the same JVM as the agent. No network overhead, tightly coupled, only that agent can use it.

Use MCP when the logic lives in another service. Any agent can connect. The protocol is language-agnostic (just JSON-RPC), and adding new tools doesn’t require changes on the agent side.

In practice, my agents use MCP for everything. The only @Tool I still use is for utility functions that don’t belong in any microservice, like formatting helpers or date calculations.

Testing MCP Endpoints Manually

You can test MCP without an AI agent. It’s just HTTP:

# 1. Open an SSE session
curl -N http://localhost:8092/sse
# The first event returns a sessionId (-N disables buffering so it prints immediately)

# 2. List available tools
curl -X POST "http://localhost:8092/mcp/message?sessionId=YOUR_SESSION" \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","id":2,"method":"tools/list","params":{}}'

# 3. Call a tool
curl -X POST "http://localhost:8092/mcp/message?sessionId=YOUR_SESSION" \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc":"2.0","id":3,"method":"tools/call",
    "params":{"name":"getStockByProduct","arguments":{"productCode":"COMIC_BOOKS"}}
  }'

This is super useful for debugging. When an agent does something unexpected, I test the tool directly to check if it’s the tool or the prompt that’s wrong.

What’s Next

With MCP in place, the infrastructure was ready. But the interesting part is what the agents actually do with all these tools. In the next post, I’ll walk through the 3 agents I built. The OperationsAgent listens for failed sagas on Kafka and auto-diagnoses them using RAG. The SagaComposerAgent periodically rewrites the saga execution plan based on real failure data. And the DataAnalystAgent answers natural language questions like “list the 5 most recent failed sagas and assess their fraud risk.”

The code is all open source: github.com/pedrop3/saga-orchestration

This is part 2 of a 3-part series on integrating AI into a distributed saga system:

Part 1 - Why I Picked LangChain4j Over Spring AI
Part 2 - Connecting AI Agents to Microservices with MCP
Part 3 - Agents That Diagnose, Plan, and Query a Distributed Saga