<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Pedro Santos</title>
    <description>The latest articles on DEV Community by Pedro Santos (@pedrop3).</description>
    <link>https://dev.to/pedrop3</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1268016%2Fb517fbe5-d569-4937-8a6d-66e996935ede.jpeg</url>
      <title>DEV Community: Pedro Santos</title>
      <link>https://dev.to/pedrop3</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/pedrop3"/>
    <language>en</language>
    <item>
      <title>Part 3 - Agents That Diagnose, Plan, and Query a Distributed Saga</title>
      <dc:creator>Pedro Santos</dc:creator>
      <pubDate>Mon, 13 Apr 2026 23:00:00 +0000</pubDate>
      <link>https://dev.to/pedrop3/part-3-agents-that-diagnose-plan-and-query-a-distributed-saga-51e3</link>
      <guid>https://dev.to/pedrop3/part-3-agents-that-diagnose-plan-and-query-a-distributed-saga-51e3</guid>
      <description>&lt;p&gt;In the &lt;a href="https://dev.to/pedrop3/part-2-connecting-ai-agents-to-microservices-with-mcp-16m4"&gt;previous posts&lt;/a&gt;, I set up LangChain4j and connected AI agents to 5 microservices via MCP. The plumbing was done. Now for the actual agents, the part that made me rethink how I approach operations in distributed systems.&lt;/p&gt;

&lt;p&gt;I built 3 agents, each with a different trigger and a different job. None of them are chatbots. They’re background workers and query interfaces that use LLMs to reason over real system data.&lt;/p&gt;

&lt;h2&gt;
  
  
  Agent 1: OperationsAgent (Auto-Diagnosis on Failure)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Trigger:&lt;/strong&gt; Kafka consumer on the &lt;code&gt;notify-ending&lt;/code&gt; topic (every event is consumed; diagnosis runs only when status = FAIL)&lt;br&gt;
&lt;strong&gt;Job:&lt;/strong&gt; Figure out why a saga failed, find similar past incidents, write a diagnostic report&lt;br&gt;
&lt;strong&gt;Storage:&lt;/strong&gt; pgvector (embeddings) + PostgreSQL (diagnostics table)&lt;/p&gt;

&lt;p&gt;This was the first agent I built, and it’s the one that surprised me the most.&lt;/p&gt;
&lt;h3&gt;
  
  
  How It Works
&lt;/h3&gt;

&lt;p&gt;Every saga, whether it succeeds or fails, ends with a &lt;code&gt;notify-ending&lt;/code&gt; event on Kafka. My agent listens to that topic:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@KafkaListener&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;topics&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"${spring.kafka.topic.notify-ending}"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;groupId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"ai-agent-group"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;onSagaEnded&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="kd"&gt;throws&lt;/span&gt; &lt;span class="nc"&gt;JsonProcessingException&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="nc"&gt;Event&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;objectMapper&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;readValue&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Event&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;class&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// Vectorize ALL events, builds the historical base&lt;/span&gt;
    &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;historyText&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;buildHistoryText&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;vectorize&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;historyText&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// Diagnose only failures&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getStatus&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="no"&gt;FAIL&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;diagnose&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;historyText&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two things happen here. First, every event gets vectorized: it is converted to an embedding and stored in pgvector, which builds up a knowledge base over time. Second, failures get diagnosed.&lt;/p&gt;

&lt;h3&gt;
  
  
  The RAG Pipeline
&lt;/h3&gt;

&lt;p&gt;The diagnosis uses RAG. Before asking the LLM anything, I search for similar past incidents:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="nf"&gt;findSimilarIncidents&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;historyText&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;queryEmbedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;embeddingModel&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;embed&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;historyText&lt;/span&gt;&lt;span class="o"&gt;).&lt;/span&gt;&lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
    &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;embeddingStore&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;search&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
        &lt;span class="nc"&gt;EmbeddingSearchRequest&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
            &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;queryEmbedding&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;queryEmbedding&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
            &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;maxResults&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
            &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;minScore&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.75&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
            &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;matches&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;isEmpty&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s"&gt;"No similar incidents found in history."&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;matches&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;stream&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;map&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="s"&gt;"--- Similar incident (score="&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
            &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;format&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"%.2f"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;score&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s"&gt;") ---\n"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;embedded&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;collect&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Collectors&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;joining&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"\n\n"&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The embedding model is &lt;code&gt;nomic-embed-text&lt;/code&gt;, served by &lt;a href="https://ollama.com/" rel="noopener noreferrer"&gt;Ollama&lt;/a&gt;. It runs locally and costs nothing. The vector store is pgvector on PostgreSQL. Nothing exotic.&lt;/p&gt;
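
&lt;p&gt;To make the &lt;code&gt;minScore(0.75)&lt;/code&gt; cutoff more concrete, here is a minimal, library-free sketch of what a similarity filter does conceptually. It assumes the reported score is cosine similarity rescaled from [-1, 1] to [0, 1], which is my understanding of how LangChain4j expresses relevance; the class &lt;code&gt;SimilaritySketch&lt;/code&gt; and its toy 3-dimensional vectors are made up for illustration and are not part of the project.&lt;/p&gt;

```java
import java.util.stream.IntStream;

// Illustrative only: real embeddings have hundreds of dimensions, not three.
final class SimilaritySketch {

    // Cosine similarity of two equal-length vectors, in [-1, 1].
    static double cosine(double[] a, double[] b) {
        double dot = IntStream.range(0, a.length).mapToDouble(i -> a[i] * b[i]).sum();
        double normA = Math.sqrt(IntStream.range(0, a.length).mapToDouble(i -> a[i] * a[i]).sum());
        double normB = Math.sqrt(IntStream.range(0, b.length).mapToDouble(i -> b[i] * b[i]).sum());
        return dot / (normA * normB);
    }

    // Assumed rescaling to a [0, 1] relevance score: (cosine + 1) / 2.
    static double relevance(double[] a, double[] b) {
        return (cosine(a, b) + 1.0) / 2.0;
    }

    // True when a stored vector would survive a minScore filter.
    static boolean passes(double[] query, double[] stored, double minScore) {
        return relevance(query, stored) >= minScore;
    }
}
```

&lt;p&gt;Under that assumption, a 0.75 cutoff corresponds to a raw cosine similarity of 0.5, so the filter is looser than the number suggests. It is worth tuning against real incidents rather than treating it as a fixed constant.&lt;/p&gt;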

&lt;p&gt;Then I build a prompt with the saga history + RAG context and pass it to the agent:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;diagnose&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Event&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;historyText&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;ragContext&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;findSimilarIncidents&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;historyText&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""
        SAGA FAILED, DIAGNOSE
        OrderId: %s | TransactionId: %s
        Final status: %s | Total amount: R$ %.2f

        SAGA HISTORY:
        %s

        SIMILAR INCIDENTS (RAG):
        %s
        """&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;formatted&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getOrderId&lt;/span&gt;&lt;span class="o"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getTransactionId&lt;/span&gt;&lt;span class="o"&gt;(),&lt;/span&gt;
            &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getStatus&lt;/span&gt;&lt;span class="o"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;totalAmount&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;historyText&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ragContext&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

    &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;diagnosis&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;operationsAgent&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;analyze&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

    &lt;span class="n"&gt;diagnosticRepository&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;save&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;SagaDiagnostic&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;orderId&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getOrderId&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;diagnosis&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;diagnosis&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;createdAt&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;LocalDateTime&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;now&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The Agent Definition
&lt;/h3&gt;

&lt;p&gt;The agent itself is minimal, just a system prompt defining the output format:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;interface&lt;/span&gt; &lt;span class="nc"&gt;OperationsAgent&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;

    &lt;span class="nd"&gt;@SystemMessage&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"""
        You are a failure diagnosis specialist for distributed sagas.
        You receive the full history of a FAIL saga and similar past incidents.

        Required format:
        ROOT CAUSE: &amp;lt;service and reason&amp;gt;
        AFFECTED SERVICES: &amp;lt;list&amp;gt;
        FINANCIAL IMPACT: &amp;lt;based on totalAmount&amp;gt;
        HISTORICAL PATTERN: &amp;lt;if RAG found similar cases&amp;gt;
        RECOMMENDATION: &amp;lt;corrective action&amp;gt;

        Rules:
        1. Only use the provided context, never invent data.
        2. If no similar incidents found, say so.
        3. Be concise, consumed by a monitoring system.
        """&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="nf"&gt;analyze&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nd"&gt;@UserMessage&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No tools here. The OperationsAgent doesn’t need to query anything. All the data arrives via the Kafka event + RAG. It just needs to reason over the context and produce a structured report.&lt;/p&gt;
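
&lt;p&gt;Because the report follows a fixed set of labels, whatever consumes it can split it back into fields with plain string handling. The sketch below is a hypothetical consumer-side helper, not code from the project: the record name &lt;code&gt;DiagnosticReport&lt;/code&gt; and the &lt;code&gt;isUsable&lt;/code&gt; rule are assumptions, and only the five labels come from the system prompt.&lt;/p&gt;

```java
// Hypothetical helper: the labels mirror the "Required format" in the system prompt.
record DiagnosticReport(String rootCause, String affectedServices, String financialImpact,
                        String historicalPattern, String recommendation) {

    // Returns the text after "LABEL:" on the first line carrying that label, or "" if absent.
    static String field(String report, String label) {
        for (String line : report.split("\n")) {
            String trimmed = line.strip();
            if (trimmed.startsWith(label + ":")) {
                return trimmed.substring(label.length() + 1).strip();
            }
        }
        return "";
    }

    static DiagnosticReport parse(String report) {
        return new DiagnosticReport(
            field(report, "ROOT CAUSE"),
            field(report, "AFFECTED SERVICES"),
            field(report, "FINANCIAL IMPACT"),
            field(report, "HISTORICAL PATTERN"),
            field(report, "RECOMMENDATION"));
    }

    // Assumed rule: a report is usable only if the model filled in the root cause.
    boolean isUsable() {
        return !rootCause.isBlank();
    }
}
```

&lt;p&gt;This only reads the first line of each field, so a model that wraps a value across several lines would lose the continuation. A check like &lt;code&gt;isUsable&lt;/code&gt; is a cheap way to notice when the model ignored the format.&lt;/p&gt;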

&lt;h3&gt;
  
  
  What It Catches
&lt;/h3&gt;

&lt;p&gt;After running this for a while, it started finding patterns I hadn’t noticed. Payment failures from new customers during late hours. Inventory rollbacks always hitting the same product. Fraud scores spiking for a specific order amount range. The RAG context gets better as more events accumulate. The model itself is never retrained; the retrieval corpus grows, so the agent effectively learns from your system’s history.&lt;/p&gt;

&lt;h2&gt;
  
  
  Agent 2: SagaComposerAgent (Dynamic Saga Planning)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Trigger:&lt;/strong&gt; Scheduled, every 60 seconds in dev, every 30 minutes in production&lt;br&gt;
&lt;strong&gt;Job:&lt;/strong&gt; Decide the optimal execution order for each customer profile&lt;br&gt;
&lt;strong&gt;Storage:&lt;/strong&gt; Redis with TTL (&lt;code&gt;saga-plan:{profile}&lt;/code&gt;)&lt;/p&gt;

&lt;p&gt;This is the weird one. Instead of hardcoding the saga step order, I let the AI decide it based on actual failure data and system metrics.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Idea
&lt;/h3&gt;

&lt;p&gt;My saga has a default order: Product Validation → Payment → Inventory. If Payment is failing 40% of the time, it’d be smarter to run it first. Fail fast, avoid unnecessary validation calls.&lt;/p&gt;

&lt;p&gt;Same logic applies to fraud. A “new customer + high value order” profile with a 30% fraud block rate probably needs a Fraud Validation step before Payment.&lt;/p&gt;
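
&lt;p&gt;The fail-fast argument can be checked with a little arithmetic. The sketch below computes the expected number of steps a saga executes before it stops, assuming failures are independent and every step costs the same. Only the 40% Payment failure rate comes from the text above; the 5% rates used for Product Validation and Inventory in the usage note are invented for the example, and compensation costs are ignored entirely.&lt;/p&gt;

```java
// Illustrative model only: independent failures, equal step cost, no compensation cost.
final class FailFast {

    // Expected number of steps executed when step i fails with probability failRates[i]
    // and the saga stops at the first failure.
    static double expectedSteps(double[] failRates) {
        double expected = 0.0;
        double reachProbability = 1.0; // chance that execution reaches the current step
        for (double failRate : failRates) {
            expected += reachProbability;
            reachProbability *= (1.0 - failRate);
        }
        return expected;
    }
}
```

&lt;p&gt;With those made-up numbers, the default order (rates 0.05, 0.40, 0.05) executes about 2.52 steps per saga on average, while running Payment first (0.40, 0.05, 0.05) brings that down to about 2.17. The gain is real but modest, and it has to be weighed against refunds when a later step fails after Payment succeeded.&lt;/p&gt;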
&lt;h3&gt;
  
  
  How It Works
&lt;/h3&gt;

&lt;p&gt;On every scheduled run (every minute with the default interval), the job recomputes the plan for each customer profile:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@Scheduled&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fixedDelayString&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"${saga.composer.interval:60000}"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;recomputePlans&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;profile&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;profiles&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;ragContext&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;findHistoricalPatterns&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;profile&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;metrics&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;queryMetrics&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dataAnalystAgent&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;stockAlerts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;queryStockAlerts&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dataAnalystAgent&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

        &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;buildCompositionPrompt&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;profile&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;stockAlerts&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ragContext&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;planJson&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sagaComposerAgent&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;compose&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

        &lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;opsForValue&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;set&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"saga-plan:"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;profile&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;planJson&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;35&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="no"&gt;MINUTES&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice something: the composer job asks the DataAnalystAgent for current metrics and stock alerts, then feeds the answers into the SagaComposerAgent’s prompt. Agents calling agents.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Agent Definition
&lt;/h3&gt;

&lt;p&gt;The system prompt is very specific about the output format and decision rules:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;interface&lt;/span&gt; &lt;span class="nc"&gt;SagaComposerAgent&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;

    &lt;span class="nd"&gt;@SystemMessage&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"""
        You are a saga plan architect. Respond ONLY with raw JSON.
        First character MUST be '{', last MUST be '}'.

        Response format:
        {
          "steps": ["PRODUCT_VALIDATION", "FRAUD_VALIDATION", "PAYMENT", "INVENTORY"],
          "reasoning": "reason for the chosen order"
        }

        Decision rules:
        1. Place high-failure services earlier to fail fast.
        2. If INVENTORY failure rate &amp;gt; 30%, place before PAYMENT.
        3. Include FRAUD_VALIDATION for new + high-value or high fraud rate.
        4. Skip FRAUD_VALIDATION for VIP with &amp;lt; 5% fraud and long positive history.
        5. If data is insufficient, use default order.
        """&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="nf"&gt;compose&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nd"&gt;@UserMessage&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;profileContext&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The Orchestrator Reads the Plan
&lt;/h3&gt;

&lt;p&gt;On the orchestrator side, when a saga starts, it checks Redis:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="nf"&gt;getFirstTopicForOrder&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Order&lt;/span&gt; &lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;profile&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;classifyProfile&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;  &lt;span class="c1"&gt;// e.g., "new:high-value"&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;opsForValue&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;get&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"saga-plan:"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;profile&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;DEFAULT_FIRST_TOPIC&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;  &lt;span class="c1"&gt;// no plan yet, or TTL expired&lt;/span&gt;

        &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;steps&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;objectMapper&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;readTree&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;).&lt;/span&gt;&lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"steps"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;steps&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;size&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;DEFAULT_FIRST_TOPIC&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;  &lt;span class="c1"&gt;// missing or empty "steps"&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;resolveTopicFromStep&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;steps&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;get&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;).&lt;/span&gt;&lt;span class="na"&gt;asText&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Exception&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// Redis unreachable or malformed plan: fall back, never break the saga&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;DEFAULT_FIRST_TOPIC&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If Redis has a plan, use it. If not, fall back to the default order. Redis could be down. The plan could be expired. The agent might not have run yet. Doesn’t matter. The AI layer is additive. It never breaks the existing flow.&lt;/p&gt;
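
&lt;p&gt;Since the plan is written by an LLM, the same “additive” idea suggests checking it before trusting it. The sketch below is a hypothetical guard, not code from the project: the step names come from the SagaComposerAgent prompt, while the rules (no unknown steps, no duplicates, and Product Validation, Payment and Inventory always present) are assumptions drawn from the example outputs.&lt;/p&gt;

```java
import java.util.Arrays;

// Hypothetical guard: rejects plans the orchestrator could not execute safely.
final class PlanGuard {

    static final String[] KNOWN = {"PRODUCT_VALIDATION", "FRAUD_VALIDATION", "PAYMENT", "INVENTORY"};
    // Assumption: only FRAUD_VALIDATION is optional.
    static final String[] REQUIRED = {"PRODUCT_VALIDATION", "PAYMENT", "INVENTORY"};

    static boolean isValid(String[] steps) {
        if (steps == null || steps.length == 0) return false;
        var given = Arrays.asList(steps);
        // Every step must be one the orchestrator knows how to route.
        if (!Arrays.asList(KNOWN).containsAll(given)) return false;
        // A step listed twice would run twice.
        if (Arrays.stream(steps).distinct().count() != steps.length) return false;
        // Core steps may be reordered but never dropped.
        return given.containsAll(Arrays.asList(REQUIRED));
    }
}
```

&lt;p&gt;A plan that fails a check like this can simply be treated as if no plan existed, which keeps the default order as the single fallback path.&lt;/p&gt;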

&lt;h3&gt;
  
  
  Example Output
&lt;/h3&gt;

&lt;p&gt;For a &lt;code&gt;new:high-value&lt;/code&gt; profile with recent payment failures:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"steps"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"PRODUCT_VALIDATION"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"FRAUD_VALIDATION"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"PAYMENT"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"INVENTORY"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"reasoning"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"New high-value customer profile. Historical fraud rate 18% warrants early fraud check. Payment placed after fraud validation to avoid unnecessary payment attempts on blocked orders."&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For a &lt;code&gt;vip:any&lt;/code&gt; profile with clean history:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"steps"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"PRODUCT_VALIDATION"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"PAYMENT"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"INVENTORY"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"reasoning"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"FRAUD_SKIP_REASON: VIP customer with 2% fraud rate and 98% success rate over 47 orders. No night order patterns detected."&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Agent 3: DataAnalystAgent (Natural Language Queries)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Trigger:&lt;/strong&gt; HTTP GET request to &lt;code&gt;/api/agent/chat?question=...&lt;/code&gt;&lt;br&gt;
&lt;strong&gt;Job:&lt;/strong&gt; Answer operational questions by querying all microservices via MCP tools&lt;br&gt;
&lt;strong&gt;Output:&lt;/strong&gt; Human-readable analysis&lt;/p&gt;

&lt;p&gt;This is the agent that uses MCP most heavily. It connects to all 4 microservices and has 12+ tools available.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Agent Definition
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;interface&lt;/span&gt; &lt;span class="nc"&gt;DataAnalystAgent&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;

    &lt;span class="nd"&gt;@SystemMessage&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"""
        You are a data analyst for distributed sagas. Answer using exclusively
        the available MCP tools. Never invent data.

        Workflow for finding failed sagas:
        1. Extract N from the question, default to 5.
        2. Call listRecentEvents(limit = N + 10) to get enough FAIL events.
        3. Filter where status=FAIL, take only the first N.
        4. For each failed saga:
           a. Call getOrderById(orderId) to get clientType, totalAmount.
           b. Extract hourOfDay from the event timestamp.
           c. Call getFraudRiskScore with the order data.
        5. Report only the N requested sagas.
        """&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="nf"&gt;analyze&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nd"&gt;@UserMessage&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  The Critical Lesson: Workflow Instructions Beat Tool Descriptions
&lt;/h3&gt;

&lt;p&gt;Look at the &lt;code&gt;Workflow for finding failed sagas&lt;/code&gt; section in the system prompt. That’s the most important thing I learned building these agents.&lt;/p&gt;

&lt;p&gt;At first, I just described the tools and let the model figure out the workflow. It worked… sometimes. Other times it would call tools in the wrong order, forget to filter by FAIL status, or process 15 sagas when I asked for 5.&lt;/p&gt;

&lt;p&gt;Once I wrote explicit step-by-step instructions in the system prompt, the reliability jumped. I told the agent HOW to use the tools, not just WHAT they do. The model still decides which tools to call, but it follows the prescribed workflow.&lt;/p&gt;
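
&lt;p&gt;One way to see why this helps: steps 2 and 3 of the workflow are really a small deterministic algorithm that the model is asked to carry out through tool calls. Written as plain Java it looks like the sketch below. The &lt;code&gt;SagaEvent&lt;/code&gt; record and the &lt;code&gt;WorkflowSketch&lt;/code&gt; class are hypothetical stand-ins, not types from the project.&lt;/p&gt;

```java
import java.util.Arrays;

// Hypothetical stand-in for what the prompt asks the model to do in steps 2-3.
final class WorkflowSketch {

    // Only the fields the workflow needs; the real event carries more.
    record SagaEvent(String orderId, String status) {}

    // Step 2: over-fetch so that enough FAIL events are likely to be present.
    static int fetchLimit(int requested) {
        return requested + 10;
    }

    // Step 3: keep FAIL events in their original order and cap at the requested count.
    static SagaEvent[] firstFailed(SagaEvent[] recent, int requested) {
        return Arrays.stream(recent)
            .filter(e -> "FAIL".equals(e.status()))
            .limit(requested)
            .toArray(SagaEvent[]::new);
    }
}
```

&lt;p&gt;Spelling the steps out in the prompt gives the model this same procedure to follow, instead of leaving it to infer the filter and the cap from tool descriptions alone.&lt;/p&gt;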
&lt;h3&gt;
  
  
  Example Interaction
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Question:&lt;/strong&gt; “List the 5 most recent failed sagas and assess their fraud risk.”&lt;/p&gt;

&lt;p&gt;The agent:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Calls &lt;code&gt;listRecentEvents(limit=15)&lt;/code&gt; on order-service&lt;/li&gt;
&lt;li&gt;Filters for FAIL status, takes first 5&lt;/li&gt;
&lt;li&gt;For each: calls &lt;code&gt;getOrderById()&lt;/code&gt; then &lt;code&gt;getFraudRiskScore()&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Returns a structured report:&lt;/li&gt;
&lt;/ol&gt;

&lt;blockquote&gt;
&lt;p&gt;Order 69b6c29f → Payment rejected (R$932.80 &amp;gt; R$500 limit). Fraud score: 12/100 APPROVED. No compensation needed, payment was never processed.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;
  
  
  Lessons Learned (the Hard Way)
&lt;/h2&gt;

&lt;p&gt;After building all 3 agents, here are the things I wish someone had told me:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. MCP beats @Tool for microservices.&lt;/strong&gt; Not even close. The decoupling alone is worth it. Any agent can connect to any service without code changes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. SystemMessage alignment is critical.&lt;/strong&gt; If your system prompt mentions a tool that doesn’t exist, the agent fails silently. It tries to call it, gets no result, and gives a vague answer. I spent hours debugging this before I realized the prompt referenced &lt;code&gt;getTransactionStatus&lt;/code&gt; but the tool was actually named &lt;code&gt;getPaymentStatus&lt;/code&gt;.&lt;/p&gt;
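&lt;p&gt;One way to catch this kind of mismatch early is to compare the tool names written in the prompt against the names the MCP servers actually expose. This is a minimal sketch under one assumption: the prompt writes tool references in call form, such as &lt;code&gt;getOrderById(orderId)&lt;/code&gt;. &lt;code&gt;PromptToolCheck&lt;/code&gt; and &lt;code&gt;missingTools&lt;/code&gt; are hypothetical names, and collecting the discovered tool names is left out.&lt;/p&gt;

```java
import java.util.Arrays;
import java.util.regex.Pattern;

final class PromptToolCheck {

    // Assumption: the prompt references tools in call form, e.g. "getOrderById(orderId)".
    private static final Pattern TOOL_CALL = Pattern.compile("\\b([a-z][A-Za-z0-9]*)\\(");

    // Returns every tool name mentioned in the prompt that is not among the discovered tools.
    static String[] missingTools(String systemPrompt, String[] discoveredTools) {
        var known = Arrays.asList(discoveredTools);
        return TOOL_CALL.matcher(systemPrompt).results()
                .map(match -> match.group(1))
                .filter(name -> !known.contains(name))
                .distinct()
                .toArray(String[]::new);
    }
}
```

&lt;p&gt;Running a check like this at startup and failing fast on a non-empty result turns a silent, vague answer into an explicit error.&lt;/p&gt;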

&lt;p&gt;&lt;strong&gt;3. JSON responses from tools win over key=value.&lt;/strong&gt; I started with &lt;code&gt;"status=SUCCESS | amount=150.00"&lt;/code&gt; and switched to &lt;code&gt;ObjectMapper.writeValueAsString()&lt;/code&gt;. One line of code, zero parsing bugs on the model side.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. &lt;code&gt;maxOutputTokens&lt;/code&gt; matters more than you think.&lt;/strong&gt; I set it to 1024 initially, and the answer to a request for 5 sagas plus fraud scores was consistently truncated. Bumping it to 4096 made the problem disappear.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Virtual threads are not optional.&lt;/strong&gt; When an agent calls 5 MCP tools, those are 5 HTTP calls. Without virtual threads, they’re sequential and slow. On Java 21+ with Spring Boot 3.2+, the fix is one property in application.yml:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;spring&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;threads&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;virtual&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Parallel MCP calls at zero cost.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;6. The AI layer should always be additive.&lt;/strong&gt; Every piece of AI in my system has a fallback. Redis plan not found? Use default saga order. Diagnosis fails? The saga still completes normally. The AI improves operations but never blocks them.&lt;/p&gt;
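&lt;p&gt;The Redis fallback can be sketched as follows. This is an illustration under assumptions, not the orchestrator’s actual code: &lt;code&gt;PlanStore&lt;/code&gt;, &lt;code&gt;SagaPlanResolver&lt;/code&gt;, and &lt;code&gt;resolve&lt;/code&gt; are hypothetical names, and the default order is the one from the flow diagram in Part 2.&lt;/p&gt;

```java
// Hypothetical lookup interface; returns null when no AI-generated plan is stored.
interface PlanStore {
    String[] findPlan(String key);
}

final class SagaPlanResolver {

    // Default step order, used whenever the AI layer has nothing usable to offer.
    static final String[] DEFAULT_ORDER = {"PRODUCT_VALIDATION", "PAYMENT", "INVENTORY"};

    static String[] resolve(PlanStore store, String key) {
        try {
            String[] plan = store.findPlan(key);
            if (plan == null || plan.length == 0) {
                return DEFAULT_ORDER;
            }
            return plan;
        } catch (RuntimeException storeFailure) {
            // A broken plan store must never block the saga.
            return DEFAULT_ORDER;
        }
    }
}
```

&lt;p&gt;Both the missing-plan case and the failing-store case end up on the same default path, so the saga runs whether or not the AI layer is healthy.&lt;/p&gt;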

&lt;h2&gt;
  
  
  The Full Picture
&lt;/h2&gt;

&lt;p&gt;Here’s what the system looks like with all 3 agents running.&lt;/p&gt;

&lt;p&gt;An order comes in. The orchestrator checks Redis for an AI-generated plan and executes the saga. If the saga fails, the OperationsAgent diagnoses the failure using RAG and saves the diagnosis to the database. Every minute, the SagaComposerAgent reads metrics and failure patterns, then writes new plans to Redis. And at any time, a developer can ask the DataAnalystAgent “why are payments failing for new customers?” and get a grounded answer.&lt;/p&gt;

&lt;p&gt;The agents feed each other. The OperationsAgent’s vectorized events improve the SagaComposerAgent’s RAG context. The DataAnalystAgent’s metrics help the SagaComposerAgent make better plans. It’s a flywheel.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;

&lt;p&gt;The entire project is open source with setup instructions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Repo:&lt;/strong&gt; &lt;a href="https://github.com/pedrop3/saga-orchestration" rel="noopener noreferrer"&gt;github.com/pedrop3/saga-orchestration&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LangChain4j:&lt;/strong&gt; &lt;a href="https://langchain4j.dev" rel="noopener noreferrer"&gt;langchain4j.dev&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You’ll need Docker Compose for the infrastructure (Kafka, PostgreSQL, Redis, MongoDB). You’ll also need &lt;a href="https://ollama.com/" rel="noopener noreferrer"&gt;Ollama&lt;/a&gt; for local embeddings and a Gemini API key (free tier works fine for testing).&lt;/p&gt;

&lt;p&gt;The README has step-by-step instructions and a pre-configured Bruno collection with all the API requests.&lt;/p&gt;

&lt;p&gt;If you have questions or find issues, open an issue on the repo or drop a comment here. I’m still iterating on the system prompts. They’re never really “done.”&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This is part 3 of a 3-part series on integrating AI into a distributed saga system:&lt;/em&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;em&gt;&lt;a href="https://dev.to/pedrop3/part-1-why-i-picked-langchain4j-over-spring-ai-57p"&gt;Part 1 - Why I Picked LangChain4j Over Spring AI&lt;/a&gt;&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;a href="https://dev.to/pedrop3/part-2-connecting-ai-agents-to-microservices-with-mcp-16m4"&gt;Part 2 - Connecting AI Agents to Microservices with MCP&lt;/a&gt;&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Part 3 - Agents That Diagnose, Plan, and Query a Distributed Saga&lt;/strong&gt; ← you are here&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>mcp</category>
      <category>microservices</category>
    </item>
    <item>
      <title>Part 2 - Connecting AI Agents to Microservices with MCP</title>
      <dc:creator>Pedro Santos</dc:creator>
      <pubDate>Tue, 07 Apr 2026 18:32:17 +0000</pubDate>
      <link>https://dev.to/pedrop3/part-2-connecting-ai-agents-to-microservices-with-mcp-16m4</link>
      <guid>https://dev.to/pedrop3/part-2-connecting-ai-agents-to-microservices-with-mcp-16m4</guid>
      <description>&lt;h1&gt;
  
  
  Connecting AI Agents to Microservices with MCP (No Custom SDKs)
&lt;/h1&gt;

&lt;p&gt;In the &lt;a href="https://dev.to/pedrop3/part-1-why-i-picked-langchain4j-over-spring-ai-57p"&gt;previous post&lt;/a&gt;, I showed how LangChain4j lets you build agents with a Java interface and a couple of annotations. But those agents were using &lt;code&gt;@Tool&lt;/code&gt;, methods defined in the same JVM. Fine for a monolith, but I’m running 5 microservices.&lt;/p&gt;

&lt;p&gt;I needed the AI agent in service A to call business logic in services B, C, D, and E, without writing bespoke HTTP clients for each one.&lt;/p&gt;

&lt;p&gt;That’s where MCP comes in, and it changed how I think about exposing business logic.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem: @Tool Doesn’t Scale Across Services
&lt;/h2&gt;

&lt;p&gt;In my saga orchestration system, I have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;order-service&lt;/strong&gt; (port 3000): MongoDB, manages orders and events&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;product-validation-service&lt;/strong&gt; (port 8090): PostgreSQL, validates catalog&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;payment-service&lt;/strong&gt; (port 8091): PostgreSQL, handles payments and fraud scoring&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;inventory-service&lt;/strong&gt; (port 8092): PostgreSQL, manages stock&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;orchestrator&lt;/strong&gt; (port 8050): coordinates the saga via Kafka&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And then there’s the &lt;strong&gt;ai-saga-agent&lt;/strong&gt; (port 8099), the service that hosts my AI agents. It needs to query data from ALL other services.&lt;/p&gt;

&lt;p&gt;With &lt;code&gt;@Tool&lt;/code&gt;, I’d have to write HTTP clients and DTOs for each service. Error handling, retry logic, the whole nine yards. Every time a service adds a new capability, I’d update the agent’s code. Tight coupling everywhere.&lt;/p&gt;

&lt;h2&gt;
  
  
  MCP: One Protocol for Everything
&lt;/h2&gt;

&lt;p&gt;MCP (Model Context Protocol) is basically USB for AI. Instead of writing custom integrations per service, you expose tools via a standard JSON-RPC protocol over HTTP/SSE. Any agent can connect, discover available tools, and call them.&lt;/p&gt;

&lt;p&gt;The before/after in my codebase was dramatic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Before (without MCP):&lt;/strong&gt; Agent needs stock data, write &lt;code&gt;InventoryHttpClient&lt;/code&gt;. Agent needs payment status, write &lt;code&gt;PaymentHttpClient&lt;/code&gt;. Agent needs order details, write &lt;code&gt;OrderHttpClient&lt;/code&gt;. New tool in inventory? Update the client, update the agent.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;After (with MCP):&lt;/strong&gt; Each service exposes an MCP server. Agent connects to &lt;code&gt;http://localhost:8092/sse&lt;/code&gt; and automatically discovers &lt;code&gt;getStockByProduct&lt;/code&gt;, &lt;code&gt;getLowStockAlert&lt;/code&gt;, &lt;code&gt;checkReservationExists&lt;/code&gt;. New tool? Just add it to the MCP server. The agent sees it on next connection.&lt;/p&gt;

&lt;h2&gt;
  
  
  Making a Microservice an MCP Server
&lt;/h2&gt;

&lt;p&gt;Let me show you the actual code from my payment-service. It already had a &lt;code&gt;PaymentService&lt;/code&gt; and a &lt;code&gt;FraudValidationService&lt;/code&gt;, real business logic with database queries. I just needed to expose some of those methods as MCP tools.&lt;/p&gt;

&lt;h3&gt;
  
  
  Add the Dependency
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight gradle"&gt;&lt;code&gt;&lt;span class="n"&gt;implementation&lt;/span&gt; &lt;span class="s1"&gt;'io.modelcontextprotocol.sdk:mcp:0.9.0'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Set Up the Transport
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@Bean&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;HttpServletSseServerTransportProvider&lt;/span&gt; &lt;span class="nf"&gt;mcpTransport&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;HttpServletSseServerTransportProvider&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;objectMapper&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ObjectMapper&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;messageEndpoint&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"/mcp/message"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

&lt;span class="nd"&gt;@Bean&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;ServletRegistrationBean&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;HttpServletSseServerTransportProvider&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;mcpServlet&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
        &lt;span class="nc"&gt;HttpServletSseServerTransportProvider&lt;/span&gt; &lt;span class="n"&gt;transport&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ServletRegistrationBean&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;gt;(&lt;/span&gt;&lt;span class="n"&gt;transport&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"/sse"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"/mcp/message"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Register Your Tools
&lt;/h3&gt;

&lt;p&gt;Here’s the key part. I’m reusing the same &lt;code&gt;PaymentService&lt;/code&gt; and &lt;code&gt;FraudValidationService&lt;/code&gt; beans that already exist:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@Bean&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;McpSyncServer&lt;/span&gt; &lt;span class="nf"&gt;mcpServer&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
        &lt;span class="nc"&gt;HttpServletSseServerTransportProvider&lt;/span&gt; &lt;span class="n"&gt;transport&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
        &lt;span class="nc"&gt;PaymentService&lt;/span&gt; &lt;span class="n"&gt;paymentService&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
        &lt;span class="nc"&gt;FraudValidationService&lt;/span&gt; &lt;span class="n"&gt;fraudService&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;McpServer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;sync&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;transport&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;serverInfo&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"payment-mcp"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"1.0.0"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;capabilities&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;ServerCapabilities&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="o"&gt;).&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;getPaymentStatus&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;paymentService&lt;/span&gt;&lt;span class="o"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;getRefundRate&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;paymentService&lt;/span&gt;&lt;span class="o"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;getFraudRiskScore&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fraudService&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;// same business logic, now via MCP&lt;/span&gt;
        &lt;span class="o"&gt;)&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each tool needs four things: a name and a description so the LLM understands what it does, a JSON schema for its parameters, and a handler function that runs your actual business logic:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="nc"&gt;SyncToolSpecification&lt;/span&gt; &lt;span class="nf"&gt;getPaymentStatus&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;PaymentService&lt;/span&gt; &lt;span class="n"&gt;paymentService&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;tool&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
        &lt;span class="s"&gt;"getPaymentStatus"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
        &lt;span class="s"&gt;"Returns the current payment status for a given transaction. "&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
        &lt;span class="s"&gt;"Use to verify whether a payment was processed, pending, or refunded."&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"""
        {
          "type": "object",
          "properties": {
            "transactionId": {
              "type": "string",
              "description": "Transaction ID associated with the saga"
            }
          },
          "required": ["transactionId"]
        }
        """&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;args&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
            &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;txId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;get&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"transactionId"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;paymentService&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;findByTransactionId&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;txId&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
                &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;map&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="s"&gt;"status="&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getStatus&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
                    &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s"&gt;" | totalAmount="&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getTotalAmount&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
                    &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s"&gt;" | totalItems="&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getTotalItems&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt;
                &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;orElse&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"No payment found for transactionId="&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;txId&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="o"&gt;}&lt;/span&gt;
    &lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice: no new business logic. The &lt;code&gt;paymentService.findByTransactionId()&lt;/code&gt; method already existed. I’m just wrapping it with a description so the LLM knows when to call it.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Each Service Exposes
&lt;/h3&gt;

&lt;p&gt;I did this for 4 of the services (all except the orchestrator):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Service&lt;/th&gt;
&lt;th&gt;MCP Tools&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;order-service&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;getOrderById&lt;/code&gt;, &lt;code&gt;listRecentEvents&lt;/code&gt;, &lt;code&gt;getLastEventByOrder&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;payment-service&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;getPaymentStatus&lt;/code&gt;, &lt;code&gt;getRefundRate&lt;/code&gt;, &lt;code&gt;getFraudRiskScore&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;inventory-service&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;getStockByProduct&lt;/code&gt;, &lt;code&gt;getLowStockAlert&lt;/code&gt;, &lt;code&gt;checkReservationExists&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;product-validation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;checkProductExists&lt;/code&gt;, &lt;code&gt;checkValidationExists&lt;/code&gt;, &lt;code&gt;listCatalog&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Each service keeps full ownership of its data. The MCP layer is just a thin exposure.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Agent Side: Connecting as an MCP Client
&lt;/h2&gt;

&lt;p&gt;Now on the ai-saga-agent, I connect to all these servers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@Bean&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;McpToolProvider&lt;/span&gt; &lt;span class="nf"&gt;mcpToolProvider&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;McpToolProvider&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;mcpClients&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;buildClient&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"http://localhost:3000/sse"&lt;/span&gt;&lt;span class="o"&gt;),&lt;/span&gt;     &lt;span class="c1"&gt;// order&lt;/span&gt;
            &lt;span class="n"&gt;buildClient&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"http://localhost:8091/sse"&lt;/span&gt;&lt;span class="o"&gt;),&lt;/span&gt;     &lt;span class="c1"&gt;// payment&lt;/span&gt;
            &lt;span class="n"&gt;buildClient&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"http://localhost:8092/sse"&lt;/span&gt;&lt;span class="o"&gt;),&lt;/span&gt;     &lt;span class="c1"&gt;// inventory&lt;/span&gt;
            &lt;span class="n"&gt;buildClient&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"http://localhost:8090/sse"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;      &lt;span class="c1"&gt;// product-validation&lt;/span&gt;
        &lt;span class="o"&gt;))&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

&lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="nc"&gt;McpClient&lt;/span&gt; &lt;span class="nf"&gt;buildClient&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;sseUrl&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;DefaultMcpClient&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;Builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;transport&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;HttpMcpTransport&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;Builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
            &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;sseUrl&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sseUrl&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
            &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then when I build an agent, I just pass the &lt;code&gt;mcpToolProvider&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;DataAnalystAgent&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AiServices&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;DataAnalystAgent&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;class&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;chatModel&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;gemini&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;toolProvider&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mcpToolProvider&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;   &lt;span class="c1"&gt;// discovers tools from all 4 services&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That’s it. The agent now has access to 12+ tools across 4 services, without a single HTTP client written by hand.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Saga Architecture (Quick Context)
&lt;/h2&gt;

&lt;p&gt;For those not familiar with the Saga Pattern: it’s how you handle distributed transactions without two-phase commit. Instead of one big transaction, you have a chain of local transactions. If any step fails, you run compensating transactions to undo the previous steps.&lt;/p&gt;

&lt;p&gt;My flow looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Order Service → Orchestrator → Product Validation → Payment → Inventory → Success
                                    ↑                  ↑          ↑
                                    └──── Rollback ←───┴──────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Everything communicates via Kafka topics. The orchestrator listens for results and decides what to publish next. There’s a state transition table that maps &lt;code&gt;(source, status)&lt;/code&gt; to the next topic:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Source&lt;/th&gt;
&lt;th&gt;Status&lt;/th&gt;
&lt;th&gt;→ Next Topic&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;ORCHESTRATOR&lt;/td&gt;
&lt;td&gt;SUCCESS&lt;/td&gt;
&lt;td&gt;product-validation-success&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PRODUCT_VALIDATION&lt;/td&gt;
&lt;td&gt;SUCCESS&lt;/td&gt;
&lt;td&gt;payment-success&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PAYMENT&lt;/td&gt;
&lt;td&gt;SUCCESS&lt;/td&gt;
&lt;td&gt;inventory-success&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;INVENTORY&lt;/td&gt;
&lt;td&gt;SUCCESS&lt;/td&gt;
&lt;td&gt;finish-success&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;INVENTORY&lt;/td&gt;
&lt;td&gt;FAIL&lt;/td&gt;
&lt;td&gt;payment-fail (rollback)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PAYMENT&lt;/td&gt;
&lt;td&gt;FAIL&lt;/td&gt;
&lt;td&gt;product-validation-fail (rollback)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The beauty of this setup is that the saga flow is deterministic and auditable. Every event is stored, every transition is logged.&lt;/p&gt;
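&lt;p&gt;The transition table maps directly onto a lookup. Here is a minimal sketch using only the six rows shown. &lt;code&gt;SagaTransitions&lt;/code&gt; and &lt;code&gt;nextTopic&lt;/code&gt; are hypothetical names, and the real orchestrator may represent the table differently or contain more rows.&lt;/p&gt;

```java
final class SagaTransitions {

    // Maps a (source, status) pair to the next Kafka topic, covering only the rows listed in the table.
    static String nextTopic(String source, String status) {
        return switch (source + ":" + status) {
            case "ORCHESTRATOR:SUCCESS" -> "product-validation-success";
            case "PRODUCT_VALIDATION:SUCCESS" -> "payment-success";
            case "PAYMENT:SUCCESS" -> "inventory-success";
            case "INVENTORY:SUCCESS" -> "finish-success";
            case "INVENTORY:FAIL" -> "payment-fail";             // rollback
            case "PAYMENT:FAIL" -> "product-validation-fail";    // rollback
            default -> throw new IllegalArgumentException(
                    "No transition for " + source + " / " + status);
        };
    }
}
```

&lt;p&gt;Throwing on an unknown pair keeps the flow deterministic: an unexpected event surfaces as an error instead of being routed to an arbitrary topic.&lt;/p&gt;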

&lt;h2&gt;
  
  
  @Tool vs MCP Tool: When to Use Each
&lt;/h2&gt;

&lt;p&gt;After building this, my rule of thumb is simple:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use &lt;code&gt;@Tool&lt;/code&gt;&lt;/strong&gt; when the logic lives in the same JVM as the agent. No network overhead, tightly coupled, only that agent can use it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use MCP&lt;/strong&gt; when the logic lives in another service. Any agent can connect. The protocol is language-agnostic (just JSON-RPC), and adding new tools doesn’t require changes on the agent side.&lt;/p&gt;

&lt;p&gt;In practice, my agents use MCP for everything. The only &lt;code&gt;@Tool&lt;/code&gt; I still use is for utility functions that don’t belong in any microservice, like formatting helpers or date calculations.&lt;/p&gt;

&lt;h2&gt;
  
  
  Testing MCP Endpoints Manually
&lt;/h2&gt;

&lt;p&gt;You can test MCP without an AI agent. It’s just HTTP:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
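&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1. Open an SSE session (-N disables buffering; leave it running in its own terminal)&lt;/span&gt;
curl &lt;span class="nt"&gt;-N&lt;/span&gt; http://localhost:8092/sse
&lt;span class="c"&gt;# The first event carries the message endpoint with a sessionId; JSON-RPC responses arrive on this stream&lt;/span&gt;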

&lt;span class="c"&gt;# 2. List available tools&lt;/span&gt;
curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST &lt;span class="s2"&gt;"http://localhost:8092/mcp/message?sessionId=YOUR_SESSION"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"jsonrpc":"2.0","id":2,"method":"tools/list","params":{}}'&lt;/span&gt;

&lt;span class="c"&gt;# 3. Call a tool&lt;/span&gt;
curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST &lt;span class="s2"&gt;"http://localhost:8092/mcp/message?sessionId=YOUR_SESSION"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "jsonrpc":"2.0","id":3,"method":"tools/call",
    "params":{"name":"getStockByProduct","arguments":{"productCode":"COMIC_BOOKS"}}
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is super useful for debugging. When an agent does something unexpected, I test the tool directly to check if it’s the tool or the prompt that’s wrong.&lt;/p&gt;

&lt;h2&gt;
  
  
  What’s Next
&lt;/h2&gt;

&lt;p&gt;With MCP in place, the infrastructure was ready. But the interesting part is what the agents actually &lt;em&gt;do&lt;/em&gt; with all these tools. In the next post, I’ll walk through the 3 agents I built. The &lt;strong&gt;OperationsAgent&lt;/strong&gt; listens for failed sagas on Kafka and auto-diagnoses them using RAG. The &lt;strong&gt;SagaComposerAgent&lt;/strong&gt; periodically rewrites the saga execution plan based on real failure data. And the &lt;strong&gt;DataAnalystAgent&lt;/strong&gt; answers natural language questions like “list the 5 most recent failed sagas and assess their fraud risk.”&lt;/p&gt;

&lt;p&gt;The code is all open source: &lt;a href="https://github.com/pedrop3/sagaorchestration" rel="noopener noreferrer"&gt;github.com/pedrop3/sagaorchestration&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This is part 2 of a 3-part series on integrating AI into a distributed saga system:&lt;/em&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;em&gt;&lt;a href="https://dev.to/pedrop3/part-1-why-i-picked-langchain4j-over-spring-ai-57p"&gt;Part 1 - Why I Picked LangChain4j Over Spring AI&lt;/a&gt;&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Part 2 - Connecting AI Agents to Microservices with MCP&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Part 3 - Agents That Diagnose, Plan, and Query a Distributed Saga&lt;/em&gt;&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>langchain4j</category>
      <category>java</category>
      <category>mcp</category>
      <category>microservices</category>
    </item>
    <item>
      <title>Part 1 - Why I Picked LangChain4j Over Spring AI</title>
      <dc:creator>Pedro Santos</dc:creator>
      <pubDate>Tue, 31 Mar 2026 22:37:29 +0000</pubDate>
      <link>https://dev.to/pedrop3/part-1-why-i-picked-langchain4j-over-spring-ai-57p</link>
      <guid>https://dev.to/pedrop3/part-1-why-i-picked-langchain4j-over-spring-ai-57p</guid>
      <description>&lt;p&gt;Distributed sagas are hard enough without AI. You're already dealing with compensating transactions, Kafka topics, state machines, and rollback chains across 5 microservices. Adding an AI layer on top sounds like a recipe for more complexity.&lt;br&gt;
But that's exactly what this series covers: where AI actually helps in a saga-based architecture, and how to wire it up without making the system more fragile. The AI layer auto-diagnoses failures, dynamically reorders saga steps based on real failure data, and lets developers query the entire system in natural language.&lt;br&gt;
This first post covers the foundation: why I went with LangChain4j as the Java SDK, the core concepts you need, and how to get a working agent running.&lt;/p&gt;
&lt;h2&gt;
  
  
  Why LangChain4j
&lt;/h2&gt;

&lt;p&gt;If you're building AI-powered applications in Java, you're choosing between three options: Python's LangChain (separate stack), Spring AI (native Spring), or LangChain4j (standalone Java library). Here's how they compare on the things that matter for production:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;LangChain4j&lt;/strong&gt; took me by surprise. You define an agent as a Java interface, slap &lt;code&gt;@SystemMessage&lt;/code&gt; on it, and you're done. No implementation class. The framework generates a proxy at runtime. It felt almost too simple, so I kept looking for the catch, but it held up in production.&lt;/p&gt;

&lt;p&gt;Here's the actual comparison I wrote down in my notes:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Python LangChain&lt;/th&gt;
&lt;th&gt;Spring AI&lt;/th&gt;
&lt;th&gt;LangChain4j&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Ecosystem fit&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;New stack alongside your Java app&lt;/td&gt;
&lt;td&gt;Native Spring&lt;/td&gt;
&lt;td&gt;Zero friction in any Java project&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Agent definition&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Explicit chain construction&lt;/td&gt;
&lt;td&gt;Works but needs extra wiring&lt;/td&gt;
&lt;td&gt;Interface + annotation = done&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;API stability&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Breaking changes between versions&lt;/td&gt;
&lt;td&gt;0.x→1.x felt like a rewrite&lt;/td&gt;
&lt;td&gt;Stable post-1.0, SemVer respected&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;MCP support&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Full (Python SDK)&lt;/td&gt;
&lt;td&gt;Full since 1.1 GA&lt;/td&gt;
&lt;td&gt;Full, client + server out of the box&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Spring AI 1.1 has caught up on MCP support, which is great. But LangChain4j's agent definition model and API stability won me over.&lt;/p&gt;
&lt;h2&gt;
  
  
  Core Concepts
&lt;/h2&gt;

&lt;p&gt;Before jumping into code, here's the mental model. There are really just 4 things you need to understand:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Models&lt;/strong&gt;: the AI engine. You send text, it predicts tokens back. It has no memory between calls. LangChain4j abstracts this behind a &lt;code&gt;ChatModel&lt;/code&gt; interface, so you can swap between Gemini, Ollama, OpenAI, or Claude with one line.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agents&lt;/strong&gt;: a loop. The model receives a task, decides which tools to call, calls them, looks at the results, and repeats until it's done. In LangChain4j, you define this as a Java interface.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tools&lt;/strong&gt;: Java methods that the model can invoke. You annotate a method with &lt;code&gt;@Tool&lt;/code&gt;, and the model sees its signature and description. It decides &lt;em&gt;when&lt;/em&gt; to call it. You don't write if/else routing logic; the LLM figures it out.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;RAG&lt;/strong&gt;: Retrieval Augmented Generation. Before asking the model a question, you search your own database for relevant context and inject it into the prompt. This is how you get answers based on &lt;em&gt;your&lt;/em&gt; data without retraining the model.&lt;/p&gt;
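
&lt;p&gt;To make the RAG idea concrete before the real setup, here's a toy sketch in plain Java. It is not the project's pipeline: the notes, the 3-dimensional vectors, and the class name &lt;code&gt;RagSketch&lt;/code&gt; are all made up for illustration. In the real thing an embedding model produces the vectors and a vector store does the search. The shape is the same though: find the most similar stored text, then inject it into the prompt.&lt;/p&gt;

```java
// Toy retrieval-augmented prompt: pick the stored note most similar to the question
// and inject it into the prompt. The vectors are hand-made stand-ins for real embeddings.
class RagSketch {

    // Hypothetical knowledge base: two past failure notes.
    static final String[] NOTES = {
        "Saga failed at payment step: card declined.",
        "Saga failed at inventory step: COMIC_BOOKS out of stock."
    };

    // Hypothetical 3-dimensional "embeddings", one per note.
    static final double[][] NOTE_VECTORS = {
        {0.9, 0.1, 0.0},
        {0.1, 0.9, 0.2}
    };

    // Cosine similarity between two vectors of equal length.
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i != a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Returns the note whose vector is closest to the query vector.
    static String retrieve(double[] queryVector) {
        int best = 0;
        for (int i = 1; i != NOTES.length; i++) {
            if (cosine(queryVector, NOTE_VECTORS[i]) > cosine(queryVector, NOTE_VECTORS[best])) {
                best = i;
            }
        }
        return NOTES[best];
    }

    // Injects the retrieved note into the prompt ahead of the question.
    static String buildPrompt(String question, double[] queryVector) {
        return "Context: " + retrieve(queryVector) + "\nQuestion: " + question;
    }

    public static void main(String[] args) {
        // Pretend this vector is the embedding of the question below.
        double[] queryVector = {0.2, 0.8, 0.1};
        System.out.println(buildPrompt("Why did the last order fail?", queryVector));
    }
}
```

&lt;p&gt;With these made-up vectors the inventory note wins the similarity check, so that's the context the model would see alongside the question.&lt;/p&gt;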
&lt;h2&gt;
  
  
  Setting Up Your First Chat Model
&lt;/h2&gt;

&lt;p&gt;Let's start with the basics. You need a &lt;code&gt;ChatModel&lt;/code&gt;, your connection to the LLM.&lt;/p&gt;
&lt;h3&gt;
  
  
  Option 1: Gemini (Cloud)
&lt;/h3&gt;

&lt;p&gt;Add the dependency:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight gradle"&gt;&lt;code&gt;&lt;span class="n"&gt;implementation&lt;/span&gt; &lt;span class="s2"&gt;"dev.langchain4j:langchain4j-google-ai-gemini:1.11.0"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Build the model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;GoogleAiGeminiChatModel&lt;/span&gt; &lt;span class="n"&gt;gemini&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;GoogleAiGeminiChatModel&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;apiKey&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getenv&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"GEMINI_API_KEY"&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;modelName&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"gemini-2.5-flash"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;maxOutputTokens&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Option 2: Ollama (Local, Free)
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://ollama.com" rel="noopener noreferrer"&gt;Ollama&lt;/a&gt; runs LLMs on your machine. Llama, Mistral, Gemma, Qwen — over 100 models, one ollama pull away. No API key, no cloud account, your data stays local.&lt;br&gt;
I rely on it for two things in this project. First, as a chat model during development.I don't want to burn Gemini quota every time I tweak a system prompt. Second, and more importantly, for embeddings. The nomic-embed-text model (~274MB) is what powers the entire RAG pipeline: vectorizing saga events, searching for similar past failures, feeding context into the diagnosis agent. It runs in mill&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;brew &lt;span class="nb"&gt;install &lt;/span&gt;ollama
ollama pull llama3
ollama pull nomic-embed-text   &lt;span class="c"&gt;# for embeddings later&lt;/span&gt;
ollama serve
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Add the dependency:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight gradle"&gt;&lt;code&gt;&lt;span class="n"&gt;implementation&lt;/span&gt; &lt;span class="s2"&gt;"dev.langchain4j:langchain4jollama:1.11.0"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;LangChain4j integration — chat and embeddings:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Chat model — swap for Gemini in prod&lt;/span&gt;
&lt;span class="nc"&gt;OllamaChatModel&lt;/span&gt; &lt;span class="n"&gt;ollama&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OllamaChatModel&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;baseUrl&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"http://localhost:11434"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;modelName&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"llama3"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

&lt;span class="c1"&gt;// Embedding model — used for RAG (vectorizing saga events)&lt;/span&gt;
&lt;span class="nc"&gt;OllamaEmbeddingModel&lt;/span&gt; &lt;span class="n"&gt;embeddingModel&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OllamaEmbeddingModel&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;baseUrl&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"http://localhost:11434"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;modelName&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"nomic-embed-text"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In practice: &lt;a href="https://ollama.com/" rel="noopener noreferrer"&gt;Ollama&lt;/a&gt; handles all embeddings (even in production, it's that reliable), and I only reach for Gemini when the task needs heavier reasoning. Check the &lt;a href="https://ollama.com/library" rel="noopener noreferrer"&gt;model library&lt;/a&gt; to see what's available.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Beautiful Part: Swap With One Line
&lt;/h3&gt;

&lt;p&gt;Both implement the same &lt;code&gt;ChatModel&lt;/code&gt; interface:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;ChatModel&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;isProduction&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt; &lt;span class="n"&gt;gemini&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ollama&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your agent code doesn't change. I use Ollama locally and Gemini in staging/production.&lt;/p&gt;

&lt;h2&gt;
  
  
  Your First Agent
&lt;/h2&gt;

&lt;p&gt;Here's where LangChain4j shines. Define an interface:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;interface&lt;/span&gt; &lt;span class="nc"&gt;DataAnalystAgent&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;

    &lt;span class="nd"&gt;@SystemMessage&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"""
        You are a data analyst for distributed sagas.
        Answer operational questions using the available tools.
        Never invent data.
        """&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="nf"&gt;analyze&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nd"&gt;@UserMessage&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Build it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;DataAnalystAgent&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AiServices&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;DataAnalystAgent&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;class&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;chatModel&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;gemini&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;answer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;analyze&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"What's the current refund rate?"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. No implementation class. LangChain4j creates a proxy that handles the conversation loop, tool calling, and response parsing.&lt;/p&gt;
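
&lt;p&gt;If "a proxy at runtime" sounds like magic, here's the general Java mechanism in miniature, using the JDK's &lt;code&gt;java.lang.reflect.Proxy&lt;/code&gt;. To be clear, this is not LangChain4j's actual implementation, and &lt;code&gt;EchoAgent&lt;/code&gt; and the canned reply are hypothetical. It only shows how an interface with no implementation class can still be called.&lt;/p&gt;

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

// Illustration of a runtime proxy for an interface using the JDK's java.lang.reflect.Proxy.
// NOT LangChain4j's real implementation; EchoAgent and the canned reply are hypothetical.
class ProxySketch {

    // An interface with no implementation class anywhere in the code.
    interface EchoAgent {
        String analyze(String question);
    }

    static EchoAgent build() {
        // The handler receives every call made on the interface.
        // A real framework would call the model here; this one just echoes.
        InvocationHandler handler = (proxy, method, args) ->
                "[" + method.getName() + "] would send to the model: " + args[0];
        return (EchoAgent) Proxy.newProxyInstance(
                EchoAgent.class.getClassLoader(),
                new Class[] {EchoAgent.class},
                handler);
    }

    public static void main(String[] args) {
        EchoAgent agent = build();
        System.out.println(agent.analyze("What's the current refund rate?"));
    }
}
```

&lt;p&gt;The call site looks exactly like a normal method call, which is why the agent interface feels so lightweight to use.&lt;/p&gt;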

&lt;h2&gt;
  
  
  Adding Tools
&lt;/h2&gt;

&lt;p&gt;Without tools, the agent can only generate text from its training data. Tools let it access real data.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;OrderTools&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;

    &lt;span class="nd"&gt;@Tool&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Returns stock for a product. Use to check availability."&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;getStock&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nd"&gt;@P&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Product code"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;inventoryRepo&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;findByCode&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="o"&gt;).&lt;/span&gt;&lt;span class="na"&gt;getAvailable&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="nd"&gt;@Tool&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Returns the fraud risk score for an order."&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="nf"&gt;getFraudScore&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;double&lt;/span&gt; &lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;type&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;hour&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;fraudService&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;calculate&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;type&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;hour&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Register them:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;DataAnalystAgent&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AiServices&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;DataAnalystAgent&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;class&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;chatModel&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;gemini&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;OrderTools&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now when you ask "Is COMIC_BOOKS in stock?", the model sees the &lt;code&gt;getStock&lt;/code&gt; tool, decides to call it with &lt;code&gt;"COMIC_BOOKS"&lt;/code&gt;, gets the result, and formulates a response. You didn't write any routing logic.&lt;/p&gt;

&lt;p&gt;Here's what happens under the hood:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;LangChain4j sends the method signature to Gemini as a &lt;code&gt;functionDeclaration&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Gemini responds with a &lt;code&gt;functionCall&lt;/code&gt;: "I want to call &lt;code&gt;getStock&lt;/code&gt; with &lt;code&gt;code=COMIC_BOOKS&lt;/code&gt;"&lt;/li&gt;
&lt;li&gt;LangChain4j intercepts this, runs your Java method, gets the result&lt;/li&gt;
&lt;li&gt;It sends the result back to Gemini as a &lt;code&gt;functionResponse&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Gemini generates the final answer using the real data&lt;/li&gt;
&lt;/ol&gt;
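
&lt;p&gt;The five steps above can be sketched as a tiny simulation. Everything here is an assumption for illustration: the "model" is hard-coded, the stock value is invented, and none of it is LangChain4j or Gemini API. It only mirrors the order of the round trip.&lt;/p&gt;

```java
// Simplified simulation of the tool-calling round trip described above.
// The fake model methods stand in for the LLM; nothing here is a real SDK call.
class ToolCallSketch {

    // Hypothetical tool: a hard-coded stock lookup instead of a real repository.
    static int getStock(String code) {
        return "COMIC_BOOKS".equals(code) ? 42 : 0;
    }

    // Steps 1-2: the "model" has seen the tool declaration and asks to call it.
    // A real model decides this from the question; here the choice is hard-coded.
    static String[] fakeModelRequestsTool(String question) {
        return new String[] {"getStock", "COMIC_BOOKS"};
    }

    // Step 3: the framework maps the requested name to a Java method and runs it.
    static String executeTool(String name, String argument) {
        if ("getStock".equals(name)) {
            return String.valueOf(getStock(argument));
        }
        throw new IllegalArgumentException("Unknown tool: " + name);
    }

    // Steps 4-5: the tool result goes back to the "model", which writes the answer.
    static String fakeModelAnswers(String argument, String toolResult) {
        return argument + " has " + toolResult + " units in stock.";
    }

    // Runs the whole round trip for one question.
    static String ask(String question) {
        String[] call = fakeModelRequestsTool(question);
        String result = executeTool(call[0], call[1]);
        return fakeModelAnswers(call[1], result);
    }

    public static void main(String[] args) {
        System.out.println(ask("Is COMIC_BOOKS in stock?"));
    }
}
```

&lt;p&gt;The part worth noticing is step 3: the model never runs your code. It only names a tool and its arguments, and the framework does the actual invocation.&lt;/p&gt;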

&lt;h2&gt;
  
  
  A Quick Note on Tokens and Cost
&lt;/h2&gt;

&lt;p&gt;Every word you send and receive costs tokens. On Gemini Flash, input tokens cost about $0.075 per million and output about $0.30 per million. Pretty cheap. But thinking tokens (internal reasoning) can be $3.50 per million.&lt;/p&gt;
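
&lt;p&gt;To get a feel for those numbers, here's a back-of-the-envelope calculator. The prices are just the rough figures quoted above (check the current price list before relying on them), and the token counts are a hypothetical call I made up.&lt;/p&gt;

```java
// Rough per-call cost estimate from the per-million-token prices quoted above.
// Prices and token counts are illustrative assumptions, not current list prices.
class TokenCostEstimator {

    static final double INPUT_USD_PER_MILLION = 0.075;
    static final double OUTPUT_USD_PER_MILLION = 0.30;
    static final double THINKING_USD_PER_MILLION = 3.50;

    // Returns the estimated cost in USD for a single call.
    static double estimateUsd(long inputTokens, long outputTokens, long thinkingTokens) {
        return (inputTokens * INPUT_USD_PER_MILLION
                + outputTokens * OUTPUT_USD_PER_MILLION
                + thinkingTokens * THINKING_USD_PER_MILLION) / 1_000_000.0;
    }

    public static void main(String[] args) {
        // Hypothetical call: 2,000 prompt tokens, 500 answer tokens, no thinking.
        double plain = estimateUsd(2_000, 500, 0);
        // Same call with 1,000 thinking tokens added.
        double withThinking = estimateUsd(2_000, 500, 1_000);
        System.out.printf("without thinking: $%.6f%n", plain);
        System.out.printf("with thinking:    $%.6f%n", withThinking);
    }
}
```

&lt;p&gt;With these assumed numbers the plain call comes to about $0.0003 and the thinking call to about $0.0038, so a modest amount of reasoning ends up dominating the bill.&lt;/p&gt;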

&lt;p&gt;Some settings I use to keep costs predictable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;OllamaChatModel&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;numPredict&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;512&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;        &lt;span class="c1"&gt;// caps output tokens&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;numCtx&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;32768&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;          &lt;span class="c1"&gt;// context window size&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;       &lt;span class="c1"&gt;// deterministic, no wasted sampling&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;repeatPenalty&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;1.2&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;     &lt;span class="c1"&gt;// avoids loops, shorter responses&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;think&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;            &lt;span class="c1"&gt;// free locally, expensive on cloud&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;listeners&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;TokenUsageListener&lt;/span&gt;&lt;span class="o"&gt;()))&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That &lt;code&gt;TokenUsageListener&lt;/code&gt; logs input/output tokens per call. I learned the hard way that &lt;code&gt;think(true)&lt;/code&gt; on Ollama is free locally, but on cloud APIs thinking tokens are billed (Gemini counts them toward output tokens). It can 10x your cost per call.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;In the next post, I'll show how I used all of this to build an AI layer on top of a distributed saga system: 5 microservices coordinated via Kafka, with each service exposing its business logic as MCP tools that any agent can discover and call remotely.&lt;/p&gt;

&lt;p&gt;Everything I covered here (and in the next posts) is already implemented in a complete, working project. The repo includes the 5 microservices, the Kafka-based saga orchestration, the MCP tool layer, and the AI agents, all wired together and ready to run: &lt;a href="https://github.com/pedrop3/sagaorchestration" rel="noopener noreferrer"&gt;github.com/pedrop3/sagaorchestration&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This is part 1 of a 3-part series on building AI-powered microservices with LangChain4j:&lt;/em&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Part 1 - Why I Picked LangChain4j Over Spring AI&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;a href="https://dev.to/pedrop3/part-2-connecting-ai-agents-to-microservices-with-mcp-16m4"&gt;Part 2 - Connecting AI Agents to Microservices with MCP&lt;/a&gt;&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Part 3 - Agents That Diagnose, Plan, and Query a Distributed Saga&lt;/em&gt;&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>langchain4j</category>
      <category>microservices</category>
      <category>ai</category>
      <category>java</category>
    </item>
  </channel>
</rss>
