Distributed sagas are hard enough without AI. You're already dealing with compensating transactions, Kafka topics, state machines, and rollback chains across 5 microservices. Adding an AI layer on top sounds like a recipe for more complexity.
But that's exactly what this series covers: where AI actually helps in a saga-based architecture, and how to wire it up without making the system more fragile. The AI layer auto-diagnoses failures, dynamically reorders saga steps based on real failure data, and lets developers query the entire system in natural language.
This first post covers the foundation: why I went with LangChain4j as the Java SDK, the core concepts you need, and how to get a working agent running.
Why LangChain4j
If you're building AI-powered applications in Java, you're choosing between three options: Python's LangChain (separate stack), Spring AI (native Spring), or LangChain4j (standalone Java library). Here's how they compare on the things that matter for production:
LangChain4j took me by surprise. You define an agent as a Java interface, slap @SystemMessage on it, and you're done. No implementation class. The framework generates a proxy at runtime. It felt almost too simple, so I kept looking for the catch, but it held up in production.
Here's the actual comparison I wrote down in my notes:
| | Python LangChain | Spring AI | LangChain4j |
|---|---|---|---|
| Ecosystem fit | New stack alongside your Java app | Native Spring | Zero friction in any Java project |
| Agent definition | Explicit chain construction | Works but needs extra wiring | Interface + annotation = done |
| API stability | Breaking changes between versions | 0.x→1.x felt like a rewrite | Stable post-1.0, SemVer respected |
| MCP support | Full (Python SDK) | Full since 1.1 GA | Full, client + server out of the box |
Spring AI 1.1 has caught up on MCP support, which is great. But LangChain4j's agent definition model and API stability won me over.
Core Concepts
Before jumping into code, here's the mental model. There are really just 4 things you need to understand:
Models: the AI engine. You send text, it predicts tokens back. It has no memory between calls. LangChain4j abstracts this behind a ChatModel interface, so you can swap between Gemini, Ollama, OpenAI, or Claude with one line.
Agents: a loop. The model receives a task, decides which tools to call, calls them, looks at the results, and repeats until it's done. In LangChain4j, you define this as a Java interface.
Tools: Java methods that the model can invoke. You annotate a method with @Tool, and the model sees its signature and description. It decides when to call it. You don't write if/else routing logic, the LLM figures it out.
RAG: Retrieval Augmented Generation. Before asking the model a question, you search your own database for relevant context and inject it into the prompt. This is how you get answers based on your data without retraining the model.
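To make the RAG idea concrete, here's a deliberately tiny, framework-free sketch (toy hand-written vectors, not the LangChain4j API or real nomic-embed-text output): embed the question, find the most similar stored snippet, and inject it into the prompt.

```java
import java.util.Map;

public class RagSketch {
    // Cosine similarity between two embedding vectors.
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    public static void main(String[] args) {
        // Pretend these are embeddings produced by an embedding model.
        Map<String, double[]> store = Map.of(
            "Saga 42 failed at PAYMENT: card declined", new double[]{0.9, 0.1, 0.0},
            "Saga 17 completed in 340ms",               new double[]{0.1, 0.8, 0.2});

        double[] question = {0.85, 0.15, 0.05}; // embedding of "why did saga 42 fail?"

        // Retrieve: pick the stored snippet most similar to the question.
        String best = store.entrySet().stream()
            .max((x, y) -> Double.compare(cosine(question, x.getValue()),
                                          cosine(question, y.getValue())))
            .get().getKey();

        // Augment: inject the retrieved context into the prompt sent to the model.
        String prompt = "Context:\n" + best + "\n\nQuestion: why did saga 42 fail?";
        System.out.println(prompt);
    }
}
```

Swap the toy map for a real embedding store and the hand-written vectors for model output, and that's the whole trick.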
Setting Up Your First Chat Model
Let's start with the basics. You need a ChatModel, your connection to the LLM.
Option 1: Gemini (Cloud)
Add the dependency:
implementation "dev.langchain4j:langchain4j-google-ai-gemini:1.11.0"
Build the model:
GoogleAiGeminiChatModel gemini = GoogleAiGeminiChatModel.builder()
.apiKey(System.getenv("GEMINI_API_KEY"))
.modelName("gemini-2.5-flash")
.temperature(0.0)
.maxOutputTokens(1024)
.build();
Option 2: Ollama (Local, Free)
Ollama runs LLMs on your machine. Llama, Mistral, Gemma, Qwen — over 100 models, one ollama pull away. No API key, no cloud account, your data stays local.
I rely on it for two things in this project. First, as a chat model during development. I don't want to burn Gemini quota every time I tweak a system prompt. Second, and more importantly, for embeddings. The nomic-embed-text model (~274MB) is what powers the entire RAG pipeline: vectorizing saga events, searching for similar past failures, feeding context into the diagnosis agent. It runs in milliseconds.
brew install ollama
ollama pull llama3
ollama pull nomic-embed-text # for embeddings later
ollama serve
Add the dependency:
implementation "dev.langchain4j:langchain4j-ollama:1.11.0"
LangChain4j integration — chat and embeddings:
// Chat model — swap for Gemini in prod
OllamaChatModel ollama = OllamaChatModel.builder()
.baseUrl("http://localhost:11434")
.modelName("llama3")
.temperature(0.0)
.build();
// Embedding model — used for RAG (vectorizing saga events)
OllamaEmbeddingModel embeddingModel = OllamaEmbeddingModel.builder()
.baseUrl("http://localhost:11434")
.modelName("nomic-embed-text")
.build();
In practice: Ollama handles all embeddings (even in production, it's that reliable), and I only reach for Gemini when the task needs heavier reasoning. Check the model library to see what's available.
The Beautiful Part: Swap With One Line
Both implement the same ChatModel interface:
ChatModel model = isProduction ? gemini : ollama;
Your agent code doesn't change. I use Ollama locally and Gemini in staging/production.
Your First Agent
Here's where LangChain4j shines. Define an interface:
public interface DataAnalystAgent {
@SystemMessage("""
You are a data analyst for distributed sagas.
Answer operational questions using the available tools.
Never invent data.
""")
String analyze(@UserMessage String question);
}
Build it:
DataAnalystAgent agent = AiServices.builder(DataAnalystAgent.class)
.chatModel(gemini)
.build();
String answer = agent.analyze("What's the current refund rate?");
That's it. No implementation class. LangChain4j creates a proxy that handles the conversation loop, tool calling, and response parsing.
Adding Tools
Without tools, the agent can only generate text from its training data. Tools let it access real data.
public class OrderTools {
@Tool("Returns stock for a product. Use to check availability.")
public int getStock(@P("Product code") String code) {
return inventoryRepo.findByCode(code).getAvailable();
}
@Tool("Returns the fraud risk score for an order.")
public String getFraudScore(double amount, String type, int hour) {
return fraudService.calculate(amount, type, hour);
}
}
Register them:
DataAnalystAgent agent = AiServices.builder(DataAnalystAgent.class)
.chatModel(gemini)
.tools(new OrderTools())
.build();
Now when you ask "Is COMIC_BOOKS in stock?", the model sees the getStock tool, decides to call it with "COMIC_BOOKS", gets the result, and formulates a response. You didn't write any routing logic.
Here's what happens under the hood:
- LangChain4j sends the method signature to Gemini as a functionDeclaration
- Gemini responds with a functionCall: "I want to call getStock with code=COMIC_BOOKS"
- LangChain4j intercepts this, runs your Java method, gets the result
- It sends the result back to Gemini as a functionResponse
- Gemini generates the final answer using the real data
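That request/response dance is easy to simulate. Here's a framework-free toy version of the loop the proxy runs. The "model" below is a canned stand-in, not a real LLM, and the CALL/FINAL protocol is invented for illustration:

```java
import java.util.Map;
import java.util.function.Function;

public class ToolLoopSketch {
    /** Stand-in for the LLM: first turn requests a tool, second turn answers. */
    static String model(String input) {
        if (input.startsWith("USER:")) return "CALL getStock COMIC_BOOKS";
        return "FINAL: COMIC_BOOKS has " + input.replace("RESULT:", "").trim()
             + " units in stock.";
    }

    public static String run(String question, Map<String, Function<String, String>> tools) {
        String reply = model("USER: " + question);
        // Loop: while the model asks for a tool, execute it and feed the result back.
        while (reply.startsWith("CALL")) {
            String[] parts = reply.split(" ");          // e.g. CALL getStock COMIC_BOOKS
            String result = tools.get(parts[1]).apply(parts[2]);
            reply = model("RESULT: " + result);
        }
        return reply.replace("FINAL: ", "");
    }

    public static void main(String[] args) {
        Map<String, Function<String, String>> tools =
            Map.of("getStock", code -> "7");            // fake inventory lookup
        System.out.println(run("Is COMIC_BOOKS in stock?", tools));
    }
}
```

The real protocol is JSON function declarations over HTTP, but the control flow is exactly this while loop, and it lives inside the proxy so you never write it.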
A Quick Note on Tokens and Cost
Everything you send and receive is billed in tokens. On Gemini Flash, input tokens cost about $0.075 per million and output about $0.30 per million. Pretty cheap. But thinking tokens (the model's internal reasoning) can run $3.50 per million.
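To see what those rates mean per call, here's the arithmetic at the prices above (a rough sketch with made-up token counts; check your provider's current price sheet):

```java
public class TokenCost {
    // Rates quoted above, in dollars per million tokens.
    static final double INPUT = 0.075, OUTPUT = 0.30, THINKING = 3.50;

    static double costUsd(long in, long out, long thinking) {
        return (in * INPUT + out * OUTPUT + thinking * THINKING) / 1_000_000.0;
    }

    public static void main(String[] args) {
        // A typical agent call: 2k tokens of prompt + tool schemas, 500 tokens of answer.
        System.out.printf("plain:    $%.6f%n", costUsd(2_000, 500, 0));
        // Same call with 5k thinking tokens: the reasoning dominates the bill.
        System.out.printf("thinking: $%.6f%n", costUsd(2_000, 500, 5_000));
    }
}
```

With those numbers, 5k thinking tokens take the call from $0.0003 to $0.0178, which is exactly the kind of silent multiplier the settings below guard against.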
Some settings I use to keep costs predictable:
OllamaChatModel.builder()
.numPredict(512) // caps output tokens
.numCtx(32768) // context window size
.temperature(0.0) // deterministic, no wasted sampling
.repeatPenalty(1.2) // avoids loops, shorter responses
.think(true) // free locally, expensive on cloud
.listeners(List.of(new TokenUsageListener()))
.build();
That TokenUsageListener logs input/output tokens per call. I learned the hard way that think(true) on Ollama is free locally but those thinking tokens count as INPUT on cloud APIs. It can 10x your cost per call.
What's Next
In the next post, I'll show how I used all of this to build an AI layer on top of a distributed saga system: 5 microservices coordinated via Kafka, with each service exposing its business logic as MCP tools that any agent can discover and call remotely.
Everything I covered here (and in the next posts) is already implemented in a complete, working project. The repo includes the 5 microservices, the Kafka-based saga orchestration, the MCP tool layer, and the AI agents, all wired together and ready to run: github.com/pedrop3/sagaorchestration
This is part 1 of a 3-part series on building AI-powered microservices with LangChain4j:
- Why I Picked LangChain4j Over Spring AI
- Connecting AI Agents to Microservices with MCP
- 3 Agents That Diagnose, Plan, and Query a Distributed Saga