What is this blog post about?
This post explores one of the most interesting tools available in the Java ecosystem today for building AI-powered applications: LangChain4j. I’ll walk you through how to build an AI Assistant using core LLM application patterns like prompt templating, retrieval-augmented generation (RAG), moderation, and input/output guardrails.
The example presented here is built using Quarkus as the Java framework, but the core ideas and code can be easily applied to other frameworks like Spring, Helidon, Micronaut, or even plain Java. These concepts are not tied to Java either; they can be just as easily adapted to other languages and platforms. If you're looking to explore how AI can enhance your applications, you're in the right place.
Open Source AI Frameworks
This post focuses on how to build an AI application using an open source library (LangChain4j), something you can run in your own cloud setup or even on-premises if that’s what you need. Open source tools usually take a bit more effort to set up and maintain, but they give you full control. Whether that makes sense usually depends on where your data lives, your compliance requirements, and how much control you actually want.
On the other hand, platforms like Azure AI Foundry, AWS Bedrock, or Vertex AI offer more complete and managed solutions. They take care of most of the heavy lifting like scaling, integrations, and evaluation, and they also include a solid security and governance layer. These platforms are very mature and production-ready. Microsoft, for example, already provides a responsible AI framework out of the box. These platforms are a good choice when you want to move fast without spending too much time on infrastructure or setting up compliance from scratch. They also come with a lot of built-in tools ready to use, so you can focus more on the problem you are actually trying to solve.
In this case, I wanted to show what it looks like to go with the open source route, maintain full control, and gain a better understanding of how everything works under the hood.
The State of AI in the Java Ecosystem
Python has long been the go-to language for building AI applications, and with good reason. Its ecosystem is packed with mature libraries, active community support, and tooling that makes things move fast. That’s not likely to change any time soon. That said, Java has been around for over 30 years and continues to be one of the most widely used languages in the industry. It's constantly evolving and has shaped a big part of how modern software is built.
Today’s Java is lightweight, fast, and flexible. It is far from the old-school Java 7 or even Java 8, which makes it a solid option for virtually any type of application, including those involving AI.
In this post, I wanted to give a quick glance at how Java fits into the current AI landscape, and where tools like LangChain4j come in as an interesting and promising option for building AI-powered applications, especially if your team is already working with Java.
If you want to get a better sense of what modern Java looks like, I’d personally recommend checking out JavaPro magazine, especially this edition:
30 Years of Java – Part 2
Main AI Libraries for Java Development
LangChain4j is a Java framework introduced in 2023, inspired by the popular LangChain library from the Python world. It’s fully open source under the Apache 2.0 License and actively maintained by a passionate community.
What makes LangChain4j interesting is its practical, modular toolbox. It gives you a bunch of ready-to-use components like:
- prompt templating
- chat memory management
- agents
- image models
- function calling
- RAG (Retrieval-Augmented Generation)
- MCP integration
It supports multiple LLM providers and vector stores out of the box. And if you need more control, it lets you dive into lower-level details too. There's even an API for building not just basic assistants, but fully autonomous, decision-making agents if that’s what your use case needs.
You can use it in any plain Java project, and it also has built-in support for popular frameworks like Spring, Quarkus, Helidon, and Micronaut.
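To give a sense of the programming model, here is a minimal sketch of an AI service interface using the Quarkus extension. The interface name and the prompt text are illustrative and not part of the assistant built later in this post; LangChain4j generates the implementation behind the interface, and calling chat("...") sends the templated prompt to the configured model and returns the answer as a plain String.
import dev.langchain4j.service.SystemMessage;
import dev.langchain4j.service.UserMessage;
import io.quarkiverse.langchain4j.RegisterAiService;
// Hypothetical interface, just to illustrate the declarative style; the prompt is an assumption.
@RegisterAiService
public interface OnboardingAssistantSketch {
    @SystemMessage("You are an onboarding assistant. Answer briefly and only from the provided context.")
    String chat(@UserMessage String question);
}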
Spring AI is basically the official way to bring AI into Spring-based projects. It follows familiar Spring principles like portability and modularity, so if you're already using Spring, it fits in pretty well.
Some of the key features it includes:
- Support for chat completion and embeddings
- Tools for text-to-image, text-to-speech, and audio translation
- Built-in support for RAG if you're working with retrieval-based flows
- Integration with MCP (If you would like to learn more about this, you can refer to the following post)
Comparing Tools
Both tools are solid options. It really depends on what you're looking for. In my experience, LangChain4j feels a bit more flexible when it comes to tweaking low-level details, and it supports a broader range of integrations, including 15+ LLM providers and vector stores. It also offers more advanced building blocks like Agents, AiServices, and Chains, which make it easier to orchestrate complex workflows when needed.
That said, both LangChain4j and Spring AI are under active development, and I’m sure they’ll keep evolving quickly and adding new features.
For this blog, I decided to build the example using LangChain4j.
Retrieval-Augmented Generation (RAG)
RAG is basically a pattern that uses your own data to ground prompts, so LLMs can use that information as context and generate more accurate and relevant responses.
In simple terms, it works like this: you have data, that data gets ingested and split into chunks, and each chunk is enriched with metadata. Then, an embedding model transforms each chunk into a vector, which is just a list of numbers that represents the meaning of the text. These vectors are stored in a vector database.
This setup allows the system to perform semantic search over your data. Instead of just matching keywords, it understands the meaning behind each chunk and retrieves the most relevant ones based on that. These chunks are then passed to the model as context, so the response is generated using your actual content, not just what the model was originally trained on.
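To make the retrieval step concrete, here is a small conceptual sketch using LangChain4j's embedding store API. It assumes an EmbeddingModel and an EmbeddingStore<TextSegment> are already configured; the question, the top-k value, and the score threshold are illustrative:
import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.store.embedding.EmbeddingMatch;
import dev.langchain4j.store.embedding.EmbeddingSearchRequest;
import dev.langchain4j.store.embedding.EmbeddingStore;
import java.util.stream.Collectors;
// Conceptual sketch of semantic search: embed the question, find the closest chunks,
// and concatenate them so they can be injected into the prompt as context.
class SemanticSearchSketch {
    String buildContext(EmbeddingModel embeddingModel,
                        EmbeddingStore<TextSegment> embeddingStore,
                        String question) {
        // 1. Turn the user question into a vector
        Embedding queryEmbedding = embeddingModel.embed(question).content();
        // 2. Retrieve the chunks whose vectors are closest to the question
        EmbeddingSearchRequest request = EmbeddingSearchRequest.builder()
                .queryEmbedding(queryEmbedding)
                .maxResults(5)     // top-k chunks
                .minScore(0.85)    // drop weak matches
                .build();
        // 3. The matched chunk texts become the grounding context for the LLM
        return embeddingStore.search(request).matches().stream()
                .map(EmbeddingMatch::embedded)
                .map(TextSegment::text)
                .collect(Collectors.joining("\n\n"));
    }
}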
This post focuses on how to implement RAG. If you want to learn more about the concept itself, I recommend this overview:
🔗 https://learn.microsoft.com/en-us/azure/search/retrieval-augmented-generation-overview
Building an AI Assistant from Scratch
First thing is to identify a potential use case. For this example, the idea is to build an Onboarding AI Assistant. The goal is to have a chat assistant powered by an LLM, so we can interact with it using natural language. This assistant will ground its responses using RAG, and we’ll also implement some protection patterns to guard the input of the model.
The main idea behind this PoC assistant is not to replace the onboarding process, but to support and complement it by enabling things like:
- Asking about necessary permissions (and even triggering requests using tooling; see the sketch after this list)
- Understanding the business domain of the company and answering questions about how it works
- Answering technical questions related to the team or platform
- Helping with key business processes and how they're done
- Providing information about API contracts and similar resources
- Explaining team coding standards or practices
- Explaining team-specific flows, for example how releases are managed
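That "triggering requests using tooling" point maps to LangChain4j's function calling support: you expose a Java method as a tool and the model can decide to invoke it. The class below is a hypothetical sketch, and the ticketing integration is not part of this PoC; with the Quarkus extension, such a class is typically registered through the tools attribute of @RegisterAiService.
import dev.langchain4j.agent.tool.Tool;
import jakarta.enterprise.context.ApplicationScoped;
// Hypothetical tool: the method name and the ticketing logic are illustrative only.
@ApplicationScoped
public class PermissionTools {
    @Tool("Creates a permission request for the given system on behalf of the new team member")
    public String requestPermission(String system, String justification) {
        // A real implementation would call an internal ticketing or IAM API here
        return "Permission request created for " + system + " (reason: " + justification + ")";
    }
}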
In this case, the RAG setup is designed to consume documents with key information about a fictitious company called "AcmeCorp." The business objective is to provide quick and accurate information to help new team members get up to speed and start contributing in less time.
The final solution looks like this:
LLM/Embedding Model:
According to the diagram above, OpenAI models are used for both the chat assistant and the embedding model. However, many other models are also supported, including open-source ones that can be deployed on-premises if needed. For more details about supported language models, please refer to:
🔗 https://docs.langchain4j.dev/category/language-models
Embedding Store:
Redis is used as the vector store in this setup. Note that LangChain4j supports multiple embedding stores. For more details, see:
🔗 https://docs.langchain4j.dev/category/embedding-models
For additional integrations such as image models, document loaders (including S3, Azure Blob Storage, and Google Cloud Storage), document parsers (for text, PDF, and more), and other advanced capabilities, please refer to:
🔗 https://docs.langchain4j.dev/category/integrations
Technology Selection
- Quarkus: This framework was selected for its strong suitability for cloud-native development. It offers a great developer experience thanks to powerful tools like hot reload, making the development cycle fast and productive.
This framework also benefits from a very active community and thousands of extensions that cover a wide range of use cases. Additionally, its small memory footprint and extremely fast startup times make it an excellent choice for modern Java applications, whether deployed in the cloud or running on-premises.
- AI Library:
LangChain4j was selected, using the specific Quarkus extension built on top of it to enable a smoother and more seamless integration.
- Chat UI:
In this case, for simplicity, the example is built using Vaadin, an open-source Java framework for building web apps. It lets you create UIs fully in Java and also supports using TypeScript or JavaScript components when needed.
You can even build UIs using a drag-and-drop editor powered by AI copilots, which makes it an interesting option if you're working in the Java ecosystem.
It is important to note that there are many tools available for building a chat UI. You can also choose to build one from scratch using your preferred frontend framework. In this case, the priority was to move fast, so a simpler solution was chosen.
Creating the Project
The first step is to create the project using the official Quarkus page to bootstrap a new app: https://code.quarkus.io/
For this example, the configuration is as follows:
Adding LangChain4j Model Dependencies
In this case, OpenAI will be used via the LangChain4j OpenAI Quarkus extension, which also transitively includes the core LangChain4j dependencies required by the project.
To begin, add the following global property to define the dependency version that will be used across the project:
<properties>
…
<quarkus-langchain4j.version>1.1.3</quarkus-langchain4j.version>
</properties>
Next, add the dependency itself:
<dependencies>
...
<dependency>
<groupId>io.quarkiverse.langchain4j</groupId>
<artifactId>quarkus-langchain4j-openai</artifactId>
<version>${quarkus-langchain4j.version}</version>
</dependency>
</dependencies>
This dependency will vary depending on whether you want to work with models from Anthropic, Gemini, LLaMA 3, or others. In that case, you’ll need to replace it with the appropriate one. The good news is that LangChain4j provides integrations for both proprietary providers and open-source models, which can also be deployed on-premises using tools like Ollama or Hugging Face.
Building the RAG Ingestion Pipeline
In this case, Redis will be used as the vector database and accessed through an embedding store provided by LangChain4j.
To enable this setup, add the following dependency:
<dependencies>
...
<dependency>
<groupId>io.quarkiverse.langchain4j</groupId>
<artifactId>quarkus-langchain4j-redis</artifactId>
<version>${quarkus-langchain4j.version}</version>
</dependency>
</dependencies>
Then, configure the embedding model in your application.properties file:
quarkus.langchain4j.openai.api-key=${OPENAI_API_KEY:}
quarkus.langchain4j.openai.embedding-model.model-name=text-embedding-ada-002
This sets the OpenAI API key (using the OPENAI_API_KEY environment variable) and selects text-embedding-ada-002 as the embedding model.
The next step is to add the following configuration to set up Redis as the embedding store:
quarkus.redis.hosts=${REDIS_HOSTS:}
quarkus.redis.password=${REDIS_PASSWORD:}
quarkus.langchain4j.redis.index-name=embedding-index
quarkus.langchain4j.redis.dimension=1536
quarkus.langchain4j.redis.distance-metric=cosine
It's important to mention that the dimension must match the one produced by the selected embedding model (text-embedding-ada-002 outputs 1536-dimensional vectors), and that the distance metric being used is cosine similarity.
This type of similarity measures how close two vectors point in the same direction, which works particularly well when comparing embeddings because it focuses on meaning rather than exact values. It's commonly used in semantic search.
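For intuition, here is a tiny standalone example (plain Java, not LangChain4j) that computes cosine similarity. The three-dimensional vectors are made up purely for illustration; real embeddings have hundreds or thousands of dimensions.
// Illustrative only: cosine similarity between small hand-made vectors.
public class CosineSimilarityExample {
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
    public static void main(String[] args) {
        double[] king = {0.8, 0.6, 0.1};
        double[] queen = {0.7, 0.7, 0.2};
        double[] banana = {0.1, 0.2, 0.9};
        System.out.println(cosine(king, queen));  // ~0.99 -> point in almost the same direction
        System.out.println(cosine(king, banana)); // ~0.31 -> mostly unrelated
    }
}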
The next step is to add a few custom parameters to personalize the ingestion and retrieval process:
# App-specific RAG config
app.rag.ingestion.chunk-size=700
app.rag.ingestion.chunk-overlap=140
app.rag.ingestion.files-location=./docs
app.rag.retrieval.min-score=0.85
app.rag.retrieval.max-results=5
These parameters control how documents are processed and how results are retrieved:
- chunk-size: the number of tokens per chunk when splitting documents.
- chunk-overlap: how many tokens overlap between chunks to preserve context.
- files-location: the local folder where the documents to ingest are located.
- min-score: the minimum similarity score required for a result to be considered relevant. (Recommended: 0.75–0.90)
- max-results: the maximum number of chunks to retrieve per query. (Recommended: 3–10)
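To show where the two retrieval values typically end up, here is a hedged sketch of a content retriever configured with them. It is not wired into the project yet; the store and embedding model are assumed to be the ones configured above.
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.rag.content.retriever.ContentRetriever;
import dev.langchain4j.rag.content.retriever.EmbeddingStoreContentRetriever;
import dev.langchain4j.store.embedding.EmbeddingStore;
// Sketch only: how min-score and max-results are typically consumed by a retriever.
class RetrieverSketch {
    ContentRetriever buildRetriever(EmbeddingStore<TextSegment> store, EmbeddingModel embeddingModel) {
        return EmbeddingStoreContentRetriever.builder()
                .embeddingStore(store)
                .embeddingModel(embeddingModel)
                .maxResults(5)      // app.rag.retrieval.max-results
                .minScore(0.85)     // app.rag.retrieval.min-score
                .build();
    }
}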
Adding our config mapping:
package com.github.dvindas.onboardingassistant.config;
import io.smallrye.config.ConfigMapping;
import java.nio.file.Path;
@ConfigMapping(prefix = "app.rag")
public interface RAGConfig {
IngestionConfig ingestion();
RetrievalConfig retrieval();
interface IngestionConfig {
int chunkSize();
int chunkOverlap();
Path filesLocation();
}
interface RetrievalConfig {
double minScore();
int maxResults();
}
}
The next step is to add the logic that handles document ingestion. The process can be broken down into a few simple steps:
- Load: Read the documents you want to ingest from a folder.
- Configure: Set the embedding model and the embedding store where vectors will be saved.
- Split: Define the strategy to split documents into chunks, including size and overlap.
- Ingest: Generate embeddings for each chunk and store them along with metadata.
First, let's define a simple data structure to return a summary of the ingestion process.
package com.github.dvindas.onboardingassistant.model;
public record IngestionSummary(int totalTokenCount, int totalDocumentsProcessed) {
}
Ingestion process:
package com.github.dvindas.onboardingassistant.ai.rag;
import com.github.dvindas.onboardingassistant.config.RAGConfig;
import com.github.dvindas.onboardingassistant.model.IngestionSummary;
import dev.langchain4j.data.document.Document;
import dev.langchain4j.data.document.loader.FileSystemDocumentLoader;
import dev.langchain4j.data.document.parser.apache.pdfbox.ApachePdfBoxDocumentParser;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.model.openai.OpenAiEmbeddingModelName;
import dev.langchain4j.model.openai.OpenAiTokenCountEstimator;
import dev.langchain4j.store.embedding.EmbeddingStoreIngestor;
import io.quarkiverse.langchain4j.redis.RedisEmbeddingStore;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;
import org.jboss.logging.Logger;
import java.nio.file.FileSystems;
import java.util.List;
import java.util.Objects;
import static dev.langchain4j.data.document.splitter.DocumentSplitters.recursive;
@ApplicationScoped
public class DocumentIngestor {
private static final Logger LOG = Logger.getLogger(DocumentIngestor.class);
@Inject
RAGConfig ragConfig;
@Inject
RedisEmbeddingStore store;
@Inject
EmbeddingModel embeddingModel;
public IngestionSummary ingest() {
LOG.infof("Loading pdf files");
var documents = loadPDFDocuments();
LOG.infof("Loaded %d pdf files", documents.size());
var ingestionSummary = processDocumentsIntoEmbeddings(documents);
LOG.infof("Ingestion summary: %s", ingestionSummary.toString());
return ingestionSummary;
}
private List<Document> loadPDFDocuments() {
var pathMatcher = FileSystems.getDefault().getPathMatcher("glob:*pdf");
var parser = new ApachePdfBoxDocumentParser();
return FileSystemDocumentLoader.loadDocumentsRecursively(ragConfig.ingestion().filesLocation(), pathMatcher, parser);
}
private IngestionSummary processDocumentsIntoEmbeddings(List<Document> documents) {
Objects.requireNonNull(documents, "documents must not be null");
if (!documents.isEmpty()) {
var tokenCountEstimator = new OpenAiTokenCountEstimator(OpenAiEmbeddingModelName.TEXT_EMBEDDING_ADA_002);
var ingestor = EmbeddingStoreIngestor.builder()
.embeddingStore(store)
.embeddingModel(embeddingModel)
.documentSplitter(recursive(ragConfig.ingestion().chunkSize(), ragConfig.ingestion().chunkOverlap(), tokenCountEstimator))
.build();
var result = ingestor.ingest(documents);
return new IngestionSummary(result.tokenUsage().totalTokenCount(), documents.size());
}
return new IngestionSummary(0, 0);
}
}
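The ingestion also needs an entry point. How you trigger it is up to you (a startup event, a scheduler, a CLI command); as a hypothetical example, a small REST resource could expose it, assuming a REST extension with JSON serialization (such as quarkus-rest plus Jackson) is on the classpath. The package and path below are illustrative.
package com.github.dvindas.onboardingassistant.api;
import com.github.dvindas.onboardingassistant.ai.rag.DocumentIngestor;
import com.github.dvindas.onboardingassistant.model.IngestionSummary;
import jakarta.inject.Inject;
import jakarta.ws.rs.POST;
import jakarta.ws.rs.Path;
// Hypothetical resource that triggers the ingestion on demand.
@Path("/ingestion")
public class IngestionResource {
    @Inject
    DocumentIngestor documentIngestor;
    @POST
    public IngestionSummary ingest() {
        // Runs the full pipeline: load PDFs, split, embed, and store in Redis
        return documentIngestor.ingest();
    }
}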
Key LangChain4j Abstractions:
There are a few key components that make up the ingestion flow in LangChain4j.
- RedisEmbeddingStore: handles storing and retrieving embeddings in Redis.
- EmbeddingModel: is in charge of turning text into vector representations, depending on the model you choose (for example, OpenAI).
- ApachePdfBoxDocumentParser: extracts text from PDF files so you can work with the raw content.
- OpenAiTokenCountEstimator: helps estimate how many tokens a piece of text has, which is useful when splitting documents or controlling prompt size.
- EmbeddingStoreIngestor: brings it all together. It takes the document, splits it into chunks, generates embeddings, and stores everything in the embedding store along with metadata.
So we can see that LangChain4j makes RAG ingestion pretty simple. It hides most of the low-level details, while still giving you the flexibility to customize the pipeline when needed. You can define your own parsers or document splitters to control how the data is chunked, enriched with metadata, or even transformed. For example, you could perform anonymization before generating embeddings. This gives you full control over how the data is processed and indexed into the embedding store. For more details about this please refer to:
🔗 https://docs.langchain4j.dev/tutorials/rag
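As an illustration of that flexibility, here is a minimal sketch of a custom transformer that redacts email addresses before embeddings are generated. The class name and the regex are assumptions, not part of the project above; such a transformer can be plugged into the EmbeddingStoreIngestor builder through its documentTransformer(...) method so every document is cleaned before splitting and embedding.
import dev.langchain4j.data.document.Document;
import dev.langchain4j.data.document.DocumentTransformer;
// Hypothetical transformer: anonymizes text before it is split and embedded.
public class AnonymizingTransformer implements DocumentTransformer {
    @Override
    public Document transform(Document document) {
        // Replace email addresses with a placeholder; extend with other PII rules as needed
        String anonymized = document.text()
                .replaceAll("[\\w.+-]+@[\\w-]+\\.[\\w.]+", "[redacted-email]");
        return Document.from(anonymized, document.metadata());
    }
}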