In light of the LLM boom over the past year, it is safe to say that virtually everyone has, for one reason or another, used a model from one of the major providers, such as Meta, Google, Microsoft, or Amazon.
From a technical standpoint, it's fascinating to explore how we can develop software that leverages LLMs. In this comprehensive guide, we'll walk through building an AI application using Quarkus and LangChain4j.
Prerequisites
- Basic knowledge of Java, Quarkus and LangChain4j
- JDK 17 or newer
- Maven or Gradle
- An API key from an LLM provider (e.g., OpenAI)
What is LangChain4j?
LangChain4j is a powerful Java framework designed to simplify the integration of Large Language Models (LLMs) and other AI capabilities into Java applications. As a Java port of the popular LangChain project, it provides developers with a robust and type-safe way to build AI-powered applications while maintaining the reliability and structure that Java is known for.
The framework is built around composable components that can be mixed and matched to create custom AI solutions:
- Language Models
- Memory Systems
- Document Loaders
- Vector Stores
- Embeddings
- Tools and Agents
Core Components Explained
Language Models
Language Models (LLMs) are the AI engines that process and generate text, with LangChain4j primarily supporting OpenAI's GPT models and other compatible LLMs. They handle tasks like text generation, summarization, and structured data extraction through a unified interface.
Memory Systems
Memory Systems maintain conversation history and context between interactions, allowing for coherent multi-turn conversations. They can be configured to store messages in various backends (in-memory, database) and manage conversation context with customizable retention policies.
Document Loaders
Document Loaders provide interfaces to read and process various file formats (PDF, TXT, HTML) into a standardized format for AI processing. They handle the complexities of file reading and initial text extraction, making documents ready for further processing.
Vector Stores
Vector Stores are specialized databases that store and retrieve text embeddings, enabling semantic search capabilities. They support various backends like Pinecone or local storage, making it efficient to find semantically similar content.
Embeddings
Embeddings convert text into numerical vectors that capture semantic meaning, essential for similarity searches and content comparison. They work with various embedding models (like OpenAI's embeddings) to transform text into mathematical representations.
Tools and Agents
Tools and Agents are components that extend LLM capabilities by providing specific functionalities (like calculations or API calls) and autonomous decision-making abilities. They allow LLMs to interact with external systems and perform complex tasks through defined interfaces.
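To make the tools idea concrete, here is a minimal sketch of a calculator tool wired into an AI service. It assumes langchain4j's @Tool annotation and the tools attribute of @RegisterAiService; the class and method names are invented for the example:

import dev.langchain4j.agent.tool.Tool;
import io.quarkiverse.langchain4j.RegisterAiService;
import jakarta.enterprise.context.ApplicationScoped;

@ApplicationScoped
public class CalculatorTool {

    // The model can decide to invoke this method when a calculation is needed.
    @Tool("Adds two numbers")
    double add(double a, double b) {
        return a + b;
    }
}

// The tool class is made available to an AI service through the tools attribute.
@RegisterAiService(tools = CalculatorTool.class)
interface AssistantWithTools {
    String chat(String userMessage);
}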
Configuration Setup
Adding LangChain4j extensions:
<dependency>
    <groupId>io.quarkiverse.langchain4j</groupId>
    <artifactId>quarkus-langchain4j-openai</artifactId>
    <version>${langchain4j.version}</version>
</dependency>
<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-core</artifactId>
    <version>${langchain4j-core.version}</version>
</dependency>
OpenAI Configuration
quarkus.langchain4j.openai.api-key=${OPENAI_API_KEY}
quarkus.langchain4j.openai.timeout=60s
quarkus.langchain4j.openai.chat-model.temperature=0.7
Simple AI Service
The Simple AI Service is designed for straightforward AI tasks, such as generating text based on a given prompt. In this project, it is implemented in the SimplePostAiService interface.
How It Works
Interface Definition:
- The @RegisterAiService annotation marks the interface as an AI service that LangChain4j will implement.
- The writeNews method takes a topic and a length as input and generates a news article.
Example Code:
@RegisterAiService
public interface SimplePostAiService {
    String writeNews(String topic, int length);
}
Use Case:
This service can be used to generate short news articles or summaries based on a specific topic. For example, calling writeNews("technology", 100)
might generate a 100-word article about the latest tech trends.
Integration:
The service is injected into the GreetingResource
class and exposed via a REST endpoint:
@GET
@Path("/news/{topic}/{length}")
public String writeNews(@PathParam("topic") String topic, @PathParam("length") int length) {
    return simplePostAiService.writeNews(topic, length);
}
Advanced AI Service with Memory
Memory is a fundamental component in LangChain4j that plays a crucial role in creating meaningful and contextual interactions between users and AI models. Unlike traditional request-response systems where each interaction stands alone, memory enables applications to maintain context and create more natural, flowing conversations.
The primary purpose of memory in LangChain4j is to maintain conversation history and context. Without memory, each interaction with an AI model would be isolated, forcing users to repeatedly provide context or background information. For example, in a conversation about travel planning, memory allows the AI to remember previously mentioned destinations, dates, or preferences without requiring the user to restate this information in each message.
The Advanced AI Service with Memory enhances AI interactions by maintaining context across multiple requests. This is implemented in the AdvancedPostAiService
interface.
Interface Definition:
- The @RegisterAiService annotation marks the interface as an AI service.
- The chatWithMailPostBotWithMemory method uses the @MemoryId annotation to associate a memory with a specific user.
Example Code:
@RegisterAiService
public interface AdvancedPostAiService {
    String chatWithMailPostBotWithMemory(@MemoryId String userId, String text);
}
Use Case:
This service is ideal for applications like chatbots, where maintaining context across conversations is crucial. For example, a user could ask, "What’s the weather today?" and follow up with "What about tomorrow?" without needing to repeat the location.
Integration:
The service is injected into the GreetingResource class and exposed via a REST endpoint:
@GET
@Path("/mail-with-memory")
public String writeNewsWithMemory(
        @QueryParam("text") String text, @QueryParam("userId") String userId) {
    return advancedPostAiService.chatWithMailPostBotWithMemory(userId, text);
}
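Under the hood, the extension typically provides a default chat memory for @MemoryId-annotated services; if you want to control the retention policy yourself, a custom ChatMemoryProvider bean can be registered. A minimal sketch, assuming an in-memory message window (the class name and window size are arbitrary):

import dev.langchain4j.memory.chat.ChatMemoryProvider;
import dev.langchain4j.memory.chat.MessageWindowChatMemory;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.enterprise.inject.Produces;

@ApplicationScoped
public class ChatMemoryConfig {

    // One memory window per memory id (the userId in our case); only the last 10 messages are kept.
    @Produces
    @ApplicationScoped
    ChatMemoryProvider chatMemoryProvider() {
        return memoryId -> MessageWindowChatMemory.builder()
                .id(memoryId)
                .maxMessages(10)
                .build();
    }
}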
Understanding Message Types in LangChain4j
System Messages (@SystemMessage)
System messages serve as instructions or context-setting information for the AI model. They help define the behavior, role, or constraints for the AI's responses. These messages are not typically visible to end users but are crucial for controlling how the AI behaves.
User Messages (@UserMessage)
User messages represent the actual input from users interacting with the AI system. These messages contain the queries, statements, or requests that the AI needs to respond to. They form the core of the interaction between users and the AI.
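To make this concrete, here is a small sketch that combines both annotations on an AI service; the interface name and prompt text are invented for illustration, and it assumes the template variable can reference the method parameter by name:

import dev.langchain4j.service.SystemMessage;
import dev.langchain4j.service.UserMessage;
import io.quarkiverse.langchain4j.RegisterAiService;

@RegisterAiService
public interface SummaryAiService {

    // The system message fixes the assistant's role; the user message carries the actual request.
    @SystemMessage("You are a concise assistant that answers in at most three sentences.")
    @UserMessage("Summarize the following text: {text}")
    String summarize(String text);
}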
Understanding Retrieval-Augmented Generation (RAG)
Retrieval-Augmented Generation (RAG) is an advanced approach that combines the power of large language models (LLMs) with a knowledge base of specific information. Instead of relying solely on an LLM's pre-trained knowledge, RAG retrieves relevant information from a custom document collection before generating responses. This approach significantly improves the accuracy and reliability of AI-generated responses by grounding them in specific, verified information.
Why Is RAG Needed?
Traditional LLMs face several limitations that RAG helps to overcome:
- LLMs have knowledge limited to their training data cutoff date. RAG allows them to access current information.
- LLMs might generate plausible but incorrect information. RAG reduces this by providing specific, accurate source material.
- Companies often need responses based on their specific documentation, policies, or knowledge base. RAG enables this customization.
- With RAG, responses can be traced back to source documents, making the system more transparent and trustworthy.
Key Components of RAG: A Step-by-Step Process
Document Processing
The first crucial step in implementing RAG involves preparing and processing your documents. This starts with loading documents from various sources such as files, databases, or web content. Documents are then split into smaller, manageable chunks that are optimal for processing and retrieval. The splitting process must balance chunk size with context preservation, ensuring that each segment maintains meaningful information while staying within the model's token limits. This stage also includes cleaning and standardizing the text, removing irrelevant information, and structuring the content appropriately.
Embedding Generation
Once documents are processed and split, the next step is converting these text chunks into embeddings. Embeddings are numerical representations of text that capture semantic meaning, allowing for efficient similarity searches. This process uses embedding models (such as those from OpenAI or other providers) to transform text into high-dimensional vectors. These embeddings serve as the foundation for semantic search capabilities, enabling the system to find relevant information based on meaning rather than just keywords.
Vector Storage
The third key component involves storing and organizing these embeddings in a vector database. Vector stores are specialized databases designed to efficiently store and retrieve high-dimensional vectors. They index the embeddings in a way that enables fast similarity searches, which is crucial for retrieving relevant context during query processing. The choice of vector store depends on factors like data volume, performance requirements, and scaling needs. Popular options include Pinecone, Weaviate, or local vector stores for smaller applications.
Retrieval Mechanism
When a query is received, the system converts it into an embedding and searches the vector store for similar vectors. This retrieval process identifies the most relevant document chunks based on semantic similarity. The mechanism can be enhanced with filters, metadata search, and hybrid approaches combining semantic and keyword search for better accuracy.
Context Assembly
After retrieving relevant chunks, the system assembles them into a coherent context. This involves selecting the most pertinent information, ordering it logically, and ensuring it fits within the model's context window. The assembly process might include deduplication, relevance ranking, and context summarization to optimize the input for the language model.
Response Generation
The final step involves combining the assembled context with the original query to create a prompt for the language model. This prompt engineering ensures the model understands how to use the provided context to generate accurate, relevant responses. The system then processes this through the LLM to generate the final response, which should be grounded in the retrieved information while maintaining natural language fluency.
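The retrieval and context-assembly steps can be illustrated with a short fragment. This is a sketch only, assuming langchain4j's EmbeddingStore.findRelevant API; query, embeddingModel, and embeddingStore are placeholder variables:

// Embed the user query and fetch the most semantically similar chunks.
Embedding queryEmbedding = embeddingModel.embed(query).content();
List<EmbeddingMatch<TextSegment>> matches = embeddingStore.findRelevant(queryEmbedding, 5);

// Assemble the retrieved chunks into one context block that will accompany the prompt.
StringBuilder context = new StringBuilder();
for (EmbeddingMatch<TextSegment> match : matches) {
    context.append(match.embedded().text()).append("\n\n");
}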
Implementing RAG
The Document Retrieval Service leverages LangChain4j’s embedding models and document storage to retrieve relevant information.
This is implemented in the DocsAiService
interface.
Interface Definition:
- The @RegisterAiService annotation marks the interface as an AI service.
- The chatWithQuarkusDocsBot method takes a user prompt and retrieves relevant information from a document store.
Example Code:
@RegisterAiService
public interface DocsAiService {
    String chatWithQuarkusDocsBot(@UserMessage String prompt);
}
Embedding Ingesters:
Embedding ingesters are responsible for processing and storing documents in a way that makes them searchable. They split documents into chunks, generate embeddings, and store them in a vector database.
Document Splitting:
DocumentSplitter splitter = DocumentSplitters.recursive(500, 0);
Embedding Generation:
List<Embedding> embeddings = embeddingModel.embedAll(chunks).content();
Storage in Vector Store:
EmbeddingStoreIngestor ingestor = EmbeddingStoreIngestor.builder()
        .embeddingStore(embeddingStore)
        .embeddingModel(embeddingModel)
        .documentSplitter(splitter)
        .build();
ingestor.ingest(documents);
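The documents variable passed to the ingestor has to come from a document loader. A minimal sketch, assuming the files sit in a local folder and langchain4j's FileSystemDocumentLoader is used (the path is an example):

// Load every file from a local directory into langchain4j Document objects.
List<Document> documents =
        FileSystemDocumentLoader.loadDocuments(Path.of("src/main/resources/docs"));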
The DocumentsRetriever class is a Quarkus-managed bean that provides a RetrievalAugmentor for integrating retrieval-augmented generation (RAG) into your application. It is annotated with @ApplicationScoped, meaning it is instantiated once and shared across the application. The class constructs an EmbeddingStoreContentRetriever using a provided RedisEmbeddingStore and EmbeddingModel. This retriever fetches relevant documents based on semantic similarity. It configures a DefaultRetrievalAugmentor with the retriever, which can be used to augment LLM responses with contextually relevant information. The maxResults parameter is set to 20, limiting the number of documents retrieved per query. The get() method returns the configured RetrievalAugmentor, which can be injected into other components to enhance LLM interactions.
@ApplicationScoped
public class DocumentsRetriever implements Supplier<RetrievalAugmentor> {

    private final RetrievalAugmentor augmentor;

    public DocumentsRetriever(final RedisEmbeddingStore store, final EmbeddingModel model) {
        EmbeddingStoreContentRetriever contentRetriever =
                EmbeddingStoreContentRetriever.builder()
                        .embeddingModel(model)
                        .embeddingStore(store)
                        .maxResults(20)
                        .build();
        augmentor = DefaultRetrievalAugmentor.builder().contentRetriever(contentRetriever).build();
    }

    @Override
    public RetrievalAugmentor get() {
        return augmentor;
    }
}
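To connect this retriever to the DocsAiService shown earlier, the supplier class can be referenced from the service annotation. A sketch, assuming the retrievalAugmentor attribute of the Quarkus extension's @RegisterAiService:

@RegisterAiService(retrievalAugmentor = DocumentsRetriever.class)
public interface DocsAiService {

    // Before each prompt reaches the model, the augmentor appends the most relevant document chunks.
    String chatWithQuarkusDocsBot(@UserMessage String prompt);
}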
You can explore the complete source code for this project in the GitHub repository.
Conclusion
Building AI applications with Quarkus and LangChain4j provides a powerful combination for creating intelligent, production-ready solutions. Through this guide, we've explored the essential components and implementation patterns, from basic AI services to advanced features like RAG and embedding ingesters.