Build a Context-Aware Application
RAG has been mainstream for two to three years now, yet the Java ecosystem still lacks clarity around it. Efforts have been made via LangChain4j, but the picture there is not as clear as with mature frameworks such as Spring.
Retrieval-Augmented Generation (RAG) is a pattern that enhances Large Language Models (LLMs) by providing them with external, up-to-date, or proprietary data, which reduces hallucinations and grounds the response in facts. Spring AI provides an idiomatic and seamless way to implement RAG within the Spring Boot ecosystem.
Introduction: Spring AI vs. LangChain
Spring AI is a framework that aims to apply Spring ecosystem design principles — such as portability (across models and vector stores) and modular design — to the AI domain. It is a natural choice for Java/Spring Boot developers as it fully embraces Spring conventions like Dependency Injection, auto-configuration, and POJOs (Plain Old Java Objects).
Spring AI can be a strong alternative to LangChain (and its Java port, LangChain4j) for RAG, especially within an enterprise setting, because:
- Seamless Spring Boot Integration: It uses Spring Boot starters, making setup incredibly fast. You get an automatically configured ChatClient and VectorStore by simply adding dependencies and properties.
- Idiomatic Java: The APIs feel like other Spring APIs (like WebClient or JdbcTemplate), leveraging familiar patterns for Java developers.
- Enterprise-Grade Features: It is backed by the Spring ecosystem, inheriting robust features like observability, security, and consistent configuration.
- Focus on Abstraction: It provides high-level abstractions like the Advisor API for RAG, which encapsulates the entire retrieval and prompt augmentation logic, often requiring less boilerplate than manually stitching together a chain.
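To make the point about familiar patterns concrete, here is a minimal sketch of a chat endpoint (class name and prompt are illustrative; the full ChatClient usage is covered in Step 4). With a model starter on the classpath, Spring Boot auto-configures a ChatClient.Builder for injection:

package com.example.ragtutorial;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
class HelloAiController {

    private final ChatClient chatClient;

    // ChatClient.Builder is auto-configured once a model starter is on the classpath
    HelloAiController(ChatClient.Builder builder) {
        this.chatClient = builder.build();
    }

    @GetMapping("/hello-ai")
    String helloAi() {
        // Fluent API, similar in spirit to WebClient or JdbcTemplate
        return chatClient.prompt()
                .user("Say hello in one sentence")
                .call()
                .content();
    }
}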
Prerequisites
To follow this tutorial, you will need:
- Java 21 or later.
- Maven or Gradle.
- An API Key for an LLM provider (e.g., OpenAI, Google Gemini, etc.). We will use OpenAI for this example.
- The Spring AI Bill of Materials (BOM). This tutorial assumes the latest stable release (1.0.x).
Step 1: Project Setup (Using Maven)
Create a new Spring Boot project (e.g., using start.spring.io) and add the following dependencies. We will use the OpenAI model and the PostgreSQL/PGVector vector store for a robust, production-ready setup.
In your pom.xml, add:

<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-bom</artifactId>
            <version>1.0.0</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>

<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <!-- OpenAI chat and embedding models (Spring AI 1.0.x artifact naming) -->
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-starter-model-openai</artifactId>
    </dependency>
    <!-- PGVector vector store (Spring AI 1.0.x artifact naming) -->
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-starter-vector-store-pgvector</artifactId>
    </dependency>
    <!-- Provides the QuestionAnswerAdvisor used in Step 4 -->
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-advisors-vector-store</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-pdf-document-reader</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-data-jpa</artifactId>
    </dependency>
    <dependency>
        <groupId>org.postgresql</groupId>
        <artifactId>postgresql</artifactId>
        <scope>runtime</scope>
    </dependency>
</dependencies>
Step 2: Configuration
Configure the LLM API key and the PostgreSQL vector store in your application.properties (or application.yml).
Note: For the PGVector store, you’ll need a running PostgreSQL database with the pgvector extension enabled. Using Docker Compose is recommended for local development.
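For local development, a Docker Compose file along these lines starts a pgvector-enabled PostgreSQL matching the configuration below (the image tag and credentials are illustrative; adjust as needed):

# docker-compose.yml (illustrative)
services:
  postgres:
    image: pgvector/pgvector:pg16   # PostgreSQL with the pgvector extension included
    environment:
      POSTGRES_DB: ragdb
      POSTGRES_USER: user
      POSTGRES_PASSWORD: password
    ports:
      - "5432:5432"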
Properties
# LLM Configuration (OpenAI Example)
spring.ai.openai.api-key=${OPENAI_API_KEY}
spring.ai.openai.chat.options.model=gpt-4o-mini
spring.ai.openai.embedding.options.model=text-embedding-3-small
# PostgreSQL/PGVector Configuration
spring.datasource.url=jdbc:postgresql://localhost:5432/ragdb
spring.datasource.username=user
spring.datasource.password=password
spring.jpa.hibernate.ddl-auto=update
# Spring AI Vector Store Schema Initialization
# This creates the necessary table for the vector store
spring.ai.vectorstore.pgvector.initialize-schema=true
Step 3: Document Ingestion Service (ETL)
The first part of RAG is the Extract, Transform, Load (ETL) pipeline. We read a document, split it into smaller chunks (documents), generate embeddings for the chunks, and store them in the VectorStore.
Create a service named IngestionService.java:
package com.example.ragtutorial;

import java.util.List;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.ai.document.Document;
import org.springframework.ai.reader.TextReader;
import org.springframework.ai.transformer.splitter.TokenTextSplitter;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.boot.CommandLineRunner;
import org.springframework.core.io.Resource;
import org.springframework.stereotype.Service;

@Service
public class IngestionService implements CommandLineRunner {

    private static final Logger log = LoggerFactory.getLogger(IngestionService.class);

    private final VectorStore vectorStore;

    // Use a text file for simplicity. Place it in src/main/resources/data/
    @Value("classpath:/data/spring-ai-info.txt")
    private Resource dataResource;

    public IngestionService(VectorStore vectorStore) {
        this.vectorStore = vectorStore;
    }

    @Override
    public void run(String... args) {
        log.info("Starting RAG document ingestion...");

        // 1. Extract: Read the document content
        TextReader textReader = new TextReader(dataResource);
        List<Document> rawDocuments = textReader.get();

        // 2. Transform: Split the large document into smaller, manageable chunks.
        // TokenTextSplitter ensures chunks fit within the LLM's context window.
        TokenTextSplitter textSplitter = new TokenTextSplitter();
        List<Document> splitDocuments = textSplitter.apply(rawDocuments);

        // 3. Load: Store the chunks (embeddings are created and stored automatically)
        vectorStore.add(splitDocuments);

        log.info("Document ingestion complete. {} chunks loaded into VectorStore.", splitDocuments.size());
    }
}
Example content for src/main/resources/data/spring-ai-info.txt:
Spring AI is an application framework for AI engineering. Its goal is to apply Spring ecosystem design principles to the AI domain. It connects enterprise data and APIs with AI Models. It offers a portable API across different AI providers like OpenAI, Gemini, and Ollama. For RAG, it supports vector stores such as PGVector, Chroma, and Redis. The ChatClient API is used for communication, and the Advisor API simplifies patterns like RAG.
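Since the pom already includes spring-ai-pdf-document-reader, swapping the text file for a PDF only changes the Extract step. A minimal sketch, assuming a hypothetical file at src/main/resources/data/manual.pdf:

// In IngestionService.run(), replace the Extract step with:
// import org.springframework.ai.reader.pdf.PagePdfDocumentReader;
PagePdfDocumentReader pdfReader = new PagePdfDocumentReader("classpath:/data/manual.pdf");
List<Document> rawDocuments = pdfReader.get(); // typically one Document per page

The rest of the pipeline (splitting and loading) stays the same.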
Step 4: Implement the RAG Controller
The RAG logic is greatly simplified by Spring AI’s Advisor API, specifically QuestionAnswerAdvisor. This advisor automatically performs the retrieval and prompt augmentation before calling the LLM.
Create a REST controller named RagController.java:

package com.example.ragtutorial;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.vectorstore.QuestionAnswerAdvisor;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class RagController {

    private final ChatClient chatClient;

    public RagController(ChatClient.Builder chatClientBuilder, VectorStore vectorStore) {
        // Configure the ChatClient with the QuestionAnswerAdvisor, which handles:
        // 1. Retrieving relevant documents from the VectorStore based on the user query.
        // 2. Augmenting the user's prompt with the retrieved documents as context.
        this.chatClient = chatClientBuilder
                // This is the core of the RAG implementation in Spring AI
                .defaultAdvisors(QuestionAnswerAdvisor.builder(vectorStore).build())
                .build();
    }

    @GetMapping("/rag/query")
    public String ragQuery(@RequestParam(defaultValue = "What is Spring AI and what are its features?") String query) {
        // The advisor runs before this call, injecting the retrieved context into the prompt
        return this.chatClient.prompt()
                .user(query)
                .call()
                .content();
    }
}
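Retrieval can also be tuned without leaving the builder. As a sketch (the values are illustrative, not recommendations), a SearchRequest controls how many chunks are retrieved and the minimum similarity score:

// Illustrative: retrieve at most 5 chunks with similarity >= 0.7
// import org.springframework.ai.vectorstore.SearchRequest;
this.chatClient = chatClientBuilder
        .defaultAdvisors(QuestionAnswerAdvisor.builder(vectorStore)
                .searchRequest(SearchRequest.builder()
                        .topK(5)
                        .similarityThreshold(0.7)
                        .build())
                .build())
        .build();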
Step 5: Run and Test the Application
1. Ensure PostgreSQL is running with pgvector enabled (e.g., via Docker).
2. Run the Spring Boot application. The IngestionService will execute upon startup, loading your document into the vector store.
3. Test the RAG endpoint using a browser or a tool like cURL:
Query based on the context:
curl -G 'http://localhost:8080/rag/query' --data-urlencode 'query=What is the primary goal of Spring AI?'
# Expected Output (grounded in your document): The primary goal of Spring AI is to apply Spring ecosystem design principles to the AI domain and to connect enterprise data and APIs with AI Models.
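It is also worth probing a question your document does not cover. The QuestionAnswerAdvisor's default prompt typically instructs the model to say so when the retrieved context lacks an answer, rather than inventing one:

curl -G 'http://localhost:8080/rag/query' --data-urlencode 'query=What is the capital of France?'
# Expected: a reply stating that the provided context does not contain this information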
Conclusion and Shortcomings of RAG
RAG with Spring AI is a powerful and convenient pattern. However, the RAG approach itself, regardless of the framework, has inherent shortcomings:
1. The “Garbage In, Garbage Out” Problem: The quality of the final answer is directly dependent on the quality of the retrieved documents. If the source documents are poorly structured, incomplete, or the chunking is sub-optimal, the LLM will still provide a poor or hallucinated answer.
Fix: Requires a robust ETL pipeline for document cleaning and structured chunking.
2. Need for Fine-Tuning Retrieval: Simple vector similarity search is not always enough. Advanced scenarios require:
- Re-ranking: Using a separate model to re-score the top-K retrieved documents for better relevance.
- Query Transformation: Using the LLM to rewrite the user’s question into multiple, more specific queries to boost recall (MultiQueryExpander in Spring AI; see the sketch after this list).
- Hybrid Search: Combining vector search with traditional keyword search (lexical search) to cover more bases.
3. Context Window Management: The retrieved documents must fit within the LLM’s context window. If too many relevant chunks are found, they must be truncated or summarized, which can lead to incomplete answers.
4. Integration Complexity (Spring AI Specific): While simple RAG is easy, more complex agentic workflows or highly customized multi-step reasoning often require more explicit configuration than the high-level Advisor abstraction, potentially leading to more code than in a framework designed primarily for chaining (like LangChain4j).
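To make the query-transformation point concrete, here is a sketch of MultiQueryExpander. It is based on the Spring AI 1.0 RAG module (verify the package names and the required spring-ai-rag dependency against the docs for your version); chatClientBuilder is the same auto-configured builder used in the controller:

// Sketch: expand one user question into several retrieval queries to boost recall.
// import java.util.List;
// import org.springframework.ai.rag.Query;
// import org.springframework.ai.rag.preretrieval.query.expansion.MultiQueryExpander;
MultiQueryExpander expander = MultiQueryExpander.builder()
        .chatClientBuilder(chatClientBuilder) // auto-configured ChatClient.Builder
        .numberOfQueries(3)                   // illustrative value
        .build();

// Each expanded query is then run against the VectorStore and the results merged
List<Query> queries = expander.expand(new Query("What is Spring AI?"));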
