How to Create an Intelligent Document Q&A System Using Spring AI, PostgreSQL, and LM Studio
Imagine having an AI assistant that can instantly answer questions about hundreds of financial documents (quarterly reports, market analyses, policy papers) without you having to manually search through pages of text. That's exactly what Retrieval Augmented Generation (RAG) enables, and in this tutorial we'll build such a system from scratch using Spring Boot.
By the end of this guide, you'll have a fully functional application that:
- Ingests PDF documents and extracts their content
- Converts text into semantic embeddings using AI models
- Stores embeddings in a PostgreSQL vector database
- Answers natural language queries with contextual accuracy
What is RAG and Why Does It Matter?
Retrieval Augmented Generation (RAG) is a technique that enhances Large Language Models (LLMs) by providing them with relevant context from external knowledge sources. Instead of relying solely on the model's training data, RAG systems:
- Retrieve relevant documents based on semantic similarity
- Augment the LLM's prompt with retrieved context
- Generate accurate, grounded answers
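The three steps above can be sketched in plain Java. The bag-of-words `embed()` below is a deliberately crude stand-in for a real embedding model such as nomic-embed-text, and the class and method names are invented for illustration; only the retrieve-then-augment control flow mirrors a real RAG pipeline:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy sketch of the retrieve -> augment -> generate loop.
public class RagFlowSketch {

    record Chunk(String text, double[] vector) {}

    // Hypothetical "embedding": word counts over a shared vocabulary.
    // A real model produces dense vectors (768 dims for nomic-embed-text).
    static double[] embed(String text, List<String> vocab) {
        double[] v = new double[vocab.size()];
        for (String w : text.toLowerCase().split("\\W+")) {
            int i = vocab.indexOf(w);
            if (i >= 0) v[i]++;
        }
        return v;
    }

    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb) + 1e-9);
    }

    // Retrieve: rank stored chunks by similarity to the embedded question.
    static Chunk retrieve(List<Chunk> store, double[] question) {
        return store.stream()
                .max(Comparator.comparingDouble(c -> cosine(c.vector(), question)))
                .orElseThrow();
    }

    public static void main(String[] args) {
        List<String> docs = List.of(
                "Q4 revenue grew twelve percent year over year",
                "The office cafeteria menu changes weekly");

        // Build a shared vocabulary from the corpus.
        List<String> vocab = new ArrayList<>();
        for (String d : docs) {
            for (String w : d.toLowerCase().split("\\W+")) {
                if (!vocab.contains(w)) vocab.add(w);
            }
        }

        List<Chunk> store = docs.stream()
                .map(d -> new Chunk(d, embed(d, vocab)))
                .toList();

        // Retrieve the best chunk, then augment the prompt with it.
        Chunk best = retrieve(store, embed("What was the Q4 revenue trend?", vocab));
        String prompt = "Context:\n" + best.text()
                + "\n\nQuestion: What was the Q4 revenue trend?";
        System.out.println(prompt);
    }
}
```

In the real application, Spring AI performs all three steps for us; this sketch only exists to make the mechanics visible.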
This approach is particularly powerful for:
- Enterprise knowledge bases with proprietary information
- Financial document analysis and compliance
- Customer support systems with extensive documentation
- Research paper exploration and literature reviews
Architecture Overview
Our FinanceRag application follows a straightforward yet powerful architecture:
1. Document Ingestion Pipeline
PDF documents are read from the classpath and processed by Spring AI's ParagraphPdfDocumentReader, which extracts text while preserving structure.
2. Text Chunking
The TokenTextSplitter divides the extracted text into manageable chunks (800 tokens each). This is crucial because:
- Embedding models have token limits
- Smaller chunks provide more precise semantic matching
- Context windows in LLMs benefit from focused, relevant information
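To make the idea concrete, here is a simplified, dependency-free chunker. Spring AI's TokenTextSplitter counts model tokens; this sketch approximates tokens with whitespace-separated words and adds a small overlap so meaning isn't cut off at chunk boundaries (the class name and parameters are illustrative, not the library's API):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified chunker: fixed-size word windows with overlap.
public class SimpleChunker {

    static List<String> split(String text, int chunkSize, int overlap) {
        String[] words = text.split("\\s+");
        List<String> chunks = new ArrayList<>();
        int step = Math.max(1, chunkSize - overlap);
        for (int start = 0; start < words.length; start += step) {
            int end = Math.min(words.length, start + chunkSize);
            chunks.add(String.join(" ", Arrays.copyOfRange(words, start, end)));
            if (end == words.length) break; // last chunk reached
        }
        return chunks;
    }

    public static void main(String[] args) {
        String report = "one two three four five six seven eight nine ten";
        // 4-word chunks with a 1-word overlap between neighbours.
        split(report, 4, 1).forEach(System.out::println);
    }
}
```

A real splitter works on tokenizer output rather than words, but the trade-off is the same: smaller windows give sharper matches, overlap preserves cross-boundary context.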
3. Vector Embedding Generation
Each text chunk is converted into a high-dimensional vector (embedding) using the nomic-embed-text model. These embeddings capture semantic meaning: similar concepts cluster together in vector space.
4. Vector Storage with pgvector
Embeddings are persisted in PostgreSQL using the pgvector extension, which enables efficient similarity searches. We use HNSW indexing for fast approximate nearest neighbor (ANN) queries.
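As an illustration, assuming Spring AI's default `vector_store` schema (created automatically when `initialize-schema=true`), the HNSW index and a similarity query look roughly like this:

```sql
-- Approximate nearest-neighbour index using cosine distance.
-- Table and column names assume Spring AI's default vector_store schema.
CREATE INDEX ON vector_store USING hnsw (embedding vector_cosine_ops);

-- "<=>" is pgvector's cosine-distance operator; $1 is the query embedding.
SELECT content
FROM vector_store
ORDER BY embedding <=> $1
LIMIT 5;
```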
5. Query Processing
When a user asks a question:
- The question is embedded using the same model
- Vector similarity search retrieves the most relevant document chunks
- The QuestionAnswerAdvisor augments the LLM prompt with this context
- The LLM generates a contextual answer
Building the Application: Step by Step
Prerequisites
Before diving into code, ensure you have:
- Java 17+ installed
- PostgreSQL 12+ with pgvector extension
- LM Studio (or another OpenAI-compatible LLM endpoint)
- Maven 3+ for dependency management
Setting Up PostgreSQL with pgvector
You have two options for setting up PostgreSQL:
Option 1: Using Docker (Recommended for Quick Start)
Your repository includes a compose.yaml file for easy setup:
```yaml
services:
  postgres:
    image: pgvector/pgvector:pg16
    ports:
      - "55419:5432"
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
      POSTGRES_DB: finance
    volumes:
      - pgdata:/var/lib/postgresql/data
volumes:
  pgdata:
```
Simply run:
```shell
docker-compose up -d
```
This spins up PostgreSQL with pgvector pre-installed on port 55419.
Option 2: Manual Installation
First, create a database and enable the vector extension:
```sql
CREATE DATABASE finance;
\c finance
CREATE EXTENSION IF NOT EXISTS vector;
```
The pgvector extension adds a new vector data type to PostgreSQL, enabling efficient storage and querying of high-dimensional vectors.
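To see the vector type in action, here is a minimal, self-contained example (a 3-dimensional toy; real nomic-embed-text vectors have 768 dimensions, and this table is not part of the tutorial's schema):

```sql
-- Toy 3-dimensional vectors; real embeddings would be vector(768).
CREATE TABLE demo (id serial PRIMARY KEY, embedding vector(3));
INSERT INTO demo (embedding) VALUES ('[1,2,3]'), ('[4,5,6]');

-- Find the nearest stored vector by cosine distance.
SELECT id FROM demo ORDER BY embedding <=> '[1,2,3]' LIMIT 1;
```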
Configuring Spring Boot
Your application.properties file should include:
```properties
# Application Name
spring.application.name=financeRag

# Database Configuration (Docker setup)
spring.datasource.url=jdbc:postgresql://localhost:55419/finance
spring.datasource.username=postgres
spring.datasource.password=postgres

# LLM Configuration (LM Studio)
spring.ai.openai.base-url=http://localhost:1234/
spring.ai.openai.api-key=dummy

# Embedding Model
spring.ai.openai.embedding.options.model=nomic-embed-text

# Chat Model
spring.ai.openai.chat.options.model=google/gemma-3-4b

# Vector Store Configuration
spring.ai.vectorstore.pgvector.initialize-schema=true

# Ingestion Control (IMPORTANT!)
financerag.ingest.enabled=true
```
Key Configuration Notes:
- Port `55419` matches the Docker Compose setup
- `initialize-schema=true` automatically creates the vector store table
- `nomic-embed-text` is a lightweight, high-quality embedding model
- `google/gemma-3-4b` is the chat model served by LM Studio
**Important: Ingestion Control**
The financerag.ingest.enabled property is a smart optimization:
First Run (Initial Setup):
```properties
financerag.ingest.enabled=true
```
This processes your PDFs and populates the vector store.
Subsequent Runs:
```properties
financerag.ingest.enabled=false
```
This skips ingestion and starts the application immediately. The embeddings are already in PostgreSQL, so there's no need to re-process documents every time!
This design prevents:
- Duplicate embeddings in the database
- Slow startup times on every restart
- Unnecessary LLM API calls
Setting Up LM Studio
1. **Download and Install LM Studio** from lmstudio.ai
2. **Download the Required Models:**
   - Embedding Model: search for "nomic-embed-text" in LM Studio and download it
   - Chat Model: search for "google/gemma-3-4b" (or similar) and download it
3. **Start the Local Server:**
   - Open LM Studio
   - Go to the "Local Server" tab
   - Select your chat model (gemma-3-4b)
   - Click "Start Server" (it will run on http://localhost:1234 by default)
   - Ensure the embedding model is also loaded
4. **Verify the Connection:**

```shell
curl http://localhost:1234/v1/models
```
You should see your loaded models listed.
The Ingestion Service
The heart of our document processing pipeline is the IngestionService. Here's how it works:
```java
@Component
@ConditionalOnProperty(
        name = "financerag.ingest.enabled",
        havingValue = "true",
        matchIfMissing = false
)
public class IngestionService implements CommandLineRunner {

    private static final Logger logger = LoggerFactory.getLogger(IngestionService.class);

    private final VectorStore vectorStore;

    @Value("classpath:/docs/article.pdf")
    private Resource pdfResource;

    public IngestionService(VectorStore vectorStore) {
        this.vectorStore = vectorStore;
    }

    @Override
    public void run(String... args) throws Exception {
        logger.info("Starting data ingestion process...");

        // 1. Read the PDF using the paragraph-based reader
        var pdfReader = new ParagraphPdfDocumentReader(pdfResource);

        // 2. Split the text into chunks
        TextSplitter splitter = new TokenTextSplitter();

        // 3. Embed the chunks and store them in the vector database
        vectorStore.accept(splitter.apply(pdfReader.get()));
        logger.info("Vector store updated with PDF content.");
    }
}
```
Key Implementation Insights:
1. Conditional Ingestion
The @ConditionalOnProperty annotation ensures ingestion runs only when you explicitly enable it:
```properties
# Enable ingestion on first run
financerag.ingest.enabled=true

# Disable after initial setup to avoid re-ingesting
financerag.ingest.enabled=false
```
This prevents re-processing documents on every application restart!
2. CommandLineRunner Interface
By implementing CommandLineRunner, the ingestion happens automatically after Spring Boot starts, but before the application begins serving requests.
3. ParagraphPdfDocumentReader vs PagePdfDocumentReader
The code uses ParagraphPdfDocumentReader, which:
- Preserves document structure better by respecting paragraph boundaries
- Creates more semantically meaningful chunks
- Is better suited for financial documents with structured content
4. Simplified API
The vectorStore.accept() method elegantly handles:
- Embedding generation for each chunk
- Batch insertion into PostgreSQL
- All the complexity hidden behind a clean API
The Chat Controller
Now let's expose a REST endpoint for queries:
```java
@RestController
public class ChatController {

    private final ChatClient chatClient;

    public ChatController(ChatClient.Builder chatClientBuilder, PgVectorStore vectorStore) {
        this.chatClient = chatClientBuilder
                .defaultAdvisors(QuestionAnswerAdvisor.builder(vectorStore).build())
                .build();
    }

    @GetMapping("/chat")
    public String chat(@RequestParam String question) {
        return chatClient.prompt()
                .user(question)
                .call()
                .content();
    }
}
```
The Magic of QuestionAnswerAdvisor:
The QuestionAnswerAdvisor is where RAG happens. Behind the scenes, it:
- Converts the user's question into an embedding
- Performs a similarity search against the vector store
- Injects the most relevant document chunks into the prompt
- Sends the augmented prompt to the LLM
Key Implementation Details:
- The advisor is built using the builder pattern: `QuestionAnswerAdvisor.builder(vectorStore).build()`
- Spring AI automatically handles the vector search and context injection
- The controller method is elegantly simple: just pass the question through the chat client
Real World Considerations
Choosing the Right Chunk Size
The 800-token chunk size is a starting point. Consider:
- Smaller chunks (200-400 tokens): Better precision, but may lose context
- Larger chunks (1000-1500 tokens): More context, but less precise matching
Experiment with your specific use case. Financial reports might need larger chunks to preserve numerical context, while FAQs work better with smaller, focused chunks.
Scaling to Production
For production deployments, consider:
- Async ingestion: Move document processing to background jobs
- Caching: Cache embeddings for frequently accessed documents
- Metadata filtering: Add tags (date, category, source) to narrow searches
- Monitoring: Track query latency and similarity scores
Hybrid Search Strategies
Pure vector search isn't always optimal. Combine it with:
- Full-text search: For exact keyword matches
- BM25 ranking: Traditional relevance scoring
- Re-ranking: Use a cross-encoder model to refine top results
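As a sketch of what a hybrid query could look like directly in PostgreSQL (assuming Spring AI's default `vector_store` table, with `$1` holding the question embedding; this is not part of the tutorial's code):

```sql
-- Keyword pre-filter via Postgres full-text search,
-- then rank the survivors by vector similarity.
SELECT content
FROM vector_store
WHERE to_tsvector('english', content) @@ plainto_tsquery('english', 'quarterly earnings')
ORDER BY embedding <=> $1
LIMIT 5;
```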
Testing Your RAG System
Start the application and test with curl:
```shell
curl "http://localhost:8080/chat?question=What%20were%20the%20key%20trends%20in%20Q4%20earnings?"
```
You should see an answer grounded in your ingested documents. Compare responses with and without RAG to appreciate the difference in accuracy and relevance.
Common Pitfalls and Solutions
1. Embedding Dimension Mismatch
Problem: Embeddings fail to store with dimension errors.
Solution: Ensure spring.ai.vectorstore.pgvector.dimensions matches your embedding model. For nomic-embed-text, use 768.
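For example, in application.properties (768 matches nomic-embed-text's output size):

```properties
# Must match the embedding model's output dimensionality
spring.ai.vectorstore.pgvector.dimensions=768
```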
2. Poor Retrieval Quality
Problem: Answers don't align with document content.
Solution: Adjust chunk size, increase topK, or lower the similarity threshold. Also verify your embedding model is appropriate for your domain.
3. Memory Issues During Ingestion
Problem: Application crashes with OutOfMemoryError.
Solution: Process documents in batches, increase JVM heap size (-Xmx4g), or limit the maxNumChunks parameter.
Extending FinanceRag: Ideas for Enhancement
This project is a foundation. Here are some powerful extensions:
Multi Document Support
Instead of hardcoding a single PDF, scan a directory or accept uploads via REST API. Add metadata (filename, upload date) to enable filtered searches.
Conversational Memory
Implement session based chat history so users can ask follow up questions without repeating context. Spring AI supports this with MessageChatMemoryAdvisor.
Source Attribution
Return not just the answer but citations showing which document chunks were used. This builds trust and allows users to verify information.
Advanced Analytics
Track which documents are queried most frequently, average similarity scores, and query patterns to identify knowledge gaps.
Conclusion
You've now built a fully functional RAG system that can intelligently answer questions about your documents. This architecture scales to thousands of documents and can be adapted for countless use cases: customer support, legal document analysis, medical research, and more.
The beauty of Spring AI is how it abstracts the complexity of embeddings, vector stores, and LLM orchestration, letting you focus on business logic. With just three pieces (IngestionService, ChatController, and pgvector) we've created a powerful AI assistant.
The full source code for FinanceRag is available on GitHub. Clone it, experiment with different models and chunk sizes, and adapt it to your domain. The future of enterprise AI is built on foundations like these, combining the power of LLMs with your organization's proprietary knowledge.
Special Thanks: This project was inspired by the excellent Spring AI content from Dan Vega, whose tutorials have helped countless developers understand the power of RAG architectures.
Happy coding, and may your AI assistants always retrieve the right context!
About the Author
This tutorial is brought to you by Abhijith Rajesh.