In the previous article, we built the indexing pipeline for our knowledge base:
- documents are saved
- content is split into chunks
- embeddings are generated
- vectors are stored in PostgreSQL using pgvector
But indexing is only half of the system.
The real value comes when users can ask questions and receive answers based on the indexed knowledge.
In this article we will implement the retrieval side of the architecture using Spring Boot.
By the end of this tutorial, our system will support:
- receiving a user question
- converting the question into an embedding
- searching the vector database for similar chunks
- building a prompt with contextual information
- sending that prompt to an AI client
- returning a grounded response
This architecture is commonly known as Retrieval-Augmented Generation (RAG).
Understanding the Retrieval Flow
Once documents are indexed, the query flow looks like this:
User question
↓
Convert question into embedding
↓
Vector similarity search in PostgreSQL
↓
Retrieve most relevant chunks
↓
Build prompt with context
↓
Send prompt to AI model
↓
Return answer
An important detail is that vectors are not sent to the AI model.
Vectors are used only to retrieve the most relevant text.
The AI receives plain text chunks as context.
Step 1 — Question Request DTO
First we define the request used by the semantic search endpoint.
package com.example.knowledgebase.api;
import jakarta.validation.constraints.NotBlank;
public record AskQuestionRequest(
        @NotBlank String question
) {}
Response DTO:
package com.example.knowledgebase.api;
import java.util.List;
public record AskQuestionResponse(
        String question,
        String answer,
        List<String> contextChunks
) {}
Returning the context chunks is useful for debugging and understanding how retrieval works.
Step 2 — Vector Similarity Query
The similarity search happens in the knowledge_document_chunk table.
We extend our repository with a native query using pgvector.
package com.example.knowledgebase.repository;
import com.example.knowledgebase.domain.KnowledgeDocumentChunk;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.jpa.repository.Query;
import org.springframework.data.repository.query.Param;
import org.springframework.stereotype.Repository;
import java.util.List;
@Repository
public interface KnowledgeDocumentChunkRepository extends JpaRepository<KnowledgeDocumentChunk, Long> {

    List<KnowledgeDocumentChunk> findByDocumentIdOrderByChunkIndexAsc(Long documentId);

    @Query(value = """
            SELECT
                id,
                document_id AS documentId,
                chunk_index AS chunkIndex,
                chunk_text AS chunkText,
                embedding <-> CAST(:embedding AS vector) AS distance
            FROM knowledge_document_chunk
            ORDER BY embedding <-> CAST(:embedding AS vector)
            LIMIT :limit
            """, nativeQuery = true)
    List<SimilarChunkProjection> searchTopK(
            @Param("embedding") String embedding,
            @Param("limit") int limit
    );

    void deleteByDocumentId(Long documentId);
}
This query uses the pgvector distance operator:
embedding <-> query_vector
It returns the nearest vectors first, meaning the most semantically similar chunks.
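Specifically, `<->` is the Euclidean (L2) distance operator; pgvector also provides `<=>` for cosine distance and `<#>` for negative inner product. For intuition, here is a plain-Java version of the L2 formula the database evaluates for each row (illustration only, not part of the application):

```java
// Illustration of what pgvector's <-> operator computes per row:
// the Euclidean (L2) distance between two vectors.
public class VectorDistance {

    // L2 distance: square root of the sum of squared component differences
    public static double l2(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        // (0,0) vs (3,4): the classic 3-4-5 triangle, distance 5.0
        System.out.println(l2(new float[]{0, 0}, new float[]{3, 4}));
    }
}
```

Which operator you order by should match how your embeddings are meant to be compared. For unit-normalized embeddings, ordering by L2 distance and by cosine distance yields the same ranking, since ||a − b||² = 2 − 2·cos(a, b) for unit vectors.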
Step 3 — Projection for Query Results
Instead of loading the full entity, we use a projection.
package com.example.knowledgebase.repository;
public interface SimilarChunkProjection {
    Long getId();
    Long getDocumentId();
    Integer getChunkIndex();
    String getChunkText();
    Double getDistance();
}
This keeps the query lightweight.
Step 4 — Embedding Formatter
When passed as text, pgvector expects vector literals formatted like this:
[0.12,0.34,0.98,...]
We add a small helper component.
package com.example.knowledgebase.service;
import org.springframework.stereotype.Component;
@Component
public class VectorFormatter {

    public String toPgVector(float[] embedding) {
        StringBuilder builder = new StringBuilder("[");
        for (int i = 0; i < embedding.length; i++) {
            builder.append(embedding[i]);
            if (i < embedding.length - 1) {
                builder.append(",");
            }
        }
        builder.append("]");
        return builder.toString();
    }
}
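A quick sanity check of the formatter's output (the same logic, copied here without the Spring annotation so it runs standalone):

```java
public class VectorFormatterDemo {

    // Same logic as VectorFormatter.toPgVector, minus the Spring annotation
    public static String toPgVector(float[] embedding) {
        StringBuilder builder = new StringBuilder("[");
        for (int i = 0; i < embedding.length; i++) {
            builder.append(embedding[i]);
            if (i < embedding.length - 1) {
                builder.append(",");
            }
        }
        return builder.append("]").toString();
    }

    public static void main(String[] args) {
        // Values chosen so Float.toString prints them exactly
        System.out.println(toPgVector(new float[]{0.5f, 0.25f, -1.0f}));
        // → [0.5,0.25,-1.0]
    }
}
```

This string is exactly what the native query receives and converts with `CAST(:embedding AS vector)`.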
Step 5 — Retrieval Service
This service handles semantic retrieval.
package com.example.knowledgebase.service;
import com.example.knowledgebase.repository.KnowledgeDocumentChunkRepository;
import com.example.knowledgebase.repository.SimilarChunkProjection;
import lombok.RequiredArgsConstructor;
import org.springframework.stereotype.Service;
import java.util.List;
@Service
@RequiredArgsConstructor
public class RetrievalService {

    private final EmbeddingService embeddingService;
    private final KnowledgeDocumentChunkRepository chunkRepository;
    private final VectorFormatter vectorFormatter;

    public List<String> retrieveRelevantChunks(String question, int topK) {
        float[] questionEmbedding = embeddingService.generateEmbedding(question);
        String vector = vectorFormatter.toPgVector(questionEmbedding);

        List<SimilarChunkProjection> results =
                chunkRepository.searchTopK(vector, topK);

        return results.stream()
                .map(SimilarChunkProjection::getChunkText)
                .toList();
    }
}
The retrieval logic is straightforward:
- embed the question
- run similarity search
- return the chunk text
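Conceptually, the `ORDER BY ... LIMIT` query is a nearest-neighbor search. The same idea can be sketched in plain Java over an in-memory list; the 2-D vectors and chunk texts below are made up for illustration (real embeddings have hundreds of dimensions):

```java
import java.util.Comparator;
import java.util.List;

public class TopKDemo {

    public record Chunk(String text, float[] embedding) {}

    // Euclidean (L2) distance, as pgvector's <-> operator computes it
    public static double l2(float[] a, float[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // In-memory equivalent of ORDER BY embedding <-> :query LIMIT :k
    public static List<String> searchTopK(List<Chunk> chunks, float[] query, int k) {
        return chunks.stream()
                .sorted(Comparator.comparingDouble(c -> l2(c.embedding(), query)))
                .limit(k)
                .map(Chunk::text)
                .toList();
    }

    public static void main(String[] args) {
        List<Chunk> chunks = List.of(
                new Chunk("about databases", new float[]{1.0f, 0.0f}),
                new Chunk("about cooking", new float[]{0.0f, 1.0f}),
                new Chunk("about postgres", new float[]{0.95f, 0.05f}));

        // Query vector close to the "databases" region of the space
        System.out.println(searchTopK(chunks, new float[]{1.0f, 0.1f}, 2));
        // → [about postgres, about databases]
    }
}
```

Without an index, pgvector performs this as an exact sequential scan; for larger tables you can add an IVFFlat or HNSW index to speed up (approximate) nearest-neighbor search.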
Step 6 — Prompt Builder
Next we build the prompt that will be sent to the AI model.
package com.example.knowledgebase.service;
import org.springframework.stereotype.Component;
import java.util.List;
@Component
public class PromptBuilder {

    public String build(String question, List<String> contextChunks) {
        StringBuilder prompt = new StringBuilder();

        prompt.append("""
                You are an assistant for a knowledge base.
                Answer only using the context below.
                If the answer is not present in the context, say you do not know.

                Context:
                """);

        for (int i = 0; i < contextChunks.size(); i++) {
            prompt.append("\n[")
                  .append(i + 1)
                  .append("] ")
                  .append(contextChunks.get(i));
        }

        prompt.append("\n\nUser question:\n");
        prompt.append(question);
        prompt.append("\n\nAnswer:");

        return prompt.toString();
    }
}
This is a key concept of RAG:
the model receives relevant context extracted from your database.
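To make the builder's output concrete, here is a standalone copy of the same logic (Spring annotation dropped so it runs on its own) applied to a single sample chunk:

```java
import java.util.List;

public class PromptBuilderDemo {

    // Same logic as PromptBuilder.build, minus the Spring annotation
    public static String build(String question, List<String> contextChunks) {
        StringBuilder prompt = new StringBuilder();
        prompt.append("""
                You are an assistant for a knowledge base.
                Answer only using the context below.
                If the answer is not present in the context, say you do not know.

                Context:
                """);
        for (int i = 0; i < contextChunks.size(); i++) {
            prompt.append("\n[").append(i + 1).append("] ")
                  .append(contextChunks.get(i));
        }
        return prompt.append("\n\nUser question:\n")
                     .append(question)
                     .append("\n\nAnswer:")
                     .toString();
    }

    public static void main(String[] args) {
        // Sample chunk text is invented for the demo
        System.out.println(build("What is pgvector?",
                List.of("pgvector adds a vector type to PostgreSQL.")));
    }
}
```

The printed prompt contains the numbered context chunks, the user question, and a trailing "Answer:" cue for the model.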
Step 7 — AI Client Abstraction
To keep the architecture flexible, we define an AI client interface.
package com.example.knowledgebase.service;
public interface AiClient {
    String ask(String prompt);
}
Later you could implement this with:
- OpenAI
- Azure OpenAI
- Anthropic
- Ollama
- a local LLM
For this tutorial we use a simple mock implementation.
Step 8 — Fake AI Client
package com.example.knowledgebase.service;
import org.springframework.stereotype.Service;
@Service
public class FakeAiClient implements AiClient {

    @Override
    public String ask(String prompt) {
        return """
                Fake AI response.
                In a real system, this prompt would be sent to an LLM provider.
                """;
    }
}
This keeps the tutorial runnable without requiring external APIs.
Step 9 — Semantic Search Service
Now we combine everything into a service.
package com.example.knowledgebase.service;
import com.example.knowledgebase.api.AskQuestionResponse;
import lombok.RequiredArgsConstructor;
import org.springframework.stereotype.Service;
import java.util.List;
@Service
@RequiredArgsConstructor
public class SemanticSearchService {

    private static final int TOP_K = 3;

    private final RetrievalService retrievalService;
    private final PromptBuilder promptBuilder;
    private final AiClient aiClient;

    public AskQuestionResponse ask(String question) {
        List<String> contextChunks =
                retrievalService.retrieveRelevantChunks(question, TOP_K);

        String prompt = promptBuilder.build(question, contextChunks);
        String answer = aiClient.ask(prompt);

        return new AskQuestionResponse(question, answer, contextChunks);
    }
}
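To see the orchestration without a database or embedding model, here is a self-contained sketch of the same flow. Retrieval is stubbed (the caller supplies the chunks the vector search would return) and the AI client is any String-to-String function; the class and method names here are illustrative, not part of the application:

```java
import java.util.List;
import java.util.function.Function;

public class AskFlowDemo {

    // Mirrors SemanticSearchService.ask with retrieval stubbed out
    public static String ask(String question,
                             List<String> contextChunks,
                             Function<String, String> aiClient) {
        // Build a minimal prompt from the supplied context
        StringBuilder prompt = new StringBuilder("Context:\n");
        for (int i = 0; i < contextChunks.size(); i++) {
            prompt.append("[").append(i + 1).append("] ")
                  .append(contextChunks.get(i)).append("\n");
        }
        prompt.append("\nQuestion: ").append(question);

        // Hand the assembled prompt to the (pluggable) model
        return aiClient.apply(prompt.toString());
    }

    public static void main(String[] args) {
        // Fake client that proves it actually saw the retrieved context
        String answer = ask("What is pgvector?",
                List.of("pgvector adds a vector type to PostgreSQL."),
                prompt -> prompt.contains("pgvector adds a vector type")
                        ? "Answer grounded in context."
                        : "I do not know.");
        System.out.println(answer);
        // → Answer grounded in context.
    }
}
```

This is the whole RAG retrieval pattern in miniature: the model only ever sees plain text, and its answer quality depends on what retrieval put into the prompt.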
Step 10 — REST Endpoint
Finally we expose the semantic search endpoint.
package com.example.knowledgebase.api;
import com.example.knowledgebase.service.SemanticSearchService;
import jakarta.validation.Valid;
import lombok.RequiredArgsConstructor;
import org.springframework.web.bind.annotation.*;
@RestController
@RequestMapping("/questions")
@RequiredArgsConstructor
public class SemanticSearchController {

    private final SemanticSearchService semanticSearchService;

    @PostMapping
    public AskQuestionResponse ask(@Valid @RequestBody AskQuestionRequest request) {
        return semanticSearchService.ask(request.question());
    }
}
Now users can ask questions via HTTP.
Testing the Semantic Search
Example request:
POST /questions
Content-Type: application/json
{
  "question": "How does pgvector work with Spring Boot?"
}
Example response:
{
  "question": "How does pgvector work with Spring Boot?",
  "answer": "Fake AI response. In a real system, this prompt would be sent to an LLM provider.",
  "contextChunks": [
    "PostgreSQL can be used as a vector database using pgvector.",
    "Spring Boot can index documents by chunking content.",
    "Embeddings allow semantic similarity search."
  ]
}
Notice how the response includes the context used by the AI.
Why This Architecture Matters
This pattern powers many modern AI systems:
- internal knowledge assistants
- AI copilots
- support automation tools
- enterprise search platforms
By combining:
- vector search
- retrieved context
- LLM generation
we create systems that produce grounded answers instead of hallucinations.
Final Architecture
At this point our system supports both sides of RAG.
Indexing pipeline
Document
→ Chunking
→ Embeddings
→ Stored in pgvector
Retrieval pipeline
Question
→ Embedding
→ Vector similarity search
→ Context retrieval
→ Prompt building
→ AI response
This is the foundation of many real-world AI applications.
Conclusion
In this article we implemented the retrieval layer of a RAG system using Spring Boot and PostgreSQL.
Our application can now:
- embed user questions
- perform vector similarity search
- retrieve relevant document chunks
- construct contextual prompts
- generate AI responses
Together with the previous article, we now have a complete knowledge base architecture powered by vector search.
Articles in this series:
- Meaning: How Data Vectorization Powers AI
- Turning PostgreSQL Into a Vector Database with Docker
- Indexing Knowledge Base Content with Spring Boot and pgvector
- Building Semantic Search with Spring Boot, PostgreSQL, and pgvector (RAG Retrieval)