In the previous article, we built the indexing pipeline for our knowledge base:
- documents are saved
- content is split into chunks
- embeddings are generated
- vectors are stored in PostgreSQL using pgvector
But indexing is only half of the system.
The real value comes when users can ask questions and receive answers based on the indexed knowledge.
In this article we will implement the retrieval side of the architecture using Spring Boot.
By the end of this tutorial, our system will support:
- receiving a user question
- converting the question into an embedding
- searching the vector database for similar chunks
- building a prompt with contextual information
- sending that prompt to an AI client
- returning a grounded response
This architecture is commonly known as Retrieval-Augmented Generation (RAG).
Understanding the Retrieval Flow
Once documents are indexed, the query flow looks like this:
User question
↓
Convert question into embedding
↓
Vector similarity search in PostgreSQL
↓
Retrieve most relevant chunks
↓
Build prompt with context
↓
Send prompt to AI model
↓
Return answer
An important detail is that vectors are not sent to the AI model.
Vectors are used only to retrieve the most relevant text.
The AI receives plain text chunks as context.
Step 1 — Question Request DTO
First we define the request used by the semantic search endpoint.
package com.example.knowledgebase.api;
import jakarta.validation.constraints.NotBlank;
public record AskQuestionRequest(
        @NotBlank String question
) {}
Response DTO:
package com.example.knowledgebase.api;
import java.util.List;
public record AskQuestionResponse(
        String question,
        String answer,
        List<String> contextChunks
) {}
Returning the context chunks is useful for debugging and understanding how retrieval works.
Step 2 — Vector Similarity Query
The similarity search happens in the knowledge_document_chunk table.
We extend our repository with a native query using pgvector.
package com.example.knowledgebase.repository;
import com.example.knowledgebase.domain.KnowledgeDocumentChunk;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.jpa.repository.Query;
import org.springframework.data.repository.query.Param;
import org.springframework.stereotype.Repository;
import java.util.List;
@Repository
public interface KnowledgeDocumentChunkRepository extends JpaRepository<KnowledgeDocumentChunk, Long> {

    List<KnowledgeDocumentChunk> findByDocumentIdOrderByChunkIndexAsc(Long documentId);

    @Query(value = """
            SELECT
                id,
                document_id AS documentId,
                chunk_index AS chunkIndex,
                chunk_text AS chunkText,
                embedding <-> CAST(:embedding AS vector) AS distance
            FROM knowledge_document_chunk
            ORDER BY embedding <-> CAST(:embedding AS vector)
            LIMIT :limit
            """, nativeQuery = true)
    List<SimilarChunkProjection> searchTopK(
            @Param("embedding") String embedding,
            @Param("limit") int limit
    );

    void deleteByDocumentId(Long documentId);
}
This query uses the pgvector distance operator:
embedding <-> query_vector
It returns the nearest vectors first, meaning the most semantically similar chunks.
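Specifically, `<->` is the Euclidean (L2) distance operator; pgvector also provides `<=>` for cosine distance and `<#>` for negative inner product. For intuition, here is a plain-Java version of the L2 formula the database evaluates for each row (illustration only, not part of the application):

```java
// Illustration of what pgvector's <-> operator computes per row:
// the Euclidean (L2) distance between two vectors.
public class VectorDistance {

    // L2 distance: square root of the sum of squared component differences
    public static double l2(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        // (0,0) vs (3,4): the classic 3-4-5 triangle, distance 5.0
        System.out.println(l2(new float[]{0, 0}, new float[]{3, 4}));
    }
}
```

Which operator you order by should match how your embeddings are meant to be compared. For unit-normalized embeddings, ordering by L2 distance and by cosine distance yields the same ranking, since ||a − b||² = 2 − 2·cos(a, b) for unit vectors.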
Step 3 — Projection for Query Results
Instead of loading the full entity, we use a projection.
package com.example.knowledgebase.repository;
public interface SimilarChunkProjection {
    Long getId();
    Long getDocumentId();
    Integer getChunkIndex();
    String getChunkText();
    Double getDistance();
}
This keeps the query lightweight.
Step 4 — Embedding Formatter
When passed as text, pgvector expects vector literals formatted like this:
[0.12,0.34,0.98,...]
We add a small helper component.
package com.example.knowledgebase.service;
import org.springframework.stereotype.Component;
@Component
public class VectorFormatter {

    public String toPgVector(float[] embedding) {
        StringBuilder builder = new StringBuilder("[");
        for (int i = 0; i < embedding.length; i++) {
            builder.append(embedding[i]);
            if (i < embedding.length - 1) {
                builder.append(",");
            }
        }
        builder.append("]");
        return builder.toString();
    }
}
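A quick sanity check of the formatter's output (the same logic, copied here without the Spring annotation so it runs standalone):

```java
public class VectorFormatterDemo {

    // Same logic as VectorFormatter.toPgVector, minus the Spring annotation
    public static String toPgVector(float[] embedding) {
        StringBuilder builder = new StringBuilder("[");
        for (int i = 0; i < embedding.length; i++) {
            builder.append(embedding[i]);
            if (i < embedding.length - 1) {
                builder.append(",");
            }
        }
        return builder.append("]").toString();
    }

    public static void main(String[] args) {
        // Values chosen so Float.toString prints them exactly
        System.out.println(toPgVector(new float[]{0.5f, 0.25f, -1.0f}));
        // → [0.5,0.25,-1.0]
    }
}
```

This string is exactly what the native query receives and converts with `CAST(:embedding AS vector)`.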
Step 5 — Retrieval Service
This service handles semantic retrieval.
package com.example.knowledgebase.service;
import com.example.knowledgebase.repository.KnowledgeDocumentChunkRepository;
import com.example.knowledgebase.repository.SimilarChunkProjection;
import lombok.RequiredArgsConstructor;
import org.springframework.stereotype.Service;
import java.util.List;
@Service
@RequiredArgsConstructor
public class RetrievalService {

    private final EmbeddingService embeddingService;
    private final KnowledgeDocumentChunkRepository chunkRepository;
    private final VectorFormatter vectorFormatter;

    public List<String> retrieveRelevantChunks(String question, int topK) {
        float[] questionEmbedding = embeddingService.generateEmbedding(question);
        String vector = vectorFormatter.toPgVector(questionEmbedding);

        List<SimilarChunkProjection> results =
                chunkRepository.searchTopK(vector, topK);

        return results.stream()
                .map(SimilarChunkProjection::getChunkText)
                .toList();
    }
}
The retrieval logic is straightforward:
- embed the question
- run similarity search
- return the chunk text
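Conceptually, the `ORDER BY ... LIMIT` query is a nearest-neighbor search. The same idea can be sketched in plain Java over an in-memory list; the 2-D vectors and chunk texts below are made up for illustration (real embeddings have hundreds of dimensions):

```java
import java.util.Comparator;
import java.util.List;

public class TopKDemo {

    public record Chunk(String text, float[] embedding) {}

    // Euclidean (L2) distance, as pgvector's <-> operator computes it
    public static double l2(float[] a, float[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // In-memory equivalent of ORDER BY embedding <-> :query LIMIT :k
    public static List<String> searchTopK(List<Chunk> chunks, float[] query, int k) {
        return chunks.stream()
                .sorted(Comparator.comparingDouble(c -> l2(c.embedding(), query)))
                .limit(k)
                .map(Chunk::text)
                .toList();
    }

    public static void main(String[] args) {
        List<Chunk> chunks = List.of(
                new Chunk("about databases", new float[]{1.0f, 0.0f}),
                new Chunk("about cooking", new float[]{0.0f, 1.0f}),
                new Chunk("about postgres", new float[]{0.95f, 0.05f}));

        // Query vector close to the "databases" region of the space
        System.out.println(searchTopK(chunks, new float[]{1.0f, 0.1f}, 2));
        // → [about postgres, about databases]
    }
}
```

Without an index, pgvector performs this as an exact sequential scan; for larger tables you can add an IVFFlat or HNSW index to speed up (approximate) nearest-neighbor search.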
Step 6 — Prompt Builder
Next we build the prompt that will be sent to the AI model.
package com.example.knowledgebase.service;
import org.springframework.stereotype.Component;
import java.util.List;
@Component
public class PromptBuilder {

    public String build(String question, List<String> contextChunks) {
        StringBuilder prompt = new StringBuilder();

        prompt.append("""
                You are an assistant for a knowledge base.
                Answer only using the context below.
                If the answer is not present in the context, say you do not know.

                Context:
                """);

        for (int i = 0; i < contextChunks.size(); i++) {
            prompt.append("\n[")
                  .append(i + 1)
                  .append("] ")
                  .append(contextChunks.get(i));
        }

        prompt.append("\n\nUser question:\n");
        prompt.append(question);
        prompt.append("\n\nAnswer:");

        return prompt.toString();
    }
}
This is a key concept of RAG:
the model receives relevant context extracted from your database.
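To make the builder's output concrete, here is a standalone copy of the same logic (Spring annotation dropped so it runs on its own) applied to a single sample chunk:

```java
import java.util.List;

public class PromptBuilderDemo {

    // Same logic as PromptBuilder.build, minus the Spring annotation
    public static String build(String question, List<String> contextChunks) {
        StringBuilder prompt = new StringBuilder();
        prompt.append("""
                You are an assistant for a knowledge base.
                Answer only using the context below.
                If the answer is not present in the context, say you do not know.

                Context:
                """);
        for (int i = 0; i < contextChunks.size(); i++) {
            prompt.append("\n[").append(i + 1).append("] ")
                  .append(contextChunks.get(i));
        }
        return prompt.append("\n\nUser question:\n")
                     .append(question)
                     .append("\n\nAnswer:")
                     .toString();
    }

    public static void main(String[] args) {
        // Sample chunk text is invented for the demo
        System.out.println(build("What is pgvector?",
                List.of("pgvector adds a vector type to PostgreSQL.")));
    }
}
```

The printed prompt contains the numbered context chunks, the user question, and a trailing "Answer:" cue for the model.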
Step 7 — AI Client Abstraction
To keep the architecture flexible, we define an AI client interface.
package com.example.knowledgebase.service;
public interface AiClient {
    String ask(String prompt);
}
Later you could implement this with:
- OpenAI
- Azure OpenAI
- Anthropic
- Ollama
- a local LLM
For this tutorial we use a simple mock implementation.
Step 8 — Fake AI Client
package com.example.knowledgebase.service;
import org.springframework.stereotype.Service;
@Service
public class FakeAiClient implements AiClient {

    @Override
    public String ask(String prompt) {
        return """
                Fake AI response.
                In a real system, this prompt would be sent to an LLM provider.
                """;
    }
}
This keeps the tutorial runnable without requiring external APIs.
Step 9 — Semantic Search Service
Now we combine everything into a service.
package com.example.knowledgebase.service;
import com.example.knowledgebase.api.AskQuestionResponse;
import lombok.RequiredArgsConstructor;
import org.springframework.stereotype.Service;
import java.util.List;
@Service
@RequiredArgsConstructor
public class SemanticSearchService {

    private static final int TOP_K = 3;

    private final RetrievalService retrievalService;
    private final PromptBuilder promptBuilder;
    private final AiClient aiClient;

    public AskQuestionResponse ask(String question) {
        List<String> contextChunks =
                retrievalService.retrieveRelevantChunks(question, TOP_K);

        String prompt = promptBuilder.build(question, contextChunks);
        String answer = aiClient.ask(prompt);

        return new AskQuestionResponse(question, answer, contextChunks);
    }
}
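To see the orchestration without a database or embedding model, here is a self-contained sketch of the same flow. Retrieval is stubbed (the caller supplies the chunks the vector search would return) and the AI client is any String-to-String function; the class and method names here are illustrative, not part of the application:

```java
import java.util.List;
import java.util.function.Function;

public class AskFlowDemo {

    // Mirrors SemanticSearchService.ask with retrieval stubbed out
    public static String ask(String question,
                             List<String> contextChunks,
                             Function<String, String> aiClient) {
        // Build a minimal prompt from the supplied context
        StringBuilder prompt = new StringBuilder("Context:\n");
        for (int i = 0; i < contextChunks.size(); i++) {
            prompt.append("[").append(i + 1).append("] ")
                  .append(contextChunks.get(i)).append("\n");
        }
        prompt.append("\nQuestion: ").append(question);

        // Hand the assembled prompt to the (pluggable) model
        return aiClient.apply(prompt.toString());
    }

    public static void main(String[] args) {
        // Fake client that proves it actually saw the retrieved context
        String answer = ask("What is pgvector?",
                List.of("pgvector adds a vector type to PostgreSQL."),
                prompt -> prompt.contains("pgvector adds a vector type")
                        ? "Answer grounded in context."
                        : "I do not know.");
        System.out.println(answer);
        // → Answer grounded in context.
    }
}
```

This is the whole RAG retrieval pattern in miniature: the model only ever sees plain text, and its answer quality depends on what retrieval put into the prompt.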
Step 10 — REST Endpoint
Finally we expose the semantic search endpoint.
package com.example.knowledgebase.api;
import com.example.knowledgebase.service.SemanticSearchService;
import jakarta.validation.Valid;
import lombok.RequiredArgsConstructor;
import org.springframework.web.bind.annotation.*;
@RestController
@RequestMapping("/questions")
@RequiredArgsConstructor
public class SemanticSearchController {

    private final SemanticSearchService semanticSearchService;

    @PostMapping
    public AskQuestionResponse ask(@Valid @RequestBody AskQuestionRequest request) {
        return semanticSearchService.ask(request.question());
    }
}
Now users can ask questions via HTTP.
Testing the Semantic Search
Example request:
POST /questions
Content-Type: application/json
{
  "question": "How does pgvector work with Spring Boot?"
}
Example response:
{
  "question": "How does pgvector work with Spring Boot?",
  "answer": "Fake AI response. In a real system, this prompt would be sent to an LLM provider.",
  "contextChunks": [
    "PostgreSQL can be used as a vector database using pgvector.",
    "Spring Boot can index documents by chunking content.",
    "Embeddings allow semantic similarity search."
  ]
}
Notice how the response includes the context used by the AI.
Why This Architecture Matters
This pattern powers many modern AI systems:
- internal knowledge assistants
- AI copilots
- support automation tools
- enterprise search platforms
By combining:
- vector search
- retrieved context
- LLM generation
we create systems that produce grounded answers instead of hallucinations.
Final Architecture
At this point our system supports both sides of RAG.
Indexing pipeline
Document
→ Chunking
→ Embeddings
→ Stored in pgvector
Retrieval pipeline
Question
→ Embedding
→ Vector similarity search
→ Context retrieval
→ Prompt building
→ AI response
This is the foundation of many real-world AI applications.
Conclusion
In this article we implemented the retrieval layer of a RAG system using Spring Boot and PostgreSQL.
Our application can now:
- embed user questions
- perform vector similarity search
- retrieve relevant document chunks
- construct contextual prompts
- generate AI responses
Together with the previous article, we now have a complete knowledge base architecture powered by vector search.
Articles in this series:
- Meaning: How Data Vectorization Powers AI
- Turning PostgreSQL Into a Vector Database with Docker
- Indexing Knowledge Base Content with Spring Boot and pgvector
- Building Semantic Search with Spring Boot, PostgreSQL, and pgvector (RAG Retrieval)