Allan Roberto
Building Semantic Search with Spring Boot, PostgreSQL, and pgvector (RAG Retrieval)

In the previous article, we built the indexing pipeline for our knowledge base:

  • documents are saved
  • content is split into chunks
  • embeddings are generated
  • vectors are stored in PostgreSQL using pgvector

But indexing is only half of the system.

The real value comes when users can ask questions and receive answers based on the indexed knowledge.

In this article we will implement the retrieval side of the architecture using Spring Boot.

By the end of this tutorial, our system will support:

  • receiving a user question
  • converting the question into an embedding
  • searching the vector database for similar chunks
  • building a prompt with contextual information
  • sending that prompt to an AI client
  • returning a grounded response

This architecture is commonly known as Retrieval-Augmented Generation (RAG).


Understanding the Retrieval Flow

Once documents are indexed, the query flow looks like this:

User question
   ↓
Convert question into embedding
   ↓
Vector similarity search in PostgreSQL
   ↓
Retrieve most relevant chunks
   ↓
Build prompt with context
   ↓
Send prompt to AI model
   ↓
Return answer

An important detail is that vectors are not sent to the AI model.

Vectors are used only to retrieve the most relevant text.

The AI receives plain text chunks as context.


Step 1 — Question Request DTO

First we define the request used by the semantic search endpoint.

package com.example.knowledgebase.api;

import jakarta.validation.constraints.NotBlank;

public record AskQuestionRequest(
        @NotBlank String question
) {}

Response DTO:

package com.example.knowledgebase.api;

import java.util.List;

public record AskQuestionResponse(
        String question,
        String answer,
        List<String> contextChunks
) {}

Returning the context chunks is useful for debugging and understanding how retrieval works.


Step 2 — Vector Similarity Query

The similarity search happens in the knowledge_document_chunk table.

We extend our repository with a native query using pgvector.

package com.example.knowledgebase.repository;

import com.example.knowledgebase.domain.KnowledgeDocumentChunk;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.jpa.repository.Query;
import org.springframework.data.repository.query.Param;
import org.springframework.stereotype.Repository;

import java.util.List;

@Repository
public interface KnowledgeDocumentChunkRepository extends JpaRepository<KnowledgeDocumentChunk, Long> {
  List<KnowledgeDocumentChunk> findByDocumentIdOrderByChunkIndexAsc(Long documentId);

  @Query(value = """
      SELECT
          id,
          document_id AS documentId,
          chunk_index AS chunkIndex,
          chunk_text AS chunkText,
          embedding <-> CAST(:embedding AS vector) AS distance
      FROM knowledge_document_chunk
      ORDER BY embedding <-> CAST(:embedding AS vector)
      LIMIT :limit
      """, nativeQuery = true)
  List<SimilarChunkProjection> searchTopK(
      @Param("embedding") String embedding,
      @Param("limit") int limit
  );

  void deleteByDocumentId(Long documentId);
}


This query uses the pgvector distance operator:

embedding <-> query_vector

The <-> operator computes Euclidean (L2) distance, so the nearest vectors are returned first, meaning the most semantically similar chunks. pgvector also provides <=> for cosine distance and <#> for negative inner product; which one fits best depends on the embedding model. For larger tables, an HNSW or IVFFlat index on the embedding column keeps this search fast instead of falling back to a sequential scan.
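As a quick illustration, here is the distance computation behind the <-> operator in plain Java. The vectors are toy 3-dimensional examples, not real embeddings, and the class name is only for this sketch:

```java
public class DistanceDemo {

    // L2 (Euclidean) distance, the metric behind pgvector's <-> operator
    static double l2(float[] a, float[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        float[] query = {1f, 0f, 0f};
        float[] close = {0.9f, 0.1f, 0f};
        float[] far   = {0f, 1f, 1f};

        // The chunk whose embedding is closest to the query ranks first
        System.out.println(l2(query, close) < l2(query, far)); // prints true
    }
}
```

ORDER BY embedding <-> query_vector performs exactly this ranking, just inside PostgreSQL and over all stored chunks.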


Step 3 — Projection for Query Results

Instead of loading the full entity, we use a projection.

package com.example.knowledgebase.repository;

public interface SimilarChunkProjection {

    Long getId();

    Long getDocumentId();

    Integer getChunkIndex();

    String getChunkText();

    Double getDistance();
}

This keeps the query lightweight.


Step 4 — Embedding Formatter

PostgreSQL expects vectors formatted like this:

[0.12,0.34,0.98,...]

We add a small helper component.

package com.example.knowledgebase.service;

import org.springframework.stereotype.Component;

import java.util.StringJoiner;

@Component
public class VectorFormatter {

    public String toPgVector(float[] embedding) {
        // pgvector literals look like [0.12,0.34,0.98]
        StringJoiner joiner = new StringJoiner(",", "[", "]");
        for (float value : embedding) {
            joiner.add(Float.toString(value));
        }
        return joiner.toString();
    }
}
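To check the formatter's behavior, the same formatting logic can be exercised standalone. It is inlined here (class name is illustrative) so the example runs without the Spring context:

```java
public class VectorFormatterDemo {

    // Same formatting logic as the VectorFormatter component above, inlined
    static String toPgVector(float[] embedding) {
        StringBuilder builder = new StringBuilder("[");
        for (int i = 0; i < embedding.length; i++) {
            builder.append(embedding[i]);
            if (i < embedding.length - 1) {
                builder.append(",");
            }
        }
        return builder.append("]").toString();
    }

    public static void main(String[] args) {
        // A 3-dimensional vector; real embeddings have hundreds of dimensions
        String literal = toPgVector(new float[]{0.12f, 0.34f, 0.98f});
        System.out.println(literal); // prints [0.12,0.34,0.98]
    }
}
```

The resulting string is what the native query casts with CAST(:embedding AS vector).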

Step 5 — Retrieval Service

This service handles semantic retrieval.

package com.example.knowledgebase.service;

import com.example.knowledgebase.repository.KnowledgeDocumentChunkRepository;
import com.example.knowledgebase.repository.SimilarChunkProjection;
import lombok.RequiredArgsConstructor;
import org.springframework.stereotype.Service;

import java.util.List;

@Service
@RequiredArgsConstructor
public class RetrievalService {

    private final EmbeddingService embeddingService;
    private final KnowledgeDocumentChunkRepository chunkRepository;
    private final VectorFormatter vectorFormatter;

    public List<String> retrieveRelevantChunks(String question, int topK) {

        float[] questionEmbedding = embeddingService.generateEmbedding(question);

        String vector = vectorFormatter.toPgVector(questionEmbedding);

        List<SimilarChunkProjection> results =
                chunkRepository.searchTopK(vector, topK);

        return results.stream()
                .map(SimilarChunkProjection::getChunkText)
                .toList();
    }
}

The retrieval logic is straightforward:

  1. embed the question
  2. run similarity search
  3. return the chunk text

Step 6 — Prompt Builder

Next we build the prompt that will be sent to the AI model.

package com.example.knowledgebase.service;

import org.springframework.stereotype.Component;

import java.util.List;

@Component
public class PromptBuilder {

    public String build(String question, List<String> contextChunks) {

        StringBuilder prompt = new StringBuilder();

        prompt.append("""
                You are an assistant for a knowledge base.
                Answer only using the context below.
                If the answer is not present in the context, say you do not know.

                Context:
                """);

        for (int i = 0; i < contextChunks.size(); i++) {

            prompt.append("\n[")
                    .append(i + 1)
                    .append("] ")
                    .append(contextChunks.get(i));
        }

        prompt.append("\n\nUser question:\n");
        prompt.append(question);

        prompt.append("\n\nAnswer:");

        return prompt.toString();
    }
}

This is a key concept of RAG: the model receives relevant context extracted from your database.
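To see what the assembled prompt actually looks like, the same assembly logic can be run standalone (inlined here with an illustrative class name, so it runs without Spring):

```java
import java.util.List;

public class PromptBuilderDemo {

    // Same assembly logic as the PromptBuilder component above
    static String build(String question, List<String> contextChunks) {
        StringBuilder prompt = new StringBuilder();

        prompt.append("""
                You are an assistant for a knowledge base.
                Answer only using the context below.
                If the answer is not present in the context, say you do not know.

                Context:
                """);

        for (int i = 0; i < contextChunks.size(); i++) {
            prompt.append("\n[")
                    .append(i + 1)
                    .append("] ")
                    .append(contextChunks.get(i));
        }

        prompt.append("\n\nUser question:\n").append(question);
        prompt.append("\n\nAnswer:");

        return prompt.toString();
    }

    public static void main(String[] args) {
        System.out.println(build(
                "How does pgvector work?",
                List.of("pgvector adds a vector type to PostgreSQL.",
                        "Spring Boot can query it with native SQL.")));
    }
}
```

The numbered [1], [2] markers make it easy to see which retrieved chunk contributed which part of the context.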


Step 7 — AI Client Abstraction

To keep the architecture flexible, we define an AI client interface.

package com.example.knowledgebase.service;

public interface AiClient {

    String ask(String prompt);

}

Later you could implement this with:

  • OpenAI
  • Azure OpenAI
  • Anthropic
  • Ollama
  • a local LLM

For this tutorial we use a simple mock implementation.


Step 8 — Fake AI Client

package com.example.knowledgebase.service;

import org.springframework.stereotype.Service;

@Service
public class FakeAiClient implements AiClient {

    @Override
    public String ask(String prompt) {

        return """
                Fake AI response.
                In a real system, this prompt would be sent to an LLM provider.
                """;
    }
}

This keeps the tutorial runnable without requiring external APIs.

Step 9 — Semantic Search Service

Now we combine everything into a service.

package com.example.knowledgebase.service;

import com.example.knowledgebase.api.AskQuestionResponse;
import lombok.RequiredArgsConstructor;
import org.springframework.stereotype.Service;

import java.util.List;

@Service
@RequiredArgsConstructor
public class SemanticSearchService {

    private static final int TOP_K = 3;

    private final RetrievalService retrievalService;
    private final PromptBuilder promptBuilder;
    private final AiClient aiClient;

    public AskQuestionResponse ask(String question) {

        List<String> contextChunks =
                retrievalService.retrieveRelevantChunks(question, TOP_K);

        String prompt = promptBuilder.build(question, contextChunks);

        String answer = aiClient.ask(prompt);

        return new AskQuestionResponse(
                question,
                answer,
                contextChunks
        );
    }
}

Step 10 — REST Endpoint

Finally we expose the semantic search endpoint.

package com.example.knowledgebase.api;

import com.example.knowledgebase.service.SemanticSearchService;
import jakarta.validation.Valid;
import lombok.RequiredArgsConstructor;
import org.springframework.web.bind.annotation.*;

@RestController
@RequestMapping("/questions")
@RequiredArgsConstructor
public class SemanticSearchController {

    private final SemanticSearchService semanticSearchService;

    @PostMapping
    public AskQuestionResponse ask(
            @Valid @RequestBody AskQuestionRequest request
    ) {

        return semanticSearchService.ask(request.question());
    }
}

Now users can ask questions via HTTP.


Testing the Semantic Search

Example request:

POST /questions
Content-Type: application/json

{
  "question": "How does pgvector work with Spring Boot?"
}

Example response:

{
  "question": "How does pgvector work with Spring Boot?",
  "answer": "Fake AI response. In a real system, this prompt would be sent to an LLM provider.",
  "contextChunks": [
    "PostgreSQL can be used as a vector database using pgvector.",
    "Spring Boot can index documents by chunking content.",
    "Embeddings allow semantic similarity search."
  ]
}

Notice how the response includes the context used by the AI.


Why This Architecture Matters

This pattern powers many modern AI systems:

  • internal knowledge assistants
  • AI copilots
  • support automation tools
  • enterprise search platforms

By combining:

  • vector search
  • retrieved context
  • LLM generation

we create systems that produce grounded answers instead of hallucinations.


Final Architecture

At this point our system supports both sides of RAG.

Indexing pipeline

Document
→ Chunking
→ Embeddings
→ Stored in pgvector

Retrieval pipeline

Question
→ Embedding
→ Vector similarity search
→ Context retrieval
→ Prompt building
→ AI response

This is the foundation of many real-world AI applications.



Conclusion

In this article we implemented the retrieval layer of a RAG system using Spring Boot and PostgreSQL.

Our application can now:

  • embed user questions
  • perform vector similarity search
  • retrieve relevant document chunks
  • construct contextual prompts
  • generate AI responses

Together with the previous article, we now have a complete knowledge base architecture powered by vector search.

Articles in this series:

  1. Meaning: How Data Vectorization Powers AI
  2. Turning PostgreSQL Into a Vector Database with Docker
  3. Indexing Knowledge Base Content with Spring Boot and pgvector
  4. Building Semantic Search with Spring Boot, PostgreSQL, and pgvector (RAG Retrieval)

Project Here
