Pedro Santos

pgvector + Ollama Setup

RAG Without the Chatbot: pgvector + Ollama for Operational Data

Most RAG tutorials start with "upload a PDF and ask questions about it." That's fine for document search. But I needed RAG for something different: diagnosing failures in a distributed system by searching through historical saga events.

No PDFs. No chatbot. Just a Kafka consumer that vectorizes every saga event into pgvector and an agent that searches similar past incidents to diagnose new failures.

This series covers how I built it. The stack is Ollama for local embeddings, pgvector on PostgreSQL for storage, and LangChain4j to tie it together.

Why RAG (and Not Just Logs)

My saga orchestrator processes orders across 5 microservices. When a saga fails, the event carries a full history: which services ran, what status each returned, what error messages were generated. This data lives in Kafka and MongoDB.

I could search logs. But logs are text. Searching "payment failed" gives you exact matches. It doesn't find incidents where the payment was blocked for a different reason but with a similar pattern: same customer type, similar amount, same time of day.

RAG with vector search finds similar incidents, not exact matches. You convert the failure description to a vector (a list of numbers). You store it alongside thousands of past incidents. When a new failure arrives, you search for the closest vectors. The results are incidents that look like the current one, even if the words are different.
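
To make "closest vectors" concrete, here's a minimal sketch of cosine similarity, the measure typically used to compare embeddings. The three-element arrays are made-up stand-ins for real 768-dimensional vectors; only the relative scores matter.

public class CosineSimilarityDemo {

    // 1.0 means "pointing the same way", values near 0 mean "unrelated"
    static double cosine(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        float[] insufficientFunds = {0.81f, 0.12f, 0.55f}; // "payment failed: insufficient funds"
        float[] cardDeclined      = {0.78f, 0.15f, 0.58f}; // "card declined for low balance"
        float[] shippingDelay     = {0.05f, 0.92f, 0.10f}; // "shipping delayed by carrier"

        System.out.println(cosine(insufficientFunds, cardDeclined));  // high: similar incidents
        System.out.println(cosine(insufficientFunds, shippingDelay)); // low: unrelated incident
    }
}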

The Stack

| Component       | Tool                      | Role                                      |
|-----------------|---------------------------|-------------------------------------------|
| Embedding model | Ollama + nomic-embed-text | Converts text to 768-dimensional vectors  |
| Vector store    | pgvector on PostgreSQL    | Stores and searches vectors               |
| Application     | LangChain4j               | Connects embedding model to vector store  |

I chose this stack because it runs locally with no cloud dependencies. Ollama is free. pgvector is a PostgreSQL extension, so it uses the same database infrastructure I already have. No separate vector database to manage.

Setting Up pgvector

pgvector is a PostgreSQL extension. The easiest way to run it is with the official Docker image:

# docker-compose.yml
services:
  vectors-db:
    image: pgvector/pgvector:pg16
    environment:
      POSTGRES_DB: vectors-db
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
    ports:
      - "5435:5432"
    volumes:
      - ./ai-saga-agent/src/main/resources/init-vectors.sql:/docker-entrypoint-initdb.d/init.sql

The init script enables the extension:

CREATE EXTENSION IF NOT EXISTS vector;

That's it. PostgreSQL now supports vector columns and similarity search.
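
If you want to confirm the extension actually loaded before wiring up the application, a quick JDBC check against the container works. This is just a sanity-check sketch: the connection details come from the docker-compose service above, and it assumes the PostgreSQL JDBC driver is on the classpath.

import java.sql.DriverManager;

public class PgVectorCheck {
    public static void main(String[] args) throws Exception {
        // Port, database, and credentials from the docker-compose service above
        var url = "jdbc:postgresql://localhost:5435/vectors-db";
        try (var conn = DriverManager.getConnection(url, "postgres", "postgres");
             var stmt = conn.createStatement();
             var rs = stmt.executeQuery(
                 "SELECT extversion FROM pg_extension WHERE extname = 'vector'")) {
            System.out.println(rs.next()
                ? "pgvector installed, version " + rs.getString(1)
                : "vector extension not found");
        }
    }
}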

Setting Up Ollama

Install Ollama and pull the embedding model:

brew install ollama
ollama pull nomic-embed-text    # 274MB, 768-dimensional output
ollama serve                    # API at http://localhost:11434

nomic-embed-text is a good default for production. It's small (274MB vs 4GB+ for chat models), fast (single-digit milliseconds per embedding), and the quality is solid for operational data like log messages and event histories.
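
Before touching LangChain4j, it's worth a quick sanity check that Ollama is answering and really returns 768 numbers. Here's a rough sketch using plain java.net.http against Ollama's /api/embeddings endpoint; the crude string parsing is only for this demo.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class OllamaEmbeddingCheck {
    public static void main(String[] args) throws Exception {
        var body = "{\"model\": \"nomic-embed-text\", \"prompt\": \"payment failed\"}";
        var request = HttpRequest.newBuilder()
            .uri(URI.create("http://localhost:11434/api/embeddings"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build();

        var json = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString())
            .body();

        // Response looks like {"embedding":[0.12, -0.03, ...]} -- count the floats
        int dims = json.substring(json.indexOf('[') + 1, json.indexOf(']')).split(",").length;
        System.out.println("Embedding dimensions: " + dims); // expect 768
    }
}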

LangChain4j Configuration

Three beans connect everything:

import javax.sql.DataSource;

import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.model.ollama.OllamaEmbeddingModel;
import dev.langchain4j.store.embedding.EmbeddingStore;
import dev.langchain4j.store.embedding.EmbeddingStoreIngestor;
import dev.langchain4j.store.embedding.pgvector.PgVectorEmbeddingStore;

@Configuration
public class EmbeddingConfig {

    @Value("${ai.ollama.base-url}")
    private String ollamaUrl;

    @Bean
    public EmbeddingModel embeddingModel() {
        return OllamaEmbeddingModel.builder()
            .baseUrl(ollamaUrl)
            .modelName("nomic-embed-text")
            .build();
    }

    @Bean
    public EmbeddingStore<TextSegment> embeddingStore(DataSource dataSource) {
        return PgVectorEmbeddingStore.datasourceBuilder()
            .datasource(dataSource)
            .table("saga_history_embeddings")
            .dimension(768)       // nomic-embed-text output size
            .createTable(true)    // auto-creates if not exists
            .build();
    }

    @Bean
    public EmbeddingStoreIngestor ingestor(
            EmbeddingModel model, EmbeddingStore<TextSegment> store) {
        return EmbeddingStoreIngestor.builder()
            .embeddingModel(model)
            .embeddingStore(store)
            .build();
    }
}

The EmbeddingModel connects to Ollama. The EmbeddingStore connects to pgvector. The EmbeddingStoreIngestor is a convenience that handles the embed-and-store pipeline.
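
For completeness, here's what using the ingestor looks like once it's wired up. The event text and metadata values are placeholders; with no DocumentSplitter configured, the whole document is embedded as a single segment and stored in saga_history_embeddings.

import dev.langchain4j.data.document.Document;
import dev.langchain4j.data.document.Metadata;

Document event = Document.from(
    "PAYMENT_SERVICE [ROLLBACK]: Fail to realize payment: New customer limit exceeded",
    new Metadata().put("orderId", "abc123").put("status", "FAIL"));

ingestor.ingest(event);   // embed + store in one call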

The dimension(768) parameter must match the output size of your embedding model. nomic-embed-text produces 768-dimensional vectors. If you switch models, update this number.
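
If you'd rather not hard-code the number, one option (a sketch, not how the config above does it) is to probe the model once and pass along whatever dimension it reports, assuming the EmbeddingModel bean is also injected into the store factory method:

// Derive the dimension from the model instead of hard-coding 768
int dimension = embeddingModel.embed("dimension probe").content().dimension();

return PgVectorEmbeddingStore.datasourceBuilder()
    .datasource(dataSource)
    .table("saga_history_embeddings")
    .dimension(dimension)
    .createTable(true)
    .build();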

createTable(true) auto-creates the saga_history_embeddings table with the right schema on first run. In production, you'd manage this with migrations.

How Embedding Works

When you embed a piece of text, the model converts it to a list of 768 numbers. Similar texts produce similar numbers. "Payment failed due to insufficient funds" and "Card declined for low balance" will have vectors that are close together in 768-dimensional space, even though they share few words.

In code:

// Text in
String text = "PAYMENT_SERVICE [ROLLBACK]: Fail to realize payment: " +
              "New customer limit exceeded: R$450.00 > R$500.00";

// Vector out (768 floats)
Embedding embedding = embeddingModel.embed(text).content();

// Store with metadata
var segment = TextSegment.from(text, new Metadata()
    .put("orderId", "abc123")
    .put("status", "FAIL")
    .put("profileKey", "new:high-value"));

embeddingStore.add(embedding, segment);

The Metadata lets you filter results later. You could search only for incidents with status=FAIL or profileKey=new:high-value.
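
For example, a search restricted to failed sagas from the same customer profile could use LangChain4j's metadata filter builder together with the same EmbeddingSearchRequest shown in the next section. Filter support depends on the store implementation, but the pgvector store handles it; the keys match the metadata stored above.

import static dev.langchain4j.store.embedding.filter.MetadataFilterBuilder.metadataKey;

var queryEmbedding = embeddingModel.embed(
    "PAYMENT_SERVICE [ROLLBACK]: payment blocked for new customer").content();

var filtered = embeddingStore.search(
    EmbeddingSearchRequest.builder()
        .queryEmbedding(queryEmbedding)
        .maxResults(3)
        .filter(metadataKey("status").isEqualTo("FAIL")
            .and(metadataKey("profileKey").isEqualTo("new:high-value")))
        .build());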

Searching for Similar Incidents

To find past incidents similar to a new failure:

String newFailure = "PAYMENT_SERVICE [ROLLBACK]: Transaction blocked by fraud prevention";

var queryEmbedding = embeddingModel.embed(newFailure).content();
var results = embeddingStore.search(
    EmbeddingSearchRequest.builder()
        .queryEmbedding(queryEmbedding)
        .maxResults(3)
        .minScore(0.75)
        .build());

for (var match : results.matches()) {
    System.out.println("Score: " + match.score());
    System.out.println("Text: " + match.embedded().text());
}

minScore(0.75) filters out weak matches. I found that scores below 0.7 tend to be noise in my data. Scores above 0.85 are usually the same type of failure.

maxResults(3) keeps the RAG context manageable. The LLM performs better with 3 highly relevant examples than 10 mediocre ones.

What's Next

The embedding and search infrastructure is ready. In the next post, I'll show how I feed it with real-time Kafka events: every saga completion gets vectorized, building up a knowledge base that improves the agent's diagnoses over time.

The repo: github.com/pedrop3/saga-orchestration
