DEV Community

rinesh
Building a .NET RAG Application with PostgreSQL pgvector for AI Vector Search

Semantic search goes beyond keyword matching by understanding the meaning behind a query. It’s a core building block for modern AI use cases like Retrieval-Augmented Generation (RAG), intelligent search, and recommendation systems. Vector databases are essential for these use cases. Unlike traditional databases, they are optimized for storing and querying high-dimensional embedding vectors, enabling efficient similarity search and more context-aware AI applications.

In this blog, we’ll build a .NET console application that performs semantic search using:

  • PostgreSQL + pgvector as the vector database
  • Ollama for local embedding and chat models
  • EF Core for data access

We’ll generate embeddings for our data, store them in PostgreSQL, and compare them against query embeddings to retrieve contextually relevant results.

Prerequisites

1. PostgreSQL with pgvector

You can install PostgreSQL locally or use Docker.

If using Docker, run a pgvector-enabled image and expose port 5432. Note that the underlying postgres image requires a superuser password via the POSTGRES_PASSWORD environment variable.

docker run -d \
  --name postgres-pgvector \
  -e POSTGRES_PASSWORD=postgres \
  -p 5432:5432 \
  ankane/pgvector

After setup, enable the vector extension in your database:

CREATE EXTENSION IF NOT EXISTS vector;
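To confirm the extension is installed, you can query PostgreSQL’s pg_extension catalog (a quick sanity check):

```sql
-- Shows the installed vector extension and its version
SELECT extname, extversion
FROM pg_extension
WHERE extname = 'vector';
```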

2. Ollama (Local LLM)

Install Ollama from https://ollama.com

Pull required models:

ollama pull nomic-embed-text
ollama pull phi4-mini  

Start the Ollama server:

ollama serve  

Creating the .NET Console Application

Create a new console app:

dotnet new console -o VectorSearchApp

Next, add the required packages. Here I'm using EF Core and dependency injection for configuring database access.

dotnet add package Microsoft.Extensions.Hosting
dotnet add package Microsoft.EntityFrameworkCore
dotnet add package Microsoft.EntityFrameworkCore.Design
dotnet add package Npgsql.EntityFrameworkCore.PostgreSQL
dotnet add package Pgvector.EntityFrameworkCore
dotnet add package OllamaSharp

Designing the Data Model

We define a simple entity to store the data:

using Pgvector;

public class DesignPatternEntity
{
    public Guid Id { get; set; } = Guid.NewGuid();

    public string Name { get; set; } = default!;

    public string Description { get; set; } = default!;

    public Vector Embedding { get; set; } = default!;
}

The vector embedding is stored using the Vector type from the Pgvector library.

Configuring the Database Context

Configure the DbContext to use PostgreSQL via Npgsql, enable the vector extension, and map the Embedding property to a vector(768) column (matching the 768-dimensional output of nomic-embed-text):
using Microsoft.EntityFrameworkCore;

public class VectorSearchAppDbContext : DbContext
{
    public VectorSearchAppDbContext(DbContextOptions<VectorSearchAppDbContext> options) : base(options)
    {
    }

    public DbSet<DesignPatternEntity> DesignPatterns => Set<DesignPatternEntity>();

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        modelBuilder.HasPostgresExtension("vector");

        modelBuilder.Entity<DesignPatternEntity>()
            .Property(x => x.Embedding)
            .HasColumnType("vector(768)"); // match the model dimension
    }
}
Add the connection string to appsettings.json:

{
  "ConnectionStrings": {
    "Default": "Host=localhost;Database=vector_db;Username=rinesh"
  }
}

Next, we’ll set up dependency injection and configure the required services as follows.

using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;
using Pgvector.EntityFrameworkCore;

var services = new ServiceCollection();

var configuration = new ConfigurationBuilder()
    .SetBasePath(Directory.GetCurrentDirectory())
    .AddJsonFile("appsettings.json", optional: false, reloadOnChange: true)
    .Build();

services.AddDbContext<VectorSearchAppDbContext>(options =>
{
    options.UseNpgsql(
        configuration.GetConnectionString("Default"),
        o => o.UseVector());
});

Now use EF Core migrations to create your database with the DesignPatterns table (install the dotnet-ef tool first if needed: dotnet tool install --global dotnet-ef):

dotnet ef migrations add InitialCreate --output-dir Data/Migrations
dotnet ef database update 
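Applying the migration produces a table roughly like the following (a sketch of the generated DDL; the exact output from EF Core may differ slightly):

```sql
CREATE TABLE "DesignPatterns" (
    "Id" uuid NOT NULL,
    "Name" text NOT NULL,
    "Description" text NOT NULL,
    "Embedding" vector(768) NOT NULL,
    CONSTRAINT "PK_DesignPatterns" PRIMARY KEY ("Id")
);
```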

Database Table

Generating Embeddings

To enable semantic search, we convert text into embeddings. Let's seed some data into the table, using an embedding model from Ollama to generate vector embeddings during seeding.

Process:

  1. Take the description of each record
  2. Generate embeddings using the Ollama embedding model
  3. Store vectors in PostgreSQL

var ollamaApiClient = new OllamaApiClient("http://localhost:11434", "nomic-embed-text");
services.AddSingleton(ollamaApiClient);

var provider = services.BuildServiceProvider();
var dbContext = provider.GetRequiredService<VectorSearchAppDbContext>();
var embeddingGenerator = provider.GetRequiredService<OllamaApiClient>();

if (!dbContext.DesignPatterns.Any())
{
    List<DesignPatternEntity> designPatterns =
    [
        new()
        {
            Name = "API Gateway Pattern",
            Description = "The API Gateway pattern acts as a single entry point for all client requests in a microservices architecture. It handles request routing, composition, authentication, and rate limiting. This pattern helps simplify client interactions and improves security and observability."
        },
        new()
        {
            Name = "CQRS Pattern",
            Description = "Command Query Responsibility Segregation (CQRS) separates read and write operations into different models. This improves scalability and performance, especially in systems with high read and write loads. It is often combined with Event Sourcing."
        },
        new()
        {
            Name = "Event-Driven Architecture",
            Description = "Event-driven architecture enables services to communicate asynchronously using events. Producers emit events without knowing consumers, allowing loose coupling and scalability. This pattern is ideal for distributed systems and real-time processing."
        },
        // Other design patterns with names and descriptions — omitted for brevity.
    ];

    foreach (var designPattern in designPatterns)
    {
        var embedding = await embeddingGenerator.EmbedAsync(designPattern.Description!);

        designPattern.Embedding = new Vector(embedding.Embeddings[0].ToArray());

        dbContext.DesignPatterns.Add(designPattern);
    }

    await dbContext.SaveChangesAsync();
}
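A mismatch between the embedding model's output size and the vector(768) column only surfaces as a database error at insert time. A small guard makes the failure mode explicit; EnsureDimension here is a hypothetical helper, not part of any library:

```csharp
using System;

// Hypothetical guard: validate the embedding's dimension before
// assigning it to the entity, so a model mismatch fails fast.
static float[] EnsureDimension(float[] values, int expected = 768)
{
    if (values.Length != expected)
        throw new InvalidOperationException(
            $"Expected {expected}-dimensional embedding, got {values.Length}.");
    return values;
}
```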

Performing Semantic Search

When a user enters a query:

  1. Convert the query into an embedding
  2. Compare it with stored vectors
  3. Retrieve the most similar results

pgvector provides operators for similarity search: cosine distance (<=>), Euclidean (L2) distance (<->), and negative inner product (<#>).

The most commonly used is cosine distance, where lower values indicate higher similarity:

ORDER BY "Embedding" <=> query_vector

The Pgvector.EntityFrameworkCore package provides EF Core equivalents for these operators:

SQL   EF Core
<=>   .CosineDistance()
<->   .L2Distance()
<#>   .MaxInnerProduct()

We will be using CosineDistance for our similarity search.
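For intuition, the cosine distance pgvector computes can be written out in plain C#; a minimal sketch (illustrative only, since the real computation happens inside PostgreSQL):

```csharp
using System;

// Cosine distance = 1 - cosine similarity.
// 0 means identical direction, 1 means orthogonal, 2 means opposite.
static double CosineDistance(float[] a, float[] b)
{
    if (a.Length != b.Length)
        throw new ArgumentException("Vectors must have the same dimension.");

    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];    // dot product
        normA += a[i] * a[i];  // squared magnitude of a
        normB += b[i] * b[i];  // squared magnitude of b
    }

    return 1 - dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}
```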

Console.WriteLine("Enter your question:");
var question = Console.ReadLine();

var questionEmbedding = await ollamaApiClient.EmbedAsync(question!);
var questionVector = new Vector(questionEmbedding.Embeddings[0].ToArray());

var topSimilarResults = await dbContext.DesignPatterns
    .OrderBy(x => x.Embedding.CosineDistance(questionVector))
    .Take(2)
    .ToListAsync();

if (topSimilarResults.Count == 0)
{
    Console.WriteLine("No similar results found");
}
else
{
    Console.WriteLine("Top similar design patterns:");
    foreach (var designPattern in topSimilarResults)
    {
        Console.WriteLine($"{designPattern.Name}: {designPattern.Description}");
    }
}

Enhancing Results with RAG

Instead of returning raw results, we can improve the experience using an LLM.

Steps:

  1. Retrieve relevant data using vector search
  2. Inject that data into a prompt
  3. Ask the LLM to generate a response

This is called Retrieval-Augmented Generation (RAG). It ensures responses are grounded in your actual data.

I’m introducing an additional client to interact with the LLM, using the Phi-4 model.

var ollamaApiChatClient = new OllamaApiClient("http://localhost:11434", "phi4-mini:latest");

var chatPrompt = "You are an expert solution architect.\n";

foreach (var designPattern in topSimilarResults)
{
    chatPrompt += $"Pattern: {designPattern.Name}\nDescription: {designPattern.Description}\n\n";
}

chatPrompt += $"User Question: {question}";

var chatResponse = await ollamaApiChatClient.GetResponseAsync(chatPrompt);

Console.WriteLine(chatResponse.Text);

In the screenshot below, you can see how the application responded to a user query by generating a context-aware response based on the retrieved information.

LLM Response


Source code of this demo app is available here
