Brian Spann

Semantic Kernel Memory: Vector Stores, Embeddings, and Semantic Search

LLMs have a fundamental limitation: they're stateless. Every request starts fresh with no memory of previous conversations or your organization's knowledge. This is where Semantic Kernel's memory system comes in—transforming raw text into searchable vector embeddings that give your AI persistent, semantic understanding.

In Part 2, we explored plugins. Now we'll dive deep into the memory layer that powers intelligent retrieval.

Why Memory Matters

Consider a customer support bot. Without memory, it can't:

  • Remember what the customer said 5 messages ago
  • Access your product documentation
  • Know your company's policies
  • Learn from resolved tickets

With Semantic Kernel memory, you transform unstructured text into vector embeddings—numerical representations that capture semantic meaning. Similar concepts cluster together in vector space, enabling semantic search that understands intent, not just keywords.

Understanding Embeddings

Before diving into code, let's understand what's happening under the hood.

When you send text to an embedding model, it returns a high-dimensional vector (typically 1536 or 3072 dimensions). These vectors have a remarkable property: semantically similar texts produce similar vectors.

using Microsoft.SemanticKernel.Embeddings;

var embeddingService = kernel.GetRequiredService<ITextEmbeddingGenerationService>();

var texts = new[]
{
    "The cat sat on the mat",
    "A feline rested on the rug",
    "The stock market crashed today",
    "Dogs are loyal companions"
};

var embeddings = await embeddingService.GenerateEmbeddingsAsync(texts);

// Calculate cosine similarity between vectors
float CosineSimilarity(ReadOnlyMemory<float> a, ReadOnlyMemory<float> b)
{
    var spanA = a.Span;
    var spanB = b.Span;
    float dot = 0, normA = 0, normB = 0;

    for (int i = 0; i < spanA.Length; i++)
    {
        dot += spanA[i] * spanB[i];
        normA += spanA[i] * spanA[i];
        normB += spanB[i] * spanB[i];
    }

    return dot / (MathF.Sqrt(normA) * MathF.Sqrt(normB));
}

// "cat on mat" vs "feline on rug" → ~0.92 (very similar!)
// "cat on mat" vs "stock market" → ~0.31 (unrelated)
Console.WriteLine($"Cat/Feline similarity: {CosineSimilarity(embeddings[0], embeddings[1]):F2}");
Console.WriteLine($"Cat/Market similarity: {CosineSimilarity(embeddings[0], embeddings[2]):F2}");

Setting Up Memory with ISemanticTextMemory

The ISemanticTextMemory interface provides a simple abstraction for storing and searching memories:

using Microsoft.SemanticKernel.Memory;
using Microsoft.SemanticKernel.Connectors.AzureOpenAI;

// Build the memory system
var memoryBuilder = new MemoryBuilder();

// Add embedding generation
memoryBuilder.WithAzureOpenAITextEmbeddingGeneration(
    deploymentName: "text-embedding-3-large",
    endpoint: config["AzureOpenAI:Endpoint"]!,
    apiKey: config["AzureOpenAI:Key"]!);

// Add a memory store (we'll explore options below)
memoryBuilder.WithMemoryStore(new VolatileMemoryStore());

var memory = memoryBuilder.Build();

Storing Memories

// Store individual memories
await memory.SaveInformationAsync(
    collection: "company-policies",
    id: "refund-policy",
    text: "Customers may return any item within 30 days of purchase for a full refund. " +
          "Items must be unused and in original packaging. Digital products are non-refundable.",
    description: "Company refund and return policy",
    additionalMetadata: "department=customer-service,version=2024-01,priority=high");

await memory.SaveInformationAsync(
    collection: "company-policies", 
    id: "shipping-policy",
    text: "Free shipping on orders over $50. Standard shipping takes 5-7 business days. " +
          "Express shipping (2-3 days) available for $9.99. Overnight shipping $24.99.",
    description: "Shipping rates and timeframes",
    additionalMetadata: "department=logistics,version=2024-01");

await memory.SaveInformationAsync(
    collection: "company-policies",
    id: "warranty-policy", 
    text: "All electronics come with a 1-year manufacturer warranty. Extended warranties " +
          "available for purchase. Warranty covers defects, not accidental damage.",
    description: "Product warranty information",
    additionalMetadata: "department=support,version=2024-01");

Semantic Search

Now the magic—semantic search that understands meaning:

// Search for relevant information
var searchResults = memory.SearchAsync(
    collection: "company-policies",
    query: "How long do I have to return something?",  // Note: doesn't contain "refund"
    limit: 3,
    minRelevanceScore: 0.7);

await foreach (var result in searchResults)
{
    Console.WriteLine($"[{result.Relevance:P0}] {result.Metadata.Description}");
    Console.WriteLine($"  {result.Metadata.Text}");
    Console.WriteLine($"  Metadata: {result.Metadata.AdditionalMetadata}");
    Console.WriteLine();
}

// Output:
// [94%] Company refund and return policy
//   Customers may return any item within 30 days of purchase...
//   Metadata: department=customer-service,version=2024-01,priority=high

Memory Store Implementations

Semantic Kernel supports multiple vector stores. Choose based on your requirements:

VolatileMemoryStore (Development)

In-memory storage—fast but ephemeral:

var store = new VolatileMemoryStore();
memoryBuilder.WithMemoryStore(store);

// Great for testing and prototyping
// Data lost when process ends

Azure AI Search (Production Recommended)

Enterprise-grade with hybrid search (vector + keyword):

using Microsoft.SemanticKernel.Connectors.AzureAISearch;

var store = new AzureAISearchMemoryStore(
    endpoint: config["AzureSearch:Endpoint"]!,
    apiKey: config["AzureSearch:Key"]!);

memoryBuilder.WithMemoryStore(store);

// Features:
// - Hybrid search (vector + BM25 keyword)
// - Semantic ranking
// - Faceted filtering
// - Geo-spatial queries
// - Enterprise security (RBAC, private endpoints)

Qdrant (Self-Hosted Performance)

High-performance open-source vector database:

using Microsoft.SemanticKernel.Connectors.Qdrant;

var store = new QdrantMemoryStore(
    host: "localhost",
    port: 6333,
    vectorSize: 3072);  // Match your embedding model

memoryBuilder.WithMemoryStore(store);

// Features:
// - Excellent filtering capabilities
// - Horizontal scaling
// - Snapshot backups
// - gRPC for performance

PostgreSQL with pgvector

Use your existing Postgres infrastructure:

using Microsoft.SemanticKernel.Connectors.Postgres;
using Npgsql;

var dataSource = NpgsqlDataSource.Create(config.GetConnectionString("Postgres")!);
var store = new PostgresMemoryStore(dataSource, vectorSize: 3072);

memoryBuilder.WithMemoryStore(store);

// Features:
// - Familiar SQL ecosystem
// - ACID transactions
// - Complex queries combining vectors with relational data
// - Cost-effective for smaller datasets

Redis (Low-Latency Cache)

When milliseconds matter:

using Microsoft.SemanticKernel.Connectors.Redis;
using StackExchange.Redis;

var redis = ConnectionMultiplexer.Connect(config["Redis:Connection"]!);
var store = new RedisMemoryStore(redis.GetDatabase(), vectorSize: 3072);

memoryBuilder.WithMemoryStore(store);

// Features:
// - Sub-millisecond latency
// - Automatic expiration (TTL)
// - Cluster support
// - Good for session/conversation memory

Comparison Matrix

| Store           | Best For                | Latency  | Scaling   | Managed  | Hybrid Search  |
|-----------------|-------------------------|----------|-----------|----------|----------------|
| Azure AI Search | Production enterprise   | Medium   | Excellent | Yes      | Yes            |
| Qdrant          | Self-hosted, filtering  | Low      | Good      | Optional | No             |
| PostgreSQL      | Existing Postgres infra | Medium   | Good      | Optional | Via extensions |
| Redis           | Low-latency, ephemeral  | Very low | Good      | Optional | No             |
| Volatile        | Development/testing     | Instant  | N/A       | N/A      | No             |
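All of these connectors implement the same IMemoryStore interface, so switching stores is a one-line change in the builder. Purely as an illustration, the matrix above can be encoded as a tiny decision helper—the rules and their ordering below are my own reading of the table, not official guidance:

```csharp
using System;

// Purely illustrative: encodes the comparison matrix above as a decision helper.
// The priority order (hybrid search > latency > existing infra) is an assumption.
static string RecommendStore(
    bool production, bool needHybridSearch, bool subMillisecondLatency, bool hasPostgres)
{
    if (!production) return "Volatile";             // dev/testing only, data is ephemeral
    if (needHybridSearch) return "Azure AI Search"; // vector + BM25 keyword
    if (subMillisecondLatency) return "Redis";      // session/conversation memory
    if (hasPostgres) return "PostgreSQL";           // reuse existing infrastructure
    return "Qdrant";                                // self-hosted default
}

Console.WriteLine(RecommendStore(
    production: true, needHybridSearch: true,
    subMillisecondLatency: false, hasPostgres: false));
// prints "Azure AI Search"
```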

Memory Records Deep Dive

Each memory record contains:

public class MemoryRecord
{
    // Core data
    public string Id { get; }                    // Unique identifier
    public string Text { get; }                  // Original text content
    public ReadOnlyMemory<float> Embedding { get; }  // Vector representation

    // Metadata
    public MemoryRecordMetadata Metadata { get; }
}

public class MemoryRecordMetadata
{
    public string Id { get; }
    public string Text { get; }
    public string Description { get; }           // Human-readable description
    public string AdditionalMetadata { get; }    // Custom key=value pairs
    public string ExternalSourceName { get; }    // Source system reference
    public bool IsReference { get; }             // Is this a reference to external data?
}

Working with Metadata

Use metadata for filtering and organization:

// Store with rich metadata
await memory.SaveInformationAsync(
    collection: "support-tickets",
    id: $"ticket-{ticket.Id}",
    text: $"Issue: {ticket.Title}\n\nDescription: {ticket.Description}\n\nResolution: {ticket.Resolution}",
    description: $"Resolved ticket: {ticket.Title}",
    additionalMetadata: $"category={ticket.Category},priority={ticket.Priority}," +
                        $"resolved_date={ticket.ResolvedAt:yyyy-MM-dd},agent={ticket.AgentId}");

// Later, search and filter
var results = memory.SearchAsync(
    collection: "support-tickets",
    query: "customer can't login",
    limit: 10,
    minRelevanceScore: 0.75);

await foreach (var result in results)
{
    // Parse metadata for filtering
    var metadata = result.Metadata.AdditionalMetadata
        .Split(',')
        .Select(p => p.Split('=', 2))
        .Where(p => p.Length == 2)  // skip malformed fragments
        .ToDictionary(p => p[0], p => p[1]);

    if (metadata.TryGetValue("category", out var category) && category == "authentication")
    {
        Console.WriteLine($"Relevant auth ticket: {result.Metadata.Description}");
    }
}
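One caution: additionalMetadata is a free-form string, so hand-concatenating key=value pairs invites typos. A small helper (my own sketch, not a Semantic Kernel API) keeps the format consistent; note it assumes keys and values never contain ',' or '=':

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not part of Semantic Kernel): builds the ad-hoc
// "key=value,key=value" metadata string used throughout this article.
// Caveat: keys and values must not themselves contain ',' or '='.
static string BuildMetadata(IEnumerable<(string Key, string Value)> pairs) =>
    string.Join(",", pairs.Select(kv => $"{kv.Key}={kv.Value}"));

var meta = BuildMetadata(new[]
{
    ("category", "authentication"),
    ("priority", "high")
});
// meta == "category=authentication,priority=high"
```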

Kernel Memory: Production RAG

For production RAG pipelines, Microsoft.KernelMemory provides a more robust solution:

using Microsoft.KernelMemory;

var kernelMemory = new KernelMemoryBuilder()
    // LLM for summarization and answer generation
    .WithAzureOpenAITextGeneration(new AzureOpenAIConfig
    {
        Deployment = "gpt-4o",
        Endpoint = config["AzureOpenAI:Endpoint"]!,
        APIKey = config["AzureOpenAI:Key"]!,
        APIType = AzureOpenAIConfig.APITypes.ChatCompletion
    })
    // Embedding model
    .WithAzureOpenAITextEmbeddingGeneration(new AzureOpenAIConfig
    {
        Deployment = "text-embedding-3-large",
        Endpoint = config["AzureOpenAI:Endpoint"]!,
        APIKey = config["AzureOpenAI:Key"]!,
        APIType = AzureOpenAIConfig.APITypes.EmbeddingGeneration
    })
    // Vector storage
    .WithAzureAISearchMemoryDb(new AzureAISearchConfig
    {
        Endpoint = config["AzureSearch:Endpoint"]!,
        APIKey = config["AzureSearch:Key"]!
    })
    .Build<MemoryServerless>();

Importing Documents

Kernel Memory handles document processing automatically:

// Import a PDF
await kernelMemory.ImportDocumentAsync(
    filePath: "docs/product-manual.pdf",
    documentId: "manual-v2.1",
    tags: new TagCollection
    {
        { "product", "widget-pro" },
        { "version", "2.1" },
        { "type", "manual" }
    });

// Import a web page
await kernelMemory.ImportWebPageAsync(
    url: "https://docs.company.com/api-reference",
    documentId: "api-docs",
    tags: new TagCollection { { "type", "api-documentation" } });

// Import text directly
await kernelMemory.ImportTextAsync(
    text: "Our support hours are Monday-Friday 9am-5pm EST. " +
          "Emergency support available 24/7 for enterprise customers.",
    documentId: "support-hours",
    tags: new TagCollection { { "type", "policy" }, { "department", "support" } });

// Check import status (async processing)
while (!await kernelMemory.IsDocumentReadyAsync("manual-v2.1"))
{
    await Task.Delay(1000);
    Console.WriteLine("Processing document...");
}
Console.WriteLine("Document ready for queries!");
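The polling loop above will spin forever if ingestion fails. In production you'd bound it with a timeout; a generic sketch (my own helper, not a Kernel Memory API):

```csharp
using System;
using System.Threading.Tasks;

// Sketch of a bounded polling helper: retries `condition` until it
// returns true or `timeout` elapses. Returns false on timeout.
static async Task<bool> WaitUntilAsync(
    Func<Task<bool>> condition, TimeSpan timeout, TimeSpan pollInterval)
{
    var deadline = DateTime.UtcNow + timeout;
    while (DateTime.UtcNow < deadline)
    {
        if (await condition()) return true;
        await Task.Delay(pollInterval);
    }
    return false;
}

// Usage against the import above would look like:
// var ready = await WaitUntilAsync(
//     () => kernelMemory.IsDocumentReadyAsync("manual-v2.1"),
//     timeout: TimeSpan.FromMinutes(5),
//     pollInterval: TimeSpan.FromSeconds(1));
```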

Asking Questions with Citations

var answer = await kernelMemory.AskAsync(
    question: "What are the safety warnings for the Widget Pro?",
    filter: MemoryFilters.ByTag("product", "widget-pro"));

Console.WriteLine($"Answer: {answer.Result}");
Console.WriteLine("\nSources:");

foreach (var citation in answer.RelevantSources)
{
    Console.WriteLine($"  📄 {citation.SourceName}");
    foreach (var partition in citation.Partitions)
    {
        // Kernel Memory citations expose partition numbers, not page numbers
        Console.WriteLine($"     Partition {partition.PartitionNumber}: \"{partition.Text[..Math.Min(100, partition.Text.Length)]}...\"");
        Console.WriteLine($"     Relevance: {partition.Relevance:P0}");
    }
}

// Output:
// Answer: The Widget Pro has the following safety warnings: 1) Do not operate near water...
//
// Sources:
//   📄 product-manual.pdf
//      Partition 4: "SAFETY WARNINGS: Do not operate the Widget Pro near water or in humid..."
//      Relevance: 94%

Filtering by Tags

// Only search enterprise documentation (chained tags are AND logic)
var enterpriseAnswer = await kernelMemory.AskAsync(
    question: "How do I configure SSO?",
    filter: MemoryFilters.ByTag("audience", "enterprise")
        .ByTag("type", "configuration"));

// Search across specific document versions (multiple filters are OR logic)
var v2Answer = await kernelMemory.AskAsync(
    question: "What's new in this version?",
    filters: new[]
    {
        MemoryFilters.ByTag("version", "2.0"),
        MemoryFilters.ByTag("version", "2.1")
    });

// Exclude internal content
var publicAnswer = await kernelMemory.AskAsync(
    question: "What are your pricing tiers?",
    filter: MemoryFilters.ByTag("visibility", "public"));  // No internal docs

Memory in Conversational Context

For chat applications, combine short-term (conversation) and long-term (knowledge base) memory:

public class ConversationalMemoryService
{
    private readonly ISemanticTextMemory _longTermMemory;
    private readonly IMemoryStore _shortTermStore;
    private readonly Kernel _kernel;

    public async Task<string> ProcessMessageAsync(
        string conversationId, 
        string userMessage)
    {
        // 1. Store the user message in short-term memory
        await _shortTermStore.UpsertAsync(
            collectionName: $"conversation-{conversationId}",
            record: MemoryRecord.LocalRecord(
                id: Guid.NewGuid().ToString(),
                text: $"User: {userMessage}",
                description: null,
                embedding: await GenerateEmbeddingAsync(userMessage)));

        // 2. Search long-term memory for relevant context
        var relevantKnowledge = await _longTermMemory
            .SearchAsync("knowledge-base", userMessage, limit: 3, minRelevanceScore: 0.75)
            .ToListAsync();

        // 3. Get recent conversation history
        var recentHistory = await GetRecentConversationAsync(conversationId, limit: 10);

        // 4. Build the prompt with both memory types
        var prompt = $"""
            You are a helpful assistant. Use the following context to answer the user's question.

            ## Relevant Knowledge:
            {string.Join("\n\n", relevantKnowledge.Select(r => r.Metadata.Text))}

            ## Recent Conversation:
            {string.Join("\n", recentHistory)}

            ## Current Question:
            {userMessage}

            Answer:
            """;

        var response = await _kernel.InvokePromptAsync<string>(prompt);

        // 5. Store the response in short-term memory
        await _shortTermStore.UpsertAsync(
            collectionName: $"conversation-{conversationId}",
            record: MemoryRecord.LocalRecord(
                id: Guid.NewGuid().ToString(),
                text: $"Assistant: {response}",
                description: null,
                embedding: await GenerateEmbeddingAsync(response!)));

        return response!;
    }
}
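The GetRecentConversationAsync helper is left out above, but however you fetch the turns, the history has to fit your prompt budget. A minimal trimming sketch, using a character budget as a crude stand-in for real token counting:

```csharp
using System;
using System.Collections.Generic;

// Keeps the most recent turns that fit a rough character budget,
// returned oldest-first so the prompt reads chronologically.
static List<string> TrimHistory(IReadOnlyList<string> turns, int charBudget)
{
    var kept = new List<string>();
    var used = 0;
    for (int i = turns.Count - 1; i >= 0; i--)  // walk newest to oldest
    {
        used += turns[i].Length;
        if (used > charBudget) break;
        kept.Insert(0, turns[i]);               // restore chronological order
    }
    return kept;
}

var history = new[] { "User: hi", "Assistant: hello!", "User: refund?" };
var trimmed = TrimHistory(history, charBudget: 30);
// trimmed == ["Assistant: hello!", "User: refund?"] — "User: hi" no longer fits
```

In a real service you'd count tokens with your model's tokenizer rather than characters, but the sliding-window shape stays the same.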

Memory Maintenance

Keep your memory stores healthy:

public class MemoryMaintenanceService
{
    private readonly IMemoryStore _store;

    // Remove outdated memories
    public async Task PruneOldMemoriesAsync(string collection, TimeSpan maxAge)
    {
        var cutoff = DateTime.UtcNow - maxAge;
        // Note: IMemoryStore exposes no "list all records" API; GetAllRecordsAsync
        // is a hypothetical store-specific enumeration (e.g. a SQL query or Qdrant scroll)
        var allRecords = await GetAllRecordsAsync(collection).ToListAsync();

        foreach (var record in allRecords)
        {
            var metadata = ParseMetadata(record.Metadata.AdditionalMetadata);
            if (metadata.TryGetValue("created_at", out var createdStr) &&
                DateTime.Parse(createdStr) < cutoff)
            {
                await _store.RemoveAsync(collection, record.Metadata.Id);
            }
        }
    }

    // Re-embed memories with a new model
    public async Task ReembedCollectionAsync(
        string collection,
        ITextEmbeddingGenerationService oldService,
        ITextEmbeddingGenerationService newService)
    {
        // GetAllRecordsAsync stands in for a store-specific "list all" enumeration,
        // which IMemoryStore does not provide directly
        var allRecords = await GetAllRecordsAsync(collection).ToListAsync();

        foreach (var batch in allRecords.Chunk(100))
        {
            var texts = batch.Select(r => r.Metadata.Text).ToArray();
            var newEmbeddings = await newService.GenerateEmbeddingsAsync(texts);

            for (int i = 0; i < batch.Length; i++)
            {
                var updated = MemoryRecord.LocalRecord(
                    id: batch[i].Metadata.Id,
                    text: batch[i].Metadata.Text,
                    description: batch[i].Metadata.Description,
                    embedding: newEmbeddings[i],
                    additionalMetadata: batch[i].Metadata.AdditionalMetadata);

                await _store.UpsertAsync(collection, updated);
            }
        }
    }
}
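The ParseMetadata helper the service calls isn't shown above. A minimal version, assuming the comma-separated key=value convention used throughout this article (malformed fragments are skipped; duplicate keys would throw):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Parses "key=value,key=value" metadata into a dictionary.
// Fragments without '=' are silently skipped.
static Dictionary<string, string> ParseMetadata(string metadata) =>
    (metadata ?? string.Empty)
        .Split(',', StringSplitOptions.RemoveEmptyEntries)
        .Select(p => p.Split('=', 2))
        .Where(p => p.Length == 2)
        .ToDictionary(p => p[0], p => p[1]);

var parsed = ParseMetadata("created_at=2024-01-15,department=support,oops");
// parsed["created_at"] == "2024-01-15"; the "oops" fragment is dropped
```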

What's Next

In this article, we explored Semantic Kernel's memory capabilities:

  • Embeddings: Transforming text into searchable vectors
  • ISemanticTextMemory: The abstraction for storing and searching
  • Memory Stores: Azure AI Search, Qdrant, PostgreSQL, Redis
  • Kernel Memory: Production-grade document processing with citations
  • Conversational Memory: Combining short-term and long-term context

In Part 4, we'll put memory to work building production RAG applications—chunking strategies, retrieval patterns, context window management, and evaluation techniques.


This is Part 3 of a 5-part series on Semantic Kernel. Next up: Production RAG Patterns
