Brian Spann

Semantic Kernel Memory: Vector Stores, Embeddings, and Semantic Search

LLMs have a fundamental limitation: they're stateless. Every request starts fresh with no memory of previous conversations or your organization's knowledge. This is where Semantic Kernel's memory system comes in—transforming raw text into searchable vector embeddings that give your AI persistent, semantic understanding.

In Part 2, we explored plugins. Now we'll dive deep into the memory layer that powers intelligent retrieval.

Why Memory Matters

Consider a customer support bot. Without memory, it can't:

  • Remember what the customer said 5 messages ago
  • Access your product documentation
  • Know your company's policies
  • Learn from resolved tickets

With Semantic Kernel memory, you transform unstructured text into vector embeddings—numerical representations that capture semantic meaning. Similar concepts cluster together in vector space, enabling semantic search that understands intent, not just keywords.

Understanding Embeddings

Before diving into code, let's understand what's happening under the hood.

When you send text to an embedding model, it returns a high-dimensional vector (typically 1536 or 3072 dimensions). These vectors have a remarkable property: semantically similar texts produce similar vectors.

using Microsoft.SemanticKernel.Embeddings;

var embeddingService = kernel.GetRequiredService<ITextEmbeddingGenerationService>();

var texts = new[]
{
    "The cat sat on the mat",
    "A feline rested on the rug",
    "The stock market crashed today",
    "Dogs are loyal companions"
};

var embeddings = await embeddingService.GenerateEmbeddingsAsync(texts);

// Calculate cosine similarity between vectors
float CosineSimilarity(ReadOnlyMemory<float> a, ReadOnlyMemory<float> b)
{
    var spanA = a.Span;
    var spanB = b.Span;
    float dot = 0, normA = 0, normB = 0;

    for (int i = 0; i < spanA.Length; i++)
    {
        dot += spanA[i] * spanB[i];
        normA += spanA[i] * spanA[i];
        normB += spanB[i] * spanB[i];
    }

    return dot / (MathF.Sqrt(normA) * MathF.Sqrt(normB));
}

// "cat on mat" vs "feline on rug" → ~0.92 (very similar!)
// "cat on mat" vs "stock market" → ~0.31 (unrelated)
Console.WriteLine($"Cat/Feline similarity: {CosineSimilarity(embeddings[0], embeddings[1]):F2}");
Console.WriteLine($"Cat/Market similarity: {CosineSimilarity(embeddings[0], embeddings[2]):F2}");

Setting Up Memory with ISemanticTextMemory

The ISemanticTextMemory interface provides a simple abstraction for storing and searching memories:

using Microsoft.SemanticKernel.Memory;
using Microsoft.SemanticKernel.Connectors.AzureOpenAI;

// Build the memory system
var memoryBuilder = new MemoryBuilder();

// Add embedding generation
memoryBuilder.WithAzureOpenAITextEmbeddingGeneration(
    deploymentName: "text-embedding-3-large",
    endpoint: config["AzureOpenAI:Endpoint"]!,
    apiKey: config["AzureOpenAI:Key"]!);

// Add a memory store (we'll explore options below)
memoryBuilder.WithMemoryStore(new VolatileMemoryStore());

var memory = memoryBuilder.Build();

Storing Memories

// Store individual memories
await memory.SaveInformationAsync(
    collection: "company-policies",
    id: "refund-policy",
    text: "Customers may return any item within 30 days of purchase for a full refund. " +
          "Items must be unused and in original packaging. Digital products are non-refundable.",
    description: "Company refund and return policy",
    additionalMetadata: "department=customer-service,version=2024-01,priority=high");

await memory.SaveInformationAsync(
    collection: "company-policies", 
    id: "shipping-policy",
    text: "Free shipping on orders over $50. Standard shipping takes 5-7 business days. " +
          "Express shipping (2-3 days) available for $9.99. Overnight shipping $24.99.",
    description: "Shipping rates and timeframes",
    additionalMetadata: "department=logistics,version=2024-01");

await memory.SaveInformationAsync(
    collection: "company-policies",
    id: "warranty-policy", 
    text: "All electronics come with a 1-year manufacturer warranty. Extended warranties " +
          "available for purchase. Warranty covers defects, not accidental damage.",
    description: "Product warranty information",
    additionalMetadata: "department=support,version=2024-01");

Semantic Search

Now the magic—semantic search that understands meaning:

// Search for relevant information
var searchResults = memory.SearchAsync(
    collection: "company-policies",
    query: "How long do I have to return something?",  // Note: doesn't contain "refund"
    limit: 3,
    minRelevanceScore: 0.7);

await foreach (var result in searchResults)
{
    Console.WriteLine($"[{result.Relevance:P0}] {result.Metadata.Description}");
    Console.WriteLine($"  {result.Metadata.Text}");
    Console.WriteLine($"  Metadata: {result.Metadata.AdditionalMetadata}");
    Console.WriteLine();
}

// Output:
// [94%] Company refund and return policy
//   Customers may return any item within 30 days of purchase...
//   Metadata: department=customer-service,version=2024-01,priority=high

Memory Store Implementations

Semantic Kernel supports multiple vector stores. Choose based on your requirements:

VolatileMemoryStore (Development)

In-memory storage—fast but ephemeral:

var store = new VolatileMemoryStore();
memoryBuilder.WithMemoryStore(store);

// Great for testing and prototyping
// Data lost when process ends

Azure AI Search (Production Recommended)

Enterprise-grade with hybrid search (vector + keyword):

using Microsoft.SemanticKernel.Connectors.AzureAISearch;

var store = new AzureAISearchMemoryStore(
    endpoint: config["AzureSearch:Endpoint"]!,
    apiKey: config["AzureSearch:Key"]!);

memoryBuilder.WithMemoryStore(store);

// Features:
// - Hybrid search (vector + BM25 keyword)
// - Semantic ranking
// - Faceted filtering
// - Geo-spatial queries
// - Enterprise security (RBAC, private endpoints)

Qdrant (Self-Hosted Performance)

High-performance open-source vector database:

using Microsoft.SemanticKernel.Connectors.Qdrant;

var store = new QdrantMemoryStore(
    host: "localhost",
    port: 6333,
    vectorSize: 3072);  // Match your embedding model

memoryBuilder.WithMemoryStore(store);

// Features:
// - Excellent filtering capabilities
// - Horizontal scaling
// - Snapshot backups
// - gRPC for performance

PostgreSQL with pgvector

Use your existing Postgres infrastructure:

using Microsoft.SemanticKernel.Connectors.Postgres;
using Npgsql;

var dataSource = NpgsqlDataSource.Create(config.GetConnectionString("Postgres")!);
var store = new PostgresMemoryStore(dataSource, vectorSize: 3072);

memoryBuilder.WithMemoryStore(store);

// Features:
// - Familiar SQL ecosystem
// - ACID transactions
// - Complex queries combining vectors with relational data
// - Cost-effective for smaller datasets

Redis (Low-Latency Cache)

When milliseconds matter:

using Microsoft.SemanticKernel.Connectors.Redis;
using StackExchange.Redis;

var redis = ConnectionMultiplexer.Connect(config["Redis:Connection"]!);
var store = new RedisMemoryStore(redis.GetDatabase(), vectorSize: 3072);

memoryBuilder.WithMemoryStore(store);

// Features:
// - Sub-millisecond latency
// - Automatic expiration (TTL)
// - Cluster support
// - Good for session/conversation memory

Comparison Matrix

| Store           | Best For                | Latency  | Scaling   | Managed  | Hybrid Search  |
|-----------------|-------------------------|----------|-----------|----------|----------------|
| Azure AI Search | Production enterprise   | Medium   | Excellent | Yes      | Yes            |
| Qdrant          | Self-hosted, filtering  | Low      | Good      | Optional | No             |
| PostgreSQL      | Existing Postgres infra | Medium   | Good      | Optional | Via extensions |
| Redis           | Low-latency, ephemeral  | Very low | Good      | Optional | No             |
| Volatile        | Development/testing     | Instant  | N/A       | N/A      | No             |
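All of these connectors implement the same IMemoryStore interface, so switching stores is a one-line change in the builder. Purely as an illustration, the matrix above can be encoded as a tiny decision helper—the rules and their ordering below are my own reading of the table, not official guidance:

```csharp
using System;

// Purely illustrative: encodes the comparison matrix above as a decision helper.
// The priority order (hybrid search > latency > existing infra) is an assumption.
static string RecommendStore(
    bool production, bool needHybridSearch, bool subMillisecondLatency, bool hasPostgres)
{
    if (!production) return "Volatile";             // dev/testing only, data is ephemeral
    if (needHybridSearch) return "Azure AI Search"; // vector + BM25 keyword
    if (subMillisecondLatency) return "Redis";      // session/conversation memory
    if (hasPostgres) return "PostgreSQL";           // reuse existing infrastructure
    return "Qdrant";                                // self-hosted default
}

Console.WriteLine(RecommendStore(
    production: true, needHybridSearch: true,
    subMillisecondLatency: false, hasPostgres: false));
// prints "Azure AI Search"
```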

Memory Records Deep Dive

Each memory record contains:

public class MemoryRecord
{
    // Core data
    public string Id { get; }                    // Unique identifier
    public string Text { get; }                  // Original text content
    public ReadOnlyMemory<float> Embedding { get; }  // Vector representation

    // Metadata
    public MemoryRecordMetadata Metadata { get; }
}

public class MemoryRecordMetadata
{
    public string Id { get; }
    public string Text { get; }
    public string Description { get; }           // Human-readable description
    public string AdditionalMetadata { get; }    // Custom key=value pairs
    public string ExternalSourceName { get; }    // Source system reference
    public bool IsReference { get; }             // Is this a reference to external data?
}

Working with Metadata

Use metadata for filtering and organization:

// Store with rich metadata
await memory.SaveInformationAsync(
    collection: "support-tickets",
    id: $"ticket-{ticket.Id}",
    text: $"Issue: {ticket.Title}\n\nDescription: {ticket.Description}\n\nResolution: {ticket.Resolution}",
    description: $"Resolved ticket: {ticket.Title}",
    additionalMetadata: $"category={ticket.Category},priority={ticket.Priority}," +
                        $"resolved_date={ticket.ResolvedAt:yyyy-MM-dd},agent={ticket.AgentId}");

// Later, search and filter
var results = memory.SearchAsync(
    collection: "support-tickets",
    query: "customer can't login",
    limit: 10,
    minRelevanceScore: 0.75);

await foreach (var result in results)
{
    // Parse metadata for filtering
    var metadata = result.Metadata.AdditionalMetadata
        .Split(',')
        .Select(p => p.Split('=', 2))
        .Where(p => p.Length == 2)  // skip malformed fragments
        .ToDictionary(p => p[0], p => p[1]);

    if (metadata.TryGetValue("category", out var category) && category == "authentication")
    {
        Console.WriteLine($"Relevant auth ticket: {result.Metadata.Description}");
    }
}
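One caution: additionalMetadata is a free-form string, so hand-concatenating key=value pairs invites typos. A small helper (my own sketch, not a Semantic Kernel API) keeps the format consistent; note it assumes keys and values never contain ',' or '=':

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not part of Semantic Kernel): builds the ad-hoc
// "key=value,key=value" metadata string used throughout this article.
// Caveat: keys and values must not themselves contain ',' or '='.
static string BuildMetadata(IEnumerable<(string Key, string Value)> pairs) =>
    string.Join(",", pairs.Select(kv => $"{kv.Key}={kv.Value}"));

var meta = BuildMetadata(new[]
{
    ("category", "authentication"),
    ("priority", "high")
});
// meta == "category=authentication,priority=high"
```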

Kernel Memory: Production RAG

For production RAG pipelines, Microsoft.KernelMemory provides a more robust solution:

using Microsoft.KernelMemory;

var kernelMemory = new KernelMemoryBuilder()
    // LLM for summarization and answer generation
    .WithAzureOpenAITextGeneration(new AzureOpenAIConfig
    {
        Deployment = "gpt-4o",
        Endpoint = config["AzureOpenAI:Endpoint"]!,
        APIKey = config["AzureOpenAI:Key"]!,
        APIType = AzureOpenAIConfig.APITypes.ChatCompletion
    })
    // Embedding model
    .WithAzureOpenAITextEmbeddingGeneration(new AzureOpenAIConfig
    {
        Deployment = "text-embedding-3-large",
        Endpoint = config["AzureOpenAI:Endpoint"]!,
        APIKey = config["AzureOpenAI:Key"]!,
        APIType = AzureOpenAIConfig.APITypes.EmbeddingGeneration
    })
    // Vector storage
    .WithAzureAISearchMemoryDb(new AzureAISearchConfig
    {
        Endpoint = config["AzureSearch:Endpoint"]!,
        APIKey = config["AzureSearch:Key"]!
    })
    .Build<MemoryServerless>();

Importing Documents

Kernel Memory handles document processing automatically:

// Import a PDF
await kernelMemory.ImportDocumentAsync(
    filePath: "docs/product-manual.pdf",
    documentId: "manual-v2.1",
    tags: new TagCollection
    {
        { "product", "widget-pro" },
        { "version", "2.1" },
        { "type", "manual" }
    });

// Import a web page
await kernelMemory.ImportWebPageAsync(
    url: "https://docs.company.com/api-reference",
    documentId: "api-docs",
    tags: new TagCollection { { "type", "api-documentation" } });

// Import text directly
await kernelMemory.ImportTextAsync(
    text: "Our support hours are Monday-Friday 9am-5pm EST. " +
          "Emergency support available 24/7 for enterprise customers.",
    documentId: "support-hours",
    tags: new TagCollection { { "type", "policy" }, { "department", "support" } });

// Check import status (async processing)
while (!await kernelMemory.IsDocumentReadyAsync("manual-v2.1"))
{
    await Task.Delay(1000);
    Console.WriteLine("Processing document...");
}
Console.WriteLine("Document ready for queries!");
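The polling loop above will spin forever if ingestion fails. In production you'd bound it with a timeout; a generic sketch (my own helper, not a Kernel Memory API):

```csharp
using System;
using System.Threading.Tasks;

// Sketch of a bounded polling helper: retries `condition` until it
// returns true or `timeout` elapses. Returns false on timeout.
static async Task<bool> WaitUntilAsync(
    Func<Task<bool>> condition, TimeSpan timeout, TimeSpan pollInterval)
{
    var deadline = DateTime.UtcNow + timeout;
    while (DateTime.UtcNow < deadline)
    {
        if (await condition()) return true;
        await Task.Delay(pollInterval);
    }
    return false;
}

// Usage against the import above would look like:
// var ready = await WaitUntilAsync(
//     () => kernelMemory.IsDocumentReadyAsync("manual-v2.1"),
//     timeout: TimeSpan.FromMinutes(5),
//     pollInterval: TimeSpan.FromSeconds(1));
```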

Asking Questions with Citations

var answer = await kernelMemory.AskAsync(
    question: "What are the safety warnings for the Widget Pro?",
    filter: MemoryFilters.ByTag("product", "widget-pro"));

Console.WriteLine($"Answer: {answer.Result}");
Console.WriteLine("\nSources:");

foreach (var citation in answer.RelevantSources)
{
    Console.WriteLine($"  📄 {citation.SourceName}");
    foreach (var partition in citation.Partitions)
    {
        // Kernel Memory citations expose partition numbers, not page numbers
        Console.WriteLine($"     Partition {partition.PartitionNumber}: \"{partition.Text[..Math.Min(100, partition.Text.Length)]}...\"");
        Console.WriteLine($"     Relevance: {partition.Relevance:P0}");
    }
}

// Output:
// Answer: The Widget Pro has the following safety warnings: 1) Do not operate near water...
//
// Sources:
//   📄 product-manual.pdf
//      Partition 4: "SAFETY WARNINGS: Do not operate the Widget Pro near water or in humid..."
//      Relevance: 94%

Filtering by Tags

// Only search enterprise documentation (chained tags are AND logic)
var enterpriseAnswer = await kernelMemory.AskAsync(
    question: "How do I configure SSO?",
    filter: MemoryFilters.ByTag("audience", "enterprise")
        .ByTag("type", "configuration"));

// Search across specific document versions (multiple filters are OR logic)
var v2Answer = await kernelMemory.AskAsync(
    question: "What's new in this version?",
    filters: new[]
    {
        MemoryFilters.ByTag("version", "2.0"),
        MemoryFilters.ByTag("version", "2.1")
    });

// Exclude internal content
var publicAnswer = await kernelMemory.AskAsync(
    question: "What are your pricing tiers?",
    filter: MemoryFilters.ByTag("visibility", "public"));  // No internal docs

Memory in Conversational Context

For chat applications, combine short-term (conversation) and long-term (knowledge base) memory:

public class ConversationalMemoryService
{
    private readonly ISemanticTextMemory _longTermMemory;
    private readonly IMemoryStore _shortTermStore;
    private readonly Kernel _kernel;

    public async Task<string> ProcessMessageAsync(
        string conversationId, 
        string userMessage)
    {
        // 1. Store the user message in short-term memory
        await _shortTermStore.UpsertAsync(
            collectionName: $"conversation-{conversationId}",
            record: MemoryRecord.LocalRecord(
                id: Guid.NewGuid().ToString(),
                text: $"User: {userMessage}",
                description: null,
                embedding: await GenerateEmbeddingAsync(userMessage)));

        // 2. Search long-term memory for relevant context
        var relevantKnowledge = await _longTermMemory
            .SearchAsync("knowledge-base", userMessage, limit: 3, minRelevanceScore: 0.75)
            .ToListAsync();

        // 3. Get recent conversation history
        var recentHistory = await GetRecentConversationAsync(conversationId, limit: 10);

        // 4. Build the prompt with both memory types
        var prompt = $"""
            You are a helpful assistant. Use the following context to answer the user's question.

            ## Relevant Knowledge:
            {string.Join("\n\n", relevantKnowledge.Select(r => r.Metadata.Text))}

            ## Recent Conversation:
            {string.Join("\n", recentHistory)}

            ## Current Question:
            {userMessage}

            Answer:
            """;

        var response = await _kernel.InvokePromptAsync<string>(prompt);

        // 5. Store the response in short-term memory
        await _shortTermStore.UpsertAsync(
            collectionName: $"conversation-{conversationId}",
            record: MemoryRecord.LocalRecord(
                id: Guid.NewGuid().ToString(),
                text: $"Assistant: {response}",
                description: null,
                embedding: await GenerateEmbeddingAsync(response!)));

        return response!;
    }
}
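The GetRecentConversationAsync helper is left out above, but however you fetch the turns, the history has to fit your prompt budget. A minimal trimming sketch, using a character budget as a crude stand-in for real token counting:

```csharp
using System;
using System.Collections.Generic;

// Keeps the most recent turns that fit a rough character budget,
// returned oldest-first so the prompt reads chronologically.
static List<string> TrimHistory(IReadOnlyList<string> turns, int charBudget)
{
    var kept = new List<string>();
    var used = 0;
    for (int i = turns.Count - 1; i >= 0; i--)  // walk newest to oldest
    {
        used += turns[i].Length;
        if (used > charBudget) break;
        kept.Insert(0, turns[i]);               // restore chronological order
    }
    return kept;
}

var history = new[] { "User: hi", "Assistant: hello!", "User: refund?" };
var trimmed = TrimHistory(history, charBudget: 30);
// trimmed == ["Assistant: hello!", "User: refund?"] — "User: hi" no longer fits
```

In a real service you'd count tokens with your model's tokenizer rather than characters, but the sliding-window shape stays the same.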

Memory Maintenance

Keep your memory stores healthy:

public class MemoryMaintenanceService
{
    private readonly IMemoryStore _store;

    // Remove outdated memories
    public async Task PruneOldMemoriesAsync(string collection, TimeSpan maxAge)
    {
        var cutoff = DateTime.UtcNow - maxAge;
        // Note: IMemoryStore exposes no "list all records" API; GetAllRecordsAsync
        // is a hypothetical store-specific enumeration (e.g. a SQL query or Qdrant scroll)
        var allRecords = await GetAllRecordsAsync(collection).ToListAsync();

        foreach (var record in allRecords)
        {
            var metadata = ParseMetadata(record.Metadata.AdditionalMetadata);
            if (metadata.TryGetValue("created_at", out var createdStr) &&
                DateTime.Parse(createdStr) < cutoff)
            {
                await _store.RemoveAsync(collection, record.Metadata.Id);
            }
        }
    }

    // Re-embed memories with a new model
    public async Task ReembedCollectionAsync(
        string collection,
        ITextEmbeddingGenerationService oldService,
        ITextEmbeddingGenerationService newService)
    {
        // GetAllRecordsAsync stands in for a store-specific "list all" enumeration,
        // which IMemoryStore does not provide directly
        var allRecords = await GetAllRecordsAsync(collection).ToListAsync();

        foreach (var batch in allRecords.Chunk(100))
        {
            var texts = batch.Select(r => r.Metadata.Text).ToArray();
            var newEmbeddings = await newService.GenerateEmbeddingsAsync(texts);

            for (int i = 0; i < batch.Length; i++)
            {
                var updated = MemoryRecord.LocalRecord(
                    id: batch[i].Metadata.Id,
                    text: batch[i].Metadata.Text,
                    description: batch[i].Metadata.Description,
                    embedding: newEmbeddings[i],
                    additionalMetadata: batch[i].Metadata.AdditionalMetadata);

                await _store.UpsertAsync(collection, updated);
            }
        }
    }
}
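The ParseMetadata helper the service calls isn't shown above. A minimal version, assuming the comma-separated key=value convention used throughout this article (malformed fragments are skipped; duplicate keys would throw):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Parses "key=value,key=value" metadata into a dictionary.
// Fragments without '=' are silently skipped.
static Dictionary<string, string> ParseMetadata(string metadata) =>
    (metadata ?? string.Empty)
        .Split(',', StringSplitOptions.RemoveEmptyEntries)
        .Select(p => p.Split('=', 2))
        .Where(p => p.Length == 2)
        .ToDictionary(p => p[0], p => p[1]);

var parsed = ParseMetadata("created_at=2024-01-15,department=support,oops");
// parsed["created_at"] == "2024-01-15"; the "oops" fragment is dropped
```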

What's Next

In this article, we explored Semantic Kernel's memory capabilities:

  • Embeddings: Transforming text into searchable vectors
  • ISemanticTextMemory: The abstraction for storing and searching
  • Memory Stores: Azure AI Search, Qdrant, PostgreSQL, Redis
  • Kernel Memory: Production-grade document processing with citations
  • Conversational Memory: Combining short-term and long-term context

In Part 4, we'll put memory to work building production RAG applications—chunking strategies, retrieval patterns, context window management, and evaluation techniques.


This is Part 3 of a 5-part series on Semantic Kernel. Next up: Production RAG Patterns
