LLMs have a fundamental limitation: they're stateless. Every request starts fresh with no memory of previous conversations or your organization's knowledge. This is where Semantic Kernel's memory system comes in—transforming raw text into searchable vector embeddings that give your AI persistent, semantic understanding.
In Part 2, we explored plugins. Now we'll dive deep into the memory layer that powers intelligent retrieval.
Why Memory Matters
Consider a customer support bot. Without memory, it can't:
- Remember what the customer said 5 messages ago
- Access your product documentation
- Know your company's policies
- Learn from resolved tickets
With Semantic Kernel memory, you transform unstructured text into vector embeddings—numerical representations that capture semantic meaning. Similar concepts cluster together in vector space, enabling semantic search that understands intent, not just keywords.
Understanding Embeddings
Before diving into code, let's understand what's happening under the hood.
When you send text to an embedding model, it returns a high-dimensional vector (typically 1536 or 3072 dimensions). These vectors have a remarkable property: semantically similar texts produce similar vectors.
using Microsoft.SemanticKernel.Embeddings; // home of ITextEmbeddingGenerationService
var embeddingService = kernel.GetRequiredService<ITextEmbeddingGenerationService>();
var texts = new[]
{
"The cat sat on the mat",
"A feline rested on the rug",
"The stock market crashed today",
"Dogs are loyal companions"
};
var embeddings = await embeddingService.GenerateEmbeddingsAsync(texts);
// Calculate cosine similarity between vectors
float CosineSimilarity(ReadOnlyMemory<float> a, ReadOnlyMemory<float> b)
{
var spanA = a.Span;
var spanB = b.Span;
float dot = 0, normA = 0, normB = 0;
for (int i = 0; i < spanA.Length; i++)
{
dot += spanA[i] * spanB[i];
normA += spanA[i] * spanA[i];
normB += spanB[i] * spanB[i];
}
return dot / (MathF.Sqrt(normA) * MathF.Sqrt(normB));
}
// "cat on mat" vs "feline on rug" → ~0.92 (very similar!)
// "cat on mat" vs "stock market" → ~0.31 (unrelated)
Console.WriteLine($"Cat/Feline similarity: {CosineSimilarity(embeddings[0], embeddings[1]):F2}");
Console.WriteLine($"Cat/Market similarity: {CosineSimilarity(embeddings[0], embeddings[2]):F2}");
Setting Up Memory with ISemanticTextMemory
The ISemanticTextMemory interface provides a simple abstraction for storing and searching memories:
using Microsoft.SemanticKernel.Memory;
using Microsoft.SemanticKernel.Connectors.AzureOpenAI;
// Build the memory system
var memoryBuilder = new MemoryBuilder();
// Add embedding generation
memoryBuilder.WithAzureOpenAITextEmbeddingGeneration(
deploymentName: "text-embedding-3-large",
endpoint: config["AzureOpenAI:Endpoint"]!,
apiKey: config["AzureOpenAI:Key"]!);
// Add a memory store (we'll explore options below)
memoryBuilder.WithMemoryStore(new VolatileMemoryStore());
var memory = memoryBuilder.Build();
Storing Memories
// Store individual memories
await memory.SaveInformationAsync(
collection: "company-policies",
id: "refund-policy",
text: "Customers may return any item within 30 days of purchase for a full refund. " +
"Items must be unused and in original packaging. Digital products are non-refundable.",
description: "Company refund and return policy",
additionalMetadata: "department=customer-service,version=2024-01,priority=high");
await memory.SaveInformationAsync(
collection: "company-policies",
id: "shipping-policy",
text: "Free shipping on orders over $50. Standard shipping takes 5-7 business days. " +
"Express shipping (2-3 days) available for $9.99. Overnight shipping $24.99.",
description: "Shipping rates and timeframes",
additionalMetadata: "department=logistics,version=2024-01");
await memory.SaveInformationAsync(
collection: "company-policies",
id: "warranty-policy",
text: "All electronics come with a 1-year manufacturer warranty. Extended warranties " +
"available for purchase. Warranty covers defects, not accidental damage.",
description: "Product warranty information",
additionalMetadata: "department=support,version=2024-01");
Semantic Search
Now the magic—semantic search that understands meaning:
// Search for relevant information
var searchResults = memory.SearchAsync(
collection: "company-policies",
query: "How long do I have to return something?", // Note: doesn't contain "refund"
limit: 3,
minRelevanceScore: 0.7);
await foreach (var result in searchResults)
{
Console.WriteLine($"[{result.Relevance:P0}] {result.Metadata.Description}");
Console.WriteLine($" {result.Metadata.Text}");
Console.WriteLine($" Metadata: {result.Metadata.AdditionalMetadata}");
Console.WriteLine();
}
// Output:
// [94%] Company refund and return policy
// Customers may return any item within 30 days of purchase...
// Metadata: department=customer-service,version=2024-01,priority=high
Memory Store Implementations
Semantic Kernel supports multiple vector stores. Choose based on your requirements:
VolatileMemoryStore (Development)
In-memory storage—fast but ephemeral:
var store = new VolatileMemoryStore();
memoryBuilder.WithMemoryStore(store);
// Great for testing and prototyping
// Data lost when process ends
Azure AI Search (Production Recommended)
Enterprise-grade with hybrid search (vector + keyword):
using Microsoft.SemanticKernel.Connectors.AzureAISearch;
var store = new AzureAISearchMemoryStore(
endpoint: config["AzureSearch:Endpoint"]!,
apiKey: config["AzureSearch:Key"]!);
memoryBuilder.WithMemoryStore(store);
// Features:
// - Hybrid search (vector + BM25 keyword)
// - Semantic ranking
// - Faceted filtering
// - Geo-spatial queries
// - Enterprise security (RBAC, private endpoints)
Qdrant (Self-Hosted Performance)
High-performance open-source vector database:
using Microsoft.SemanticKernel.Connectors.Qdrant;
var store = new QdrantMemoryStore(
endpoint: "http://localhost:6333",
vectorSize: 3072); // Match your embedding model's dimensions
memoryBuilder.WithMemoryStore(store);
// Features:
// - Excellent filtering capabilities
// - Horizontal scaling
// - Snapshot backups
// - gRPC for performance
PostgreSQL with pgvector
Use your existing Postgres infrastructure:
using Microsoft.SemanticKernel.Connectors.Postgres;
using Npgsql;
var dataSource = NpgsqlDataSource.Create(config.GetConnectionString("Postgres")!);
var store = new PostgresMemoryStore(dataSource, vectorSize: 3072);
memoryBuilder.WithMemoryStore(store);
// Features:
// - Familiar SQL ecosystem
// - ACID transactions
// - Complex queries combining vectors with relational data
// - Cost-effective for smaller datasets
Redis (Low-Latency Cache)
When milliseconds matter:
using Microsoft.SemanticKernel.Connectors.Redis;
using StackExchange.Redis;
var redis = ConnectionMultiplexer.Connect(config["Redis:Connection"]!);
var store = new RedisMemoryStore(redis.GetDatabase(), vectorSize: 3072);
memoryBuilder.WithMemoryStore(store);
// Features:
// - Sub-millisecond latency
// - Automatic expiration (TTL)
// - Cluster support
// - Good for session/conversation memory
Comparison Matrix
| Store | Best For | Latency | Scaling | Managed | Hybrid Search |
|---|---|---|---|---|---|
| Azure AI Search | Production enterprise | Medium | Excellent | Yes | Yes |
| Qdrant | Self-hosted, filtering | Low | Good | Optional | No |
| PostgreSQL | Existing Postgres infra | Medium | Good | Optional | Via extensions |
| Redis | Low-latency, ephemeral | Very Low | Good | Optional | No |
| Volatile | Development/testing | Instant | N/A | N/A | No |
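To make the matrix concrete, here is a toy decision helper that encodes the trade-offs above in code. The requirement flags and recommendation strings are invented for this sketch; it illustrates the selection logic, not any Semantic Kernel API:

```csharp
// Hypothetical decision helper encoding the comparison matrix above.
static class StorePicker
{
    public static string Recommend(
        bool production, bool needHybridSearch, bool havePostgres, bool latencyCritical)
    {
        if (!production) return "Volatile";               // dev/test only
        if (needHybridSearch) return "Azure AI Search";   // vector + BM25
        if (latencyCritical) return "Redis";              // sub-millisecond reads
        if (havePostgres) return "PostgreSQL + pgvector"; // reuse existing infra
        return "Qdrant";                                  // self-hosted default
    }
}
```

Swap the flags for whatever constraints matter in your environment; the point is that the choice reduces to a handful of explicit requirements rather than a default.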
Memory Records Deep Dive
Each memory record contains:
public class MemoryRecord
{
// Core data
public ReadOnlyMemory<float> Embedding { get; } // Vector representation
public string Key { get; } // Store-assigned key
public DateTimeOffset? Timestamp { get; } // Optional timestamp
// Metadata (carries the id and original text)
public MemoryRecordMetadata Metadata { get; }
}
public class MemoryRecordMetadata
{
public string Id { get; }
public string Text { get; }
public string Description { get; } // Human-readable description
public string AdditionalMetadata { get; } // Custom key=value pairs
public string ExternalSourceName { get; } // Source system reference
public bool IsReference { get; } // Is this a reference to external data?
}
Working with Metadata
Use metadata for filtering and organization:
// Store with rich metadata
await memory.SaveInformationAsync(
collection: "support-tickets",
id: $"ticket-{ticket.Id}",
text: $"Issue: {ticket.Title}\n\nDescription: {ticket.Description}\n\nResolution: {ticket.Resolution}",
description: $"Resolved ticket: {ticket.Title}",
additionalMetadata: $"category={ticket.Category},priority={ticket.Priority}," +
$"resolved_date={ticket.ResolvedAt:yyyy-MM-dd},agent={ticket.AgentId}");
// Later, search and filter
var results = memory.SearchAsync("support-tickets", "customer can't login", limit: 10, minRelevanceScore: 0.75);
await foreach (var result in results)
{
// Parse metadata for filtering
var metadata = result.Metadata.AdditionalMetadata
.Split(',')
.Select(p => p.Split('='))
.ToDictionary(p => p[0], p => p[1]);
if (metadata.TryGetValue("category", out var category) && category == "authentication")
{
Console.WriteLine($"Relevant auth ticket: {result.Metadata.Description}");
}
}
Kernel Memory: Production RAG
For production RAG pipelines, Microsoft.KernelMemory provides a more robust solution:
using Microsoft.KernelMemory;
var kernelMemory = new KernelMemoryBuilder()
// LLM for summarization and answer generation
.WithAzureOpenAITextGeneration(new AzureOpenAIConfig
{
Deployment = "gpt-4o",
Endpoint = config["AzureOpenAI:Endpoint"]!,
APIKey = config["AzureOpenAI:Key"]!,
APIType = AzureOpenAIConfig.APITypes.ChatCompletion
})
// Embedding model
.WithAzureOpenAITextEmbeddingGeneration(new AzureOpenAIConfig
{
Deployment = "text-embedding-3-large",
Endpoint = config["AzureOpenAI:Endpoint"]!,
APIKey = config["AzureOpenAI:Key"]!,
APIType = AzureOpenAIConfig.APITypes.EmbeddingGeneration
})
// Vector storage
.WithAzureAISearchMemoryDb(new AzureAISearchConfig
{
Endpoint = config["AzureSearch:Endpoint"]!,
APIKey = config["AzureSearch:Key"]!
})
.Build<MemoryServerless>();
Importing Documents
Kernel Memory handles document processing automatically:
// Import a PDF
await kernelMemory.ImportDocumentAsync(
filePath: "docs/product-manual.pdf",
documentId: "manual-v2.1",
tags: new TagCollection
{
{ "product", "widget-pro" },
{ "version", "2.1" },
{ "type", "manual" }
});
// Import a web page
await kernelMemory.ImportWebPageAsync(
url: "https://docs.company.com/api-reference",
documentId: "api-docs",
tags: new TagCollection { { "type", "api-documentation" } });
// Import text directly
await kernelMemory.ImportTextAsync(
text: "Our support hours are Monday-Friday 9am-5pm EST. " +
"Emergency support available 24/7 for enterprise customers.",
documentId: "support-hours",
tags: new TagCollection { { "type", "policy" }, { "department", "support" } });
// Check import status (async processing)
while (!await kernelMemory.IsDocumentReadyAsync("manual-v2.1"))
{
await Task.Delay(1000);
Console.WriteLine("Processing document...");
}
Console.WriteLine("Document ready for queries!");
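One caveat with the polling loop above: if ingestion fails or stalls, it spins forever. A generic wait-with-timeout helper avoids that; nothing here depends on Kernel Memory except the usage shown in the comment:

```csharp
using System;
using System.Threading.Tasks;

static class Polling
{
    // Polls an async condition until it returns true or the timeout elapses.
    // Returns false on timeout instead of looping forever.
    public static async Task<bool> WaitUntilAsync(
        Func<Task<bool>> condition, TimeSpan timeout, TimeSpan pollInterval)
    {
        var deadline = DateTime.UtcNow + timeout;
        while (DateTime.UtcNow < deadline)
        {
            if (await condition()) return true;
            await Task.Delay(pollInterval);
        }
        return false;
    }
}

// Usage with the import above:
// var ready = await Polling.WaitUntilAsync(
//     () => kernelMemory.IsDocumentReadyAsync("manual-v2.1"),
//     timeout: TimeSpan.FromMinutes(5),
//     pollInterval: TimeSpan.FromSeconds(1));
// if (!ready) Console.WriteLine("Document did not become ready in time.");
```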
Asking Questions with Citations
var answer = await kernelMemory.AskAsync(
question: "What are the safety warnings for the Widget Pro?",
filter: MemoryFilters.ByTag("product", "widget-pro"));
Console.WriteLine($"Answer: {answer.Result}");
Console.WriteLine("\nSources:");
foreach (var citation in answer.RelevantSources)
{
Console.WriteLine($" 📄 {citation.SourceName}");
foreach (var partition in citation.Partitions)
{
Console.WriteLine($" Page {partition.PageNumber}: \"{partition.Text[..Math.Min(100, partition.Text.Length)]}...\"");
Console.WriteLine($" Relevance: {partition.Relevance:P0}");
}
}
// Output:
// Answer: The Widget Pro has the following safety warnings: 1) Do not operate near water...
//
// Sources:
// 📄 product-manual.pdf
// Page 15: "SAFETY WARNINGS: Do not operate the Widget Pro near water or in humid..."
// Relevance: 94%
Filtering by Tags
// Only search enterprise documentation
var enterpriseAnswer = await kernelMemory.AskAsync(
question: "How do I configure SSO?",
filter: MemoryFilters.ByTag("audience", "enterprise")
.ByTag("type", "configuration")); // Tags chained on one filter combine with AND
// Search across specific document versions
var v2Answer = await kernelMemory.AskAsync(
question: "What's new in this version?",
filters: new List<MemoryFilter>
{
MemoryFilters.ByTag("version", "2.0"),
MemoryFilters.ByTag("version", "2.1") // Separate filters combine with OR
});
// Exclude certain content
var publicAnswer = await kernelMemory.AskAsync(
question: "What are your pricing tiers?",
filter: MemoryFilters.ByTag("visibility", "public")); // No internal docs
Memory in Conversational Context
For chat applications, combine short-term (conversation) and long-term (knowledge base) memory:
public class ConversationalMemoryService
{
private readonly ISemanticTextMemory _longTermMemory;
private readonly IMemoryStore _shortTermStore;
private readonly Kernel _kernel;
public async Task<string> ProcessMessageAsync(
string conversationId,
string userMessage)
{
// 1. Store the user message in short-term memory
await _shortTermStore.UpsertAsync(
collectionName: $"conversation-{conversationId}",
record: MemoryRecord.LocalRecord(
id: Guid.NewGuid().ToString(),
text: $"User: {userMessage}",
description: null,
embedding: await GenerateEmbeddingAsync(userMessage)));
// 2. Search long-term memory for relevant context
var relevantKnowledge = await _longTermMemory
.SearchAsync("knowledge-base", userMessage, limit: 3, minRelevanceScore: 0.75)
.ToListAsync();
// 3. Get recent conversation history
var recentHistory = await GetRecentConversationAsync(conversationId, limit: 10);
// 4. Build the prompt with both memory types
var prompt = $"""
You are a helpful assistant. Use the following context to answer the user's question.
## Relevant Knowledge:
{string.Join("\n\n", relevantKnowledge.Select(r => r.Metadata.Text))}
## Recent Conversation:
{string.Join("\n", recentHistory)}
## Current Question:
{userMessage}
Answer:
""";
var response = await _kernel.InvokePromptAsync<string>(prompt);
// 5. Store the response in short-term memory
await _shortTermStore.UpsertAsync(
collectionName: $"conversation-{conversationId}",
record: MemoryRecord.LocalRecord(
id: Guid.NewGuid().ToString(),
text: $"Assistant: {response}",
description: null,
embedding: await GenerateEmbeddingAsync(response!)));
return response!;
}
}
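ProcessMessageAsync leans on two helpers the class never defines. GenerateEmbeddingAsync can simply delegate to an injected ITextEmbeddingGenerationService (the single-input GenerateEmbeddingAsync extension exists for exactly this). GetRecentConversationAsync is trickier: IMemoryStore has no ordered "last N records" query, so one pragmatic option (an assumption of this sketch, not a Semantic Kernel feature) is to keep a plain in-memory transcript alongside the vector store:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;

// Hypothetical transcript log backing GetRecentConversationAsync: an ordered,
// in-memory record of each conversation, kept because vector stores don't
// preserve insertion order for "last N messages" style queries.
class TranscriptLog
{
    private readonly ConcurrentDictionary<string, List<string>> _log = new();

    // Append one "User: ..." or "Assistant: ..." line to a conversation.
    public void Append(string conversationId, string line)
    {
        var list = _log.GetOrAdd(conversationId, _ => new List<string>());
        lock (list) { list.Add(line); }
    }

    // Return the most recent lines, oldest first.
    public IReadOnlyList<string> Recent(string conversationId, int limit)
    {
        if (!_log.TryGetValue(conversationId, out var list)) return Array.Empty<string>();
        lock (list) { return list.TakeLast(limit).ToList(); }
    }
}
```

For multi-instance deployments this would live in Redis or a database rather than process memory; the in-memory version only sketches the bookkeeping.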
Memory Maintenance
Keep your memory stores healthy:
public class MemoryMaintenanceService
{
private readonly IMemoryStore _store;
// Remove outdated memories
public async Task PruneOldMemoriesAsync(string collection, TimeSpan maxAge)
{
var cutoff = DateTime.UtcNow - maxAge;
// IMemoryStore has no "list everything" operation; GetAllRecordsAsync here is a
// hypothetical store-specific helper that enumerates the whole collection.
var allRecords = await GetAllRecordsAsync(collection);
foreach (var record in allRecords)
{
var metadata = ParseMetadata(record.Metadata.AdditionalMetadata);
if (metadata.TryGetValue("created_at", out var createdStr) &&
DateTime.Parse(createdStr) < cutoff)
{
await _store.RemoveAsync(collection, record.Metadata.Id);
}
}
}
// Re-embed memories with a new model
public async Task ReembedCollectionAsync(
string collection,
ITextEmbeddingGenerationService oldService,
ITextEmbeddingGenerationService newService)
{
// Hypothetical helper; IMemoryStore itself cannot enumerate a collection
var allRecords = await GetAllRecordsAsync(collection);
foreach (var batch in allRecords.Chunk(100))
{
var texts = batch.Select(r => r.Metadata.Text).ToArray();
var newEmbeddings = await newService.GenerateEmbeddingsAsync(texts);
for (int i = 0; i < batch.Length; i++)
{
var updated = MemoryRecord.LocalRecord(
id: batch[i].Metadata.Id,
text: batch[i].Metadata.Text,
description: batch[i].Metadata.Description,
embedding: newEmbeddings[i],
additionalMetadata: batch[i].Metadata.AdditionalMetadata);
await _store.UpsertAsync(collection, updated);
}
}
}
}
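PruneOldMemoriesAsync calls a ParseMetadata helper that never gets defined. A minimal sketch, assuming the comma-separated key=value convention this article uses for additionalMetadata (note this convention breaks if values contain commas or equals signs, which is one reason production systems prefer structured tags):

```csharp
using System;
using System.Collections.Generic;

static class MetadataHelper
{
    // Parses "key=value,key=value" strings into a dictionary.
    // Segments without '=' are skipped; duplicate keys keep the last value.
    public static Dictionary<string, string> ParseMetadata(string? metadata)
    {
        var result = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        if (string.IsNullOrWhiteSpace(metadata)) return result;

        foreach (var pair in metadata.Split(',', StringSplitOptions.RemoveEmptyEntries))
        {
            var idx = pair.IndexOf('=');
            if (idx <= 0) continue; // no key, skip malformed segment
            result[pair[..idx].Trim()] = pair[(idx + 1)..].Trim();
        }
        return result;
    }
}
```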
What's Next
In this article, we explored Semantic Kernel's memory capabilities:
- Embeddings: Transforming text into searchable vectors
- ISemanticTextMemory: The abstraction for storing and searching
- Memory Stores: Azure AI Search, Qdrant, PostgreSQL, Redis
- Kernel Memory: Production-grade document processing with citations
- Conversational Memory: Combining short-term and long-term context
In Part 4, we'll put memory to work building production RAG applications—chunking strategies, retrieval patterns, context window management, and evaluation techniques.
This is Part 3 of a 5-part series on Semantic Kernel. Next up: Production RAG Patterns