DEV Community

Ali Suleyman TOPUZ

Originally published at topuzas.Medium

Semantic Kernel and AI Agent Architecture: Orchestrating Enterprise LLMs in .NET 9

A staff engineer’s deep-dive into Microsoft’s Semantic Kernel framework for building production-grade AI agents. Learn why enterprise LLM integration fails and how orchestration frameworks solve memory, composability, and operational challenges.

Executive Summary

For Senior Software Engineers, Semantic Kernel (SK) represents a paradigm shift. It doesn’t just simplify LLM integration — it fundamentally restructures application boundaries, state management, and workflow orchestration when non-deterministic AI components become first-class citizens in our architecture. In .NET 9, this is further solidified by the Microsoft.Extensions.AI (MEAI) ecosystem, allowing for a decoupled, vendor-agnostic AI stack.

1. The Core Challenge: The Stateless Black-Box Dilemma

Traditional APIs are predictable; LLMs are probabilistic. They are Stateless, Non-deterministic, and Context-Limited. Bridging this gap requires an orchestrator to manage context, enforce schemas, and provide observability.

2. Conceptual Overview of Semantic Kernel

In the modern .NET 9 ecosystem, the architecture is split into two layers:

  1. The Foundational Layer (Microsoft.Extensions.AI): Provides the standard interfaces like IChatClient.
  2. The Orchestration Layer (Semantic Kernel): Uses those interfaces to manage Plugins (the hands), Planners (the brain), and Filters (the guardrails).

3. Implementation Deep Dive in .NET 9

As a “Player-Coach,” I don’t just talk architecture; I write the “Golden Path” code. Here is how we implement a production-grade Kernel setup in .NET 9.

3.1 Bootstrap and Configuration

We leverage the new .NET 9 abstractions to ensure our kernel is decoupled from the specific model provider.

using Azure;
using Azure.AI.OpenAI;
using Microsoft.Extensions.AI;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;

public static class AIOrchestrationExtensions
{
    public static IServiceCollection AddEnterpriseAIServices(this IServiceCollection services, IConfiguration config)
    {
        // 1. Register the foundational IChatClient pipeline (Microsoft.Extensions.AI, standard in .NET 9)
        services.AddChatClient(sp =>
                new AzureOpenAIClient(
                        new Uri(config["AzureOpenAI:Endpoint"]!),
                        new AzureKeyCredential(config["AzureOpenAI:ApiKey"]!))
                    .GetChatClient(config["AzureOpenAI:ModelId"]!)
                    .AsIChatClient())
            .UseFunctionInvocation() // Enables the model to call registered tools
            .UseOpenTelemetry();     // Native distributed tracing

        // 2. Build the Semantic Kernel per resolution, reusing the container's IChatClient
        //    (avoids calling BuildServiceProvider inside the registration, an anti-pattern)
        services.AddTransient(sp =>
        {
            var builder = Kernel.CreateBuilder();

            var chatClient = sp.GetRequiredService<IChatClient>();
            builder.Services.AddSingleton(chatClient);
            // Adapter so SK services that resolve IChatCompletionService work over the MEAI client
            builder.Services.AddSingleton(chatClient.AsChatCompletionService());

            // 3. Register Business Logic Plugins
            builder.Plugins.AddFromType<InventoryPlugin>();
            builder.Plugins.AddFromType<VendorContractPlugin>();

            return builder.Build();
        });

        return services;
    }
}

3.2 Building a Native Plugin

Native plugins are deterministic C# methods. The [Description] attribute is crucial; it acts as the "API Documentation" for the LLM.

using System.ComponentModel;
using Microsoft.SemanticKernel;

public class InventoryPlugin
{
    [KernelFunction]
    [Description("Returns current stock levels for a product SKU.")]
    public async Task<int> GetStockLevelAsync(
        [Description("The product SKU to look up.")] string sku)
    {
        // High-perf DB or gRPC call logic here
        return await Task.FromResult(4);
    }
}

3.3 Auto-Invocation Loop

In .NET 9, we don’t manually call tools. We let the kernel “think” and call them automatically until the goal is met.

public async Task ProcessAgentRequest(Kernel kernel, string userPrompt)
{
    // FunctionChoiceBehavior.Auto() drives the "reasoning loop": the kernel keeps
    // invoking tools and feeding results back until the model produces a final answer.
    // (This is the provider-agnostic successor to ToolCallBehavior.AutoInvokeKernelFunctions.)
    var settings = new PromptExecutionSettings
    {
        FunctionChoiceBehavior = FunctionChoiceBehavior.Auto()
    };

    var result = await kernel.InvokePromptAsync(userPrompt, new(settings));
    Console.WriteLine(result.ToString());
}

4. Observability and Resilience

For a Senior Architect, Observability is non-negotiable. .NET 9’s AI stack is built on ActivitySource, making OpenTelemetry integration native.
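Wiring those activities into an exporter takes only a few lines. This is a sketch assuming the OpenTelemetry.Extensions.Hosting and OTLP exporter packages are referenced; the source names follow Microsoft's current documentation and may shift while the MEAI telemetry surface is experimental.

```csharp
using Microsoft.Extensions.DependencyInjection;
using OpenTelemetry.Trace;

public static class TelemetryExtensions
{
    public static IServiceCollection AddAITracing(this IServiceCollection services)
    {
        services.AddOpenTelemetry()
            .WithTracing(tracing => tracing
                // Spans emitted by the UseOpenTelemetry() step in the IChatClient pipeline
                .AddSource("Experimental.Microsoft.Extensions.AI")
                // Semantic Kernel's own ActivitySources (wildcard matching is supported)
                .AddSource("Microsoft.SemanticKernel*")
                .AddOtlpExporter());
        return services;
    }
}
```

With this in place, every prompt, tool invocation, and model round-trip shows up as a correlated trace in your existing APM backend, no custom instrumentation required.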

The “Guardrail” Filter

We use IFunctionInvocationFilter to intercept calls before they execute. This is where we check permissions or cost limits.

public class SafetyFilter : IFunctionInvocationFilter
{
    public async Task OnFunctionInvocationAsync(FunctionInvocationContext context, Func<FunctionInvocationContext, Task> next)
    {
        // Pre-call: Check if the SKU being requested is authorized
        if (context.Function.Name == "GetStockLevelAsync") { /* Logic */ }

        await next(context); // Execute the actual function
    }
}

5. Practical Use Case: Supply Chain Assistant

Scenario: Automating stock exceptions.

  1. Ingestion: The agent sees low stock via InventoryPlugin.
  2. RAG: It queries the Vector Store (Vendor Contracts) for lead times.
  3. Action: If stock is critical, it drafts an email automatically.

public async Task HandleStockShortageAsync(Kernel kernel, string sku)
{
    string prompt = @"You are a Supply Chain Assistant. 
                      Check stock for {{$sku}}. 
                      Search contracts for lead times. 
                      If stock < 10 and lead time > 5 days, draft a restock email.";

    var result = await kernel.InvokePromptAsync(prompt, new() { ["sku"] = sku });
    _logger.LogInformation("Agent Output: {Result}", result);
}

6. Distributed State & Long-term Memory

In production, AI agents often run on stateless infrastructure (like Azure Functions or Container Apps). However, a “human-like” assistant must remember preferences from five minutes ago and technical specs from a 500-page manual. Semantic Kernel handles this through Chat History and Vector Stores.

6.1 Chat History: Managing the Conversation Context

LLMs do not “remember” sessions. We must pass the entire conversation back to them with every request. In .NET 9, we architect this by persisting Semantic Kernel's ChatHistory object in an external store.

public async Task ExecuteStatefulConversationAsync(Kernel kernel, string userId, string input)
{
    // 1. Load conversation history from a persistent store (e.g., Redis or CosmosDB)
    var history = await _sessionRepository.GetHistoryAsync(userId);

    // 2. Append the new user input
    history.AddUserMessage(input);

    // 3. Invoke the Chat Completion service with history
    var chatService = kernel.GetRequiredService<IChatCompletionService>();
    var response = await chatService.GetChatMessageContentAsync(history, kernel: kernel);

    // 4. Update the store with the assistant's response
    history.AddAssistantMessage(response.Content!);
    await _sessionRepository.SaveHistoryAsync(userId, history);
}

6.2 Vector Stores: The “Corporate Brain” (RAG)

.NET 9 introduces a standardized Vector Store abstraction. This allows the Agent to perform Retrieval Augmented Generation (RAG) by searching across vectorized corporate data.

  • Architectural Advantage: By using the IVectorStore interface, your code remains decoupled from the specific database provider (e.g., Azure AI Search, Pinecone, or Milvus).
  • Metadata Filtering: A staff-level implementation doesn’t just search for “text similarity.” It uses metadata (e.g., DepartmentId, SecurityLevel) to ensure the Agent only retrieves information the user is authorized to see.

Staff Engineer Note: Always implement Semantic Caching. Before sending a query to the LLM, check if a similar question has been answered recently in your Vector Store to save on token costs and reduce latency.
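The metadata-filtered retrieval described above can be sketched as follows. Treat this as illustrative only: the Microsoft.Extensions.VectorData API is still in preview and its filter surface has shifted between releases, and the ContractChunk model, collection name, and embedding dimensions are assumptions.

```csharp
using Microsoft.Extensions.VectorData;

// Hypothetical record model; attributes mark the key, filterable metadata, and the vector.
public sealed class ContractChunk
{
    [VectorStoreRecordKey]
    public string Id { get; set; } = string.Empty;

    [VectorStoreRecordData(IsFilterable = true)]
    public string DepartmentId { get; set; } = string.Empty;

    [VectorStoreRecordData]
    public string Text { get; set; } = string.Empty;

    [VectorStoreRecordVector(Dimensions: 1536)]
    public ReadOnlyMemory<float> Embedding { get; set; }
}

public sealed class ContractSearch(IVectorStore store)
{
    public async Task<List<string>> SearchAsync(ReadOnlyMemory<float> queryEmbedding, string departmentId)
    {
        var collection = store.GetCollection<string, ContractChunk>("vendor-contracts");

        // Combine similarity search with a deterministic metadata filter so the
        // agent only retrieves chunks the caller's department is authorized to see.
        var options = new VectorSearchOptions
        {
            Top = 3,
            Filter = new VectorSearchFilter()
                .EqualTo(nameof(ContractChunk.DepartmentId), departmentId)
        };

        var results = new List<string>();
        var search = await collection.VectorizedSearchAsync(queryEmbedding, options);
        await foreach (var match in search.Results)
        {
            results.Add(match.Record.Text);
        }
        return results;
    }
}
```

The key design point is that authorization is enforced in the deterministic filter, never left to the model's judgment.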

7. Operational Excellence: Token Economics and Advanced Guardrails

In production, the “cool factor” of AI fades quickly if the cloud bill spikes. As architects, we must treat LLM tokens like any other expensive resource (e.g., IOPS or egress).

7.1 Token Management and Semantic Caching

Every word sent to the LLM costs money and increases latency. To optimize this, we implement a Semantic Cache. Before the Kernel hits the LLM, it checks a Vector Store to see if a similar question was answered recently.

public async Task<string> GetOptimizedResponseAsync(Kernel kernel, string userPrompt)
{
    // 1. Search Vector Cache for a 'similar enough' previous question
    var cachedResponse = await _vectorCache.GetSimilarResultAsync(userPrompt, threshold: 0.95);
    if (cachedResponse != null) return cachedResponse;

    // 2. If no cache hit, proceed to LLM
    var result = await kernel.InvokePromptAsync(userPrompt);

    // 3. Update cache for future hits
    await _vectorCache.StoreResultAsync(userPrompt, result.ToString());

    return result.ToString();
}

7.2 Advanced Guardrails: The “Planner Validator”

In Section 4, we discussed simple filters. For high-stakes environments, we need a Plan Validation Step. If an agent generates a plan to “Delete User Account,” a deterministic layer must intercept it.

public class PlanValidationFilter : IAutoFunctionInvocationFilter
{
    public async Task OnAutoFunctionInvocationAsync(AutoFunctionInvocationContext context, Func<AutoFunctionInvocationContext, Task> next)
    {
        // Staff-level check: Is the agent trying to call a restricted tool?
        if (context.Function.Name.Contains("Delete", StringComparison.OrdinalIgnoreCase))
        {
            // Arguments live on the invocation context (the Kernel has no Arguments property)
            var userId = context.Arguments?.GetValueOrDefault("userId")?.ToString();
            if (!IsUserAdmin(userId))
            {
                throw new UnauthorizedAccessException("Agent attempted unauthorized destructive action.");
            }
        }

        await next(context);
    }
}

7.3 Governance: Rate Limiting and Circuit Breakers

LLM APIs can be flaky or slow. By wrapping our IChatClient (configured in Section 3.1) with standard Polly policies, we ensure our .NET 9 application doesn't hang when OpenAI/Azure is under load.

  • Retry Pattern: For 429 Too Many Requests.
  • Circuit Breaker: To stop calling a degraded model and fallback to a smaller, cheaper one (e.g., GPT-4o to GPT-4o-mini).
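One way to sketch this is a decorator built on MEAI's DelegatingChatClient base class with a Polly v8 pipeline. The thresholds are illustrative, and the GetResponseAsync signature reflects recent Microsoft.Extensions.AI versions, so adapt both to your package versions.

```csharp
using Microsoft.Extensions.AI;
using Polly;
using Polly.CircuitBreaker;
using Polly.Retry;

public sealed class ResilientChatClient(IChatClient innerClient) : DelegatingChatClient(innerClient)
{
    private static readonly ResiliencePipeline Pipeline = new ResiliencePipelineBuilder()
        .AddRetry(new RetryStrategyOptions
        {
            MaxRetryAttempts = 3,
            BackoffType = DelayBackoffType.Exponential,
            Delay = TimeSpan.FromSeconds(1) // Absorbs transient 429s from the provider
        })
        .AddCircuitBreaker(new CircuitBreakerStrategyOptions
        {
            FailureRatio = 0.5,
            BreakDuration = TimeSpan.FromSeconds(30) // Stop hammering a degraded model
        })
        .Build();

    public override Task<ChatResponse> GetResponseAsync(
        IEnumerable<ChatMessage> messages,
        ChatOptions? options = null,
        CancellationToken cancellationToken = default)
        => Pipeline.ExecuteAsync(
            async ct => await base.GetResponseAsync(messages, options, ct),
            cancellationToken).AsTask();
}
```

The decorator can then be slotted into the Section 3.1 pipeline via the chat client builder's `Use(inner => new ResilientChatClient(inner))` step; a fallback to a cheaper model would follow the same decorator pattern.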

8. Testing and Evaluation (The “Staff” Reality Check)

Unlike traditional unit tests, AI testing is probabilistic. We use LLM-assisted Evaluation (LLM-as-a-judge).

  1. Input: User Question + Agent Answer.
  2. Validator: A separate, highly-capable model (like GPT-4o) evaluates the answer based on a rubric (Accuracy, Tone, Grounding).
  3. Result: A numeric score for CI/CD pipelines.
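The judge step above can be sketched as a thin wrapper around a second Kernel. The rubric prompt, class name, and score parsing are illustrative assumptions; for production pipelines, Microsoft also ships ready-made evaluators in the Microsoft.Extensions.AI.Evaluation packages.

```csharp
using Microsoft.SemanticKernel;

// Hypothetical LLM-as-a-judge wrapper; judgeKernel should be backed by a
// highly capable model (e.g., GPT-4o), separate from the agent under test.
public sealed class AnswerJudge(Kernel judgeKernel)
{
    public async Task<int> JudgeAsync(string question, string agentAnswer)
    {
        string rubricPrompt = $"""
            You are an impartial evaluator. Score the ANSWER to the QUESTION
            from 1 (unusable) to 10 (excellent) on accuracy, tone, and grounding.
            Reply with the integer score only.

            QUESTION: {question}
            ANSWER: {agentAnswer}
            """;

        var verdict = await judgeKernel.InvokePromptAsync(rubricPrompt);

        // Defensive parse: LLM output is probabilistic even when told to be terse.
        return int.TryParse(verdict.ToString().Trim(), out var score) ? score : 0;
    }
}
```

In CI/CD, run this over a fixed evaluation set and fail the build when the average score regresses below a threshold, exactly as you would with a performance budget.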

Conclusion

Semantic Kernel in .NET 9 is more than a library; it is the implementation of a Reliable AI Distributed System. By combining the new IChatClient abstractions, IVectorStore memory, and IFunctionInvocationFilters, we move AI from a "chat box" to a mission-critical enterprise asset.
