Elvin Suleymanov

Modernizing .NET Architectures for AI-Native Workloads

For the past decade, .NET architects have been perfecting a craft. Clean separation of concerns. Domain-driven design. Event-driven microservices. CQRS. Hexagonal architecture. The patterns are mature, battle-tested, and well-understood. Teams have learned how to build systems that are reliable, maintainable, and scalable.

Then AI arrived - not as a feature, but as an expectation.

Not the AI of a sentiment analysis endpoint bolted onto the side of an API. Not a classification model embedded in a background job. The new expectation is AI that reasons across your entire domain, remembers context across sessions, orchestrates multi-step workflows autonomously, and integrates with every surface of your product simultaneously.

This is what it means to be AI-native. And it does not fit cleanly into the architectures most .NET teams built over the last decade.

This article is a practical guide to bridging that gap. We will examine how to evolve a modern .NET architecture to support AI-native workloads - without discarding everything you have built and without compromising the engineering discipline that makes .NET systems trustworthy.

Who is this for? Senior .NET engineers, architects, and tech leads who are integrating AI deeply into production systems and need a structural framework for doing it correctly.


Table of Contents

  1. What AI-Native Actually Means
  2. Why Traditional .NET Architectures Struggle with AI
  3. The AI-Native Architecture Stack
  4. Semantic Kernel as the AI Orchestration Layer
  5. Designing AI-Aware Domain Models
  6. Vector Storage and Semantic Memory in .NET
  7. The Agent Pattern: Autonomous AI in Your Domain
  8. Streaming AI Responses with ASP.NET Core
  9. Observability for AI Workloads
  10. Guardrails: Safety, Cost Control, and Responsible AI
  11. Real-World Walkthrough: AI-Native CRM Feature
  12. Migration Path: Evolving an Existing .NET System

1. What AI-Native Actually Means

The term AI-native is used loosely in the industry, so let us define it precisely for the context of .NET architecture.

An AI-native system is one in which the AI capability is structurally integrated into the architecture - not added as a feature on top of it. The difference matters enormously in practice.

Consider two approaches to adding a "smart contract summarization" feature to a legal SaaS product.

The first approach creates a new API endpoint, calls an LLM from inside a service method, and returns the result. The AI is a black box embedded in a single method in a single service. It has no awareness of the domain model, no memory of previous interactions, no ability to take follow-up actions, and no path for the engineering team to reason about what the AI actually did or why.

The second approach models the AI capability as a first-class architectural concern. The AI orchestrator knows the domain. It has access to the company's private knowledge base via semantic search. It can take actions in the system - drafting a document, flagging a clause, notifying a reviewer - via well-defined tool interfaces. Every AI interaction is observable, traceable, and auditable. The LLM provider is an infrastructure dependency, not a hardcoded implementation detail.

The second approach is AI-native. It is also significantly harder to build. This article gives you the framework to do it correctly.


2. Why Traditional .NET Architectures Struggle with AI

The Clean Architecture and Domain-Driven Design patterns that .NET teams have embraced are fundamentally synchronous, deterministic, and stateless between requests. A command handler receives input, applies business rules, persists state, and returns. The output is predictable given the input. The execution time is bounded and measurable.

AI workloads break every one of these assumptions.

LLM calls are slow. A single inference call to GPT-4o or Claude 3.7 Sonnet takes anywhere from 500ms to 30 seconds depending on the prompt length and output complexity. An application command handler that takes 15 seconds to complete will time out, exhaust thread pool resources, and generate a torrent of support tickets.

LLM outputs are non-deterministic. Two identical prompts with the same model can produce subtly different outputs. Unit tests that assert on exact string output will fail randomly. Deterministic business logic cannot assume deterministic AI output.
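One practical response is to stop asserting on raw model text and instead parse responses into typed results, then test invariants. A minimal sketch - the record and validator names here are hypothetical, not part of any library:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical pattern: parse the model's output into a typed result, then
// assert invariants the application depends on rather than exact wording.
public sealed record ReviewResult(string Summary, IReadOnlyList<string> RiskFlags);

public static class ReviewResultValidator
{
    // True when the result is usable by downstream code, regardless of the
    // model's phrasing: a non-empty, bounded summary and no blank risk flags.
    public static bool IsStructurallyValid(ReviewResult result) =>
        !string.IsNullOrWhiteSpace(result.Summary)
        && result.Summary.Length <= 2_000
        && result.RiskFlags.All(f => !string.IsNullOrWhiteSpace(f));
}
```

A unit test then feeds a canned model response through the parser and asserts structural validity, plus whatever domain-specific invariants matter, instead of comparing strings that will never match twice.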

AI workflows are long-running and stateful. An agent that researches a topic, drafts a document, asks the user for clarification, incorporates the feedback, and then publishes the result is not a request-response operation. It is a workflow that spans multiple turns, potentially across multiple sessions. The request-scoped dependency injection lifetime and the stateless HTTP handler model do not accommodate this naturally.

AI introduces external costs per call. Every LLM call costs money in API tokens. Traditional architectures have no concept of per-call economic cost - there is no mechanism to budget, throttle, or optimize LLM usage without building it explicitly.

The context window is a shared resource. An LLM call is not just "pass input, get output." The entire conversation history, retrieved documents, tool definitions, and system prompt all compete for space inside a fixed context window. Managing this window is a first-class engineering problem that has no analog in traditional CRUD architecture.
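To make the trade-off concrete, here is a minimal sketch of window management: trim the oldest turns until the estimated cost fits, always preserving the system prompt. The four-characters-per-token estimate is a rough assumption for illustration only; production code should use the model's actual tokenizer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sketch of context-window trimming: drop the oldest conversation turns
// until the estimated token count fits, always keeping the system prompt.
public static class ContextWindowTrimmer
{
    // Crude estimate (assumption): roughly 4 characters per token for
    // English text. Replace with the model's real tokenizer in production.
    public static int EstimateTokens(string text) => text.Length / 4 + 1;

    public static IReadOnlyList<string> Trim(
        string systemPrompt,
        IReadOnlyList<string> turns,
        int maxTokens)
    {
        var budget = maxTokens - EstimateTokens(systemPrompt);
        var kept = new LinkedList<string>();

        // Walk newest-first so the most recent turns survive trimming.
        foreach (var turn in turns.Reverse())
        {
            var cost = EstimateTokens(turn);
            if (cost > budget) break;
            budget -= cost;
            kept.AddFirst(turn);
        }

        return new[] { systemPrompt }.Concat(kept).ToList();
    }
}
```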

These are not minor inconveniences. They are fundamental mismatches between the assumptions embedded in traditional .NET architecture and the requirements of AI-native workloads. Addressing them requires structural changes, not just new NuGet packages.


3. The AI-Native Architecture Stack

Before writing any code, it helps to visualize how an AI-native .NET system is layered. The following stack extends the familiar Clean Architecture layers with the AI-specific concerns that sit between the application layer and the external AI infrastructure.

┌──────────────────────────────────────────────────────────────────┐
│                        Presentation Layer                        │
│          ASP.NET Core Minimal APIs   Blazor   gRPC               │
│          Streaming responses    WebSockets    SignalR             │
├──────────────────────────────────────────────────────────────────┤
│                      AI Orchestration Layer          ← NEW       │
│          Semantic Kernel Kernel      Agent runtime               │
│          Planner    Memory manager   Plugin registry             │
│          Prompt template engine      Token budget manager        │
├──────────────────────────────────────────────────────────────────┤
│                       Application Layer                          │
│          Command / Query handlers    Domain services             │
│          AI Tool implementations    Workflow coordinators        │
├──────────────────────────────────────────────────────────────────┤
│                        Domain Layer                              │
│          Entities    Value objects    Domain events              │
│          AI-aware aggregates         Semantic metadata           │
├──────────────────────────────────────────────────────────────────┤
│                     Infrastructure Layer                         │
│          PostgreSQL    Redis    Blob storage                      │
│          Vector DB (pgvector / Qdrant / Azure AI Search)         │
│          LLM providers (OpenAI / Azure OpenAI / Anthropic)       │
│          Embedding providers    Observability (OTEL)             │
└──────────────────────────────────────────────────────────────────┘

The AI Orchestration Layer is the critical addition. It is not part of the domain - it does not contain business rules. It is not part of the infrastructure - it is not a database driver or an HTTP client. It is the translation layer between your domain logic and the probabilistic, non-deterministic, token-consuming world of large language models. Keeping it as a distinct layer is what makes the rest of the architecture testable, maintainable, and provider-agnostic.


4. Semantic Kernel as the AI Orchestration Layer

Microsoft's Semantic Kernel is the .NET ecosystem's answer to the AI orchestration problem. It provides the abstractions that allow you to build the AI Orchestration Layer without coupling your application to a specific LLM provider, embedding model, or vector store.

The core concept in Semantic Kernel is the Kernel - a dependency injection container for AI services and plugins. Everything flows through it.

// Program.cs - register Semantic Kernel with all AI services
builder.Services.AddKernel()
    .AddAzureOpenAIChatCompletion(
        deploymentName: config["AzureOpenAI:DeploymentName"]!,
        endpoint: config["AzureOpenAI:Endpoint"]!,
        apiKey: config["AzureOpenAI:ApiKey"]!)
    .AddAzureOpenAITextEmbeddingGeneration(
        deploymentName: config["AzureOpenAI:EmbeddingDeployment"]!,
        endpoint: config["AzureOpenAI:Endpoint"]!,
        apiKey: config["AzureOpenAI:ApiKey"]!)
    .Plugins
        .AddFromType<CustomerPlugin>()
        .AddFromType<ContractPlugin>()
        .AddFromType<NotificationPlugin>();

// Register vector memory store
builder.Services.AddSingleton<IVectorStore>(sp =>
    new QdrantVectorStore(new QdrantClient("localhost")));

Plugins as domain capability exposure

The Plugin system is how you expose your domain to the AI. A plugin is a C# class whose public methods are decorated with [KernelFunction] and natural-language descriptions. The AI reads these descriptions and decides which functions to call and in what order based on the user's intent.

public sealed class CustomerPlugin
{
    private readonly ICustomerRepository _customers;
    private readonly IOrderRepository    _orders;

    public CustomerPlugin(
        ICustomerRepository customers,
        IOrderRepository orders)
    {
        _customers = customers;
        _orders    = orders;
    }

    [KernelFunction]
    [Description("Retrieve a customer profile including contact details, " +
                 "subscription plan, and account status.")]
    public async Task<CustomerDto> GetCustomerProfileAsync(
        [Description("The unique customer identifier (UUID)")]
        string customerId,
        CancellationToken cancellationToken = default)
    {
        var customer = await _customers.GetByIdAsync(
            Guid.Parse(customerId), cancellationToken);

        return customer is null
            ? throw new CustomerNotFoundException(customerId)
            : CustomerDto.FromDomain(customer);
    }

    [KernelFunction]
    [Description("List the most recent orders for a customer, " +
                 "sorted newest first. Returns order status, total, and items.")]
    public async Task<IReadOnlyList<OrderSummaryDto>> GetRecentOrdersAsync(
        [Description("The unique customer identifier (UUID)")]
        string customerId,
        [Description("Maximum number of orders to return. Default is 10.")]
        int limit = 10,
        CancellationToken cancellationToken = default)
    {
        var orders = await _orders.GetRecentByCustomerAsync(
            Guid.Parse(customerId), limit, cancellationToken);

        return orders.Select(OrderSummaryDto.FromDomain).ToList();
    }

    [KernelFunction]
    [Description("Update the subscription plan for a customer. " +
                 "Valid plans are: starter, professional, enterprise.")]
    public async Task<string> UpdateSubscriptionPlanAsync(
        [Description("The unique customer identifier (UUID)")]
        string customerId,
        [Description("The new subscription plan name")]
        string planName,
        CancellationToken cancellationToken = default)
    {
        await _customers.UpdatePlanAsync(
            Guid.Parse(customerId), planName, cancellationToken);

        return $"Successfully updated customer {customerId} to {planName} plan.";
    }
}

The descriptions you write on [KernelFunction] and each parameter are not comments - they are the interface between your code and the language model. The quality of your function descriptions directly determines the reliability of the AI's decisions about when and how to call them. This is description engineering, and it is one of the most underappreciated skills in AI-native .NET development.
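To make the contrast concrete, compare a weak description with one that gives the model enough to decide correctly. The plugin and its methods are hypothetical, and only the BCL [Description] attribute is shown so the snippet stands alone; real plugin methods would also carry [KernelFunction].

```csharp
using System;
using System.ComponentModel;

// Illustrative contrast between a vague and a model-friendly description.
public sealed class RefundPlugin
{
    // Weak: says nothing about preconditions, side effects, or alternatives.
    [Description("Processes a refund.")]
    public string RefundWeak(string orderId) => $"refunded {orderId}";

    // Strong: states intent, constraints, and when NOT to call the function,
    // which is exactly the information the model uses to choose between tools.
    [Description("Issue a refund for a delivered order. Only call this after the " +
                 "user has explicitly confirmed the refund amount. Orders still in " +
                 "'shipped' or 'pending' status cannot be refunded; use CancelOrder " +
                 "for those instead.")]
    public string RefundStrong(string orderId) => $"refunded {orderId}";
}
```

The strong description encodes preconditions ("after the user has explicitly confirmed") and routing guidance ("use CancelOrder for those instead") that the model cannot infer from a method signature alone.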


5. Designing AI-Aware Domain Models

A traditional domain entity is designed to support business rules enforced by application code. An AI-aware domain entity also needs to support semantic understanding - the ability for an AI to reason about what the entity means, not just what fields it has.

This is a subtle but important distinction. Consider a Contract entity in a legal SaaS:

// Traditional domain entity
public sealed class Contract
{
    public Guid   Id            { get; private set; }
    public string Title         { get; private set; }
    public string FullText      { get; private set; }
    public ContractStatus Status { get; private set; }
    public DateTimeOffset EffectiveDate { get; private set; }
    public DateTimeOffset ExpiryDate    { get; private set; }

    // Business rules enforced in domain
    public void Approve(UserId approver)
    {
        if (Status != ContractStatus.PendingReview)
            throw new DomainException("Only contracts pending review can be approved.");

        Status = ContractStatus.Approved;
        AddDomainEvent(new ContractApprovedEvent(Id, approver));
    }
}

// AI-aware domain entity adds semantic metadata
public sealed class Contract
{
    public Guid   Id            { get; private set; }
    public string Title         { get; private set; }
    public string FullText      { get; private set; }
    public ContractStatus Status { get; private set; }
    public DateTimeOffset EffectiveDate { get; private set; }
    public DateTimeOffset ExpiryDate    { get; private set; }

    // Semantic metadata - generated asynchronously after entity creation
    public string?          AiSummary       { get; private set; }
    public IReadOnlyList<string> KeyClauses { get; private set; } = [];
    public IReadOnlyList<string> RiskFlags  { get; private set; } = [];
    public float[]?         EmbeddingVector { get; private set; }
    public DateTimeOffset?  LastIndexedAt   { get; private set; }

    public void ApplySemanticAnalysis(
        string summary,
        IReadOnlyList<string> clauses,
        IReadOnlyList<string> risks,
        float[] embedding)
    {
        AiSummary       = summary;
        KeyClauses      = clauses;
        RiskFlags       = risks;
        EmbeddingVector = embedding;
        LastIndexedAt   = DateTimeOffset.UtcNow;

        AddDomainEvent(new ContractIndexedEvent(Id));
    }

    // Business rules unchanged - domain integrity is not AI's responsibility
    public void Approve(UserId approver)
    {
        if (Status != ContractStatus.PendingReview)
            throw new DomainException("Only contracts pending review can be approved.");

        Status = ContractStatus.Approved;
        AddDomainEvent(new ContractApprovedEvent(Id, approver));
    }
}

The semantic metadata lives in the domain entity, but it is never set by the AI directly. The domain event pipeline triggers an asynchronous background job that calls the AI, generates the semantic analysis, and then calls ApplySemanticAnalysis through a proper command. The domain's integrity is preserved. The AI capability is additive, not structural.
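A minimal, self-contained sketch of that pipeline follows. The analyzer interface, the in-memory queue, and the OnAnalyzed callback are illustrative stand-ins: in the real system the queue would be a hosted background service and the callback would be a command that invokes Contract.ApplySemanticAnalysis.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public sealed record SemanticAnalysis(
    string Summary,
    IReadOnlyList<string> KeyClauses,
    IReadOnlyList<string> RiskFlags,
    float[] Embedding);

// Stand-in for the service that makes the actual LLM call.
public interface ISemanticAnalyzer
{
    Task<SemanticAnalysis> AnalyzeAsync(string fullText, CancellationToken ct);
}

// Minimal in-memory work queue so the slow LLM call never runs inside the
// transaction that created the contract.
public sealed class BackgroundTaskQueue
{
    private readonly Queue<Func<CancellationToken, Task>> _work = new();

    public void Enqueue(Func<CancellationToken, Task> job) => _work.Enqueue(job);

    public async Task DrainAsync(CancellationToken ct = default)
    {
        while (_work.Count > 0)
            await _work.Dequeue()(ct);
    }
}

// Domain event handler: enqueue the analysis and return immediately.
public sealed class ContractCreatedHandler
{
    private readonly BackgroundTaskQueue _queue;
    private readonly ISemanticAnalyzer _analyzer;

    // Stand-in for dispatching the command that calls ApplySemanticAnalysis.
    public Action<Guid, SemanticAnalysis>? OnAnalyzed;

    public ContractCreatedHandler(BackgroundTaskQueue queue, ISemanticAnalyzer analyzer)
        => (_queue, _analyzer) = (queue, analyzer);

    public void Handle(Guid contractId, string fullText)
        => _queue.Enqueue(async ct =>
        {
            var result = await _analyzer.AnalyzeAsync(fullText, ct);
            OnAnalyzed?.Invoke(contractId, result);
        });
}
```

The important property is that Handle returns before any AI work happens: the entity is committed first, and the semantic metadata arrives later through the same aggregate method every other state change uses.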


6. Vector Storage and Semantic Memory in .NET

Vector storage is the persistence layer of AI-native systems. It stores embedding vectors alongside their source content and metadata, enabling semantic search - finding documents that are meaningfully similar to a query rather than just lexically matching keywords.

In a .NET system, the vector store is infrastructure. It belongs in the Infrastructure layer and is accessed through an interface defined in the Domain or Application layer.

// Application layer interface - no vector store dependency
public interface IContractSemanticSearch
{
    Task<IReadOnlyList<ContractSearchResult>> SearchAsync(
        string query,
        int maxResults = 10,
        float minSimilarity = 0.7f,
        CancellationToken ct = default);

    Task IndexContractAsync(
        Guid contractId,
        string content,
        ContractMetadata metadata,
        CancellationToken ct = default);
}

// Infrastructure implementation - Qdrant vector store
public sealed class QdrantContractSemanticSearch : IContractSemanticSearch
{
    private const string CollectionName = "contracts";

    private readonly QdrantClient                   _qdrant;
    private readonly ITextEmbeddingGenerationService _embeddings;

    public QdrantContractSemanticSearch(
        QdrantClient qdrant,
        ITextEmbeddingGenerationService embeddings)
    {
        _qdrant     = qdrant;
        _embeddings = embeddings;
    }

    public async Task<IReadOnlyList<ContractSearchResult>> SearchAsync(
        string query,
        int maxResults = 10,
        float minSimilarity = 0.7f,
        CancellationToken ct = default)
    {
        // Generate embedding for the search query
        var queryEmbedding = await _embeddings.GenerateEmbeddingAsync(query, ct);

        // Vector similarity search in Qdrant
        var results = await _qdrant.SearchAsync(
            collectionName: CollectionName,
            vector: queryEmbedding.ToArray(),
            limit: (ulong)maxResults,
            scoreThreshold: minSimilarity,
            cancellationToken: ct);

        return results
            .Select(r => new ContractSearchResult(
                ContractId: Guid.Parse(r.Payload["contract_id"].StringValue),
                Title:       r.Payload["title"].StringValue,
                Summary:     r.Payload["summary"].StringValue,
                Similarity:  r.Score))
            .ToList();
    }

    public async Task IndexContractAsync(
        Guid contractId,
        string content,
        ContractMetadata metadata,
        CancellationToken ct = default)
    {
        // Chunk large contracts into overlapping segments
        var chunks = ChunkText(content, chunkSize: 512, overlap: 64);

        var points = new List<PointStruct>();

        foreach (var (chunk, index) in chunks.Select((c, i) => (c, i)))
        {
            var embedding = await _embeddings.GenerateEmbeddingAsync(chunk, ct);

            points.Add(new PointStruct
            {
                Id     = new PointId { Uuid = Guid.NewGuid().ToString() },
                // GenerateEmbeddingAsync returns ReadOnlyMemory<float>; the
                // protobuf collection initializer needs a float sequence.
                Vectors = new Vectors { Vector = new Vector { Data = { embedding.ToArray() } } },
                Payload =
                {
                    ["contract_id"]  = contractId.ToString(),
                    ["title"]        = metadata.Title,
                    ["summary"]      = metadata.AiSummary ?? "",
                    ["chunk_index"]  = index,
                    ["effective_date"] = metadata.EffectiveDate.ToString("O"),
                }
            });
        }

        await _qdrant.UpsertAsync(CollectionName, points, cancellationToken: ct);
    }

    private static IReadOnlyList<string> ChunkText(
        string text, int chunkSize, int overlap)
    {
        var words  = text.Split(' ', StringSplitOptions.RemoveEmptyEntries);
        var chunks = new List<string>();

        for (int i = 0; i < words.Length; i += chunkSize - overlap)
        {
            var chunk = string.Join(' ', words.Skip(i).Take(chunkSize));
            if (!string.IsNullOrWhiteSpace(chunk))
                chunks.Add(chunk);

            if (i + chunkSize >= words.Length)
                break;
        }

        return chunks;
    }
}

Choosing your vector store

pgvector is the pragmatic choice for teams already running PostgreSQL. It adds vector similarity search as a PostgreSQL extension, keeping your operational footprint minimal. It handles tens of millions of vectors efficiently and supports HNSW indexing for fast approximate nearest-neighbor search.

Qdrant is purpose-built for vector search at scale. It supports filtering on payload metadata alongside vector similarity, which is essential for multi-tenant systems where you need to restrict search results by tenant before computing similarity.

Azure AI Search is the managed option for teams on Azure. It combines traditional keyword search with vector search in a single index, handles chunking and embedding generation natively, and integrates directly with Azure OpenAI.
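As a concrete example of the pgvector route, the sketch below shows the shape of a similarity query (the contract_chunks table and its columns are illustrative, not from the code above) together with the cosine similarity that `1 - (a <=> b)` evaluates to. In C# the query would run through Npgsql with the query embedding bound as the @query parameter.

```csharp
using System;

public static class PgVectorSearch
{
    // Illustrative pgvector SQL; assumes an `embedding vector(1536)` column
    // with an HNSW index. <=> is pgvector's cosine distance operator.
    public const string Sql = """
        SELECT contract_id, title, 1 - (embedding <=> @query) AS similarity
        FROM contract_chunks
        WHERE tenant_id = @tenant          -- filter before ranking
        ORDER BY embedding <=> @query      -- ascending cosine distance
        LIMIT @limit;
        """;

    // What `1 - (a <=> b)` computes: the cosine similarity of two vectors.
    public static double CosineSimilarity(float[] a, float[] b)
    {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot   += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }
}
```

Note the tenant filter in the WHERE clause: as with Qdrant's payload filtering, restricting by tenant before ranking is what keeps multi-tenant semantic search both correct and fast.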


7. The Agent Pattern: Autonomous AI in Your Domain

An agent is an AI component that can reason about a goal, decide what tools to use, call those tools in sequence, observe the results, adjust its plan, and iterate until the goal is achieved or it determines it cannot proceed further.

In .NET terms, an agent is a Semantic Kernel feature that combines a chat model with a set of plugins and a planning strategy. The agent pattern is powerful and genuinely dangerous if implemented naively - an autonomous component that can call your domain methods without human approval is a significant risk surface.

// Define an agent scoped to a specific workflow
public sealed class ContractReviewAgent
{
    private readonly Kernel              _kernel;
    private readonly IAgentAuditLogger  _audit;
    private readonly ITokenBudgetService _budget;

    public ContractReviewAgent(
        Kernel kernel,
        IAgentAuditLogger audit,
        ITokenBudgetService budget)
    {
        _kernel = kernel;
        _audit  = audit;
        _budget = budget;
    }

    public async IAsyncEnumerable<AgentUpdate> ReviewContractAsync(
        Guid contractId,
        string userInstruction,
        AgentRunContext context,
        [EnumeratorCancellation] CancellationToken ct = default)
    {
        // Enforce token budget before starting
        var remainingBudget = await _budget.GetRemainingAsync(context.TenantId, ct);
        if (remainingBudget < 5_000)
        {
            yield return AgentUpdate.Error("Insufficient token budget for this operation.");
            yield break;
        }

        // Build agent with only the plugins relevant to contract review
        // Principle of least privilege: do not give the agent tools it does not need
        var agent = new ChatCompletionAgent
        {
            Name = "ContractReviewAgent",
            Kernel = _kernel.Clone(),
            Instructions = """
                You are a contract review assistant. Your job is to analyze the
                specified contract and provide a structured review covering:

                1. A plain-English summary of what the contract covers.
                2. Key obligations for each party.
                3. Any unusual or high-risk clauses that should be flagged for
                   human legal review.
                4. Recommended questions the reviewer should clarify before signing.

                Always retrieve the full contract text before writing your review.
                Do not make assumptions about contract content you have not retrieved.
                """,
            Arguments = new KernelArguments(
                new OpenAIPromptExecutionSettings
                {
                    ToolCallBehavior = ToolCallBehavior.AutoInvokeKernelFunctions,
                    MaxTokens        = Math.Min(remainingBudget, 4_096),
                    Temperature      = 0.1  // Low temperature for analytical tasks
                })
        };

        // Scoped plugins - only what this agent needs
        agent.Kernel.Plugins.AddFromType<ContractPlugin>();
        agent.Kernel.Plugins.AddFromType<LegalReferencePlugin>();

        var history = new ChatHistory();
        history.AddUserMessage(
            $"Please review contract {contractId}. Additional context: {userInstruction}");

        // Stream agent responses and intermediate steps
        await foreach (var message in agent.InvokeStreamingAsync(history, ct))
        {
            // Log every tool call for audit trail
            if (message.Content?.Contains("tool_call") == true)
                await _audit.LogToolCallAsync(context, message.Content, ct);

            // Deduct token usage from budget
            if (message.Metadata?.TryGetValue("usage", out var usage) == true)
                await _budget.DeductAsync(context.TenantId, usage, ct);

            yield return AgentUpdate.Progress(message.Content ?? "");
        }
    }
}

The principle of least privilege for agents

The single most important safety rule for agent design is to give each agent only the tools it needs for its specific task. A ContractReviewAgent should not have access to the NotificationPlugin or the BillingPlugin. A CustomerSupportAgent should be able to read customer data but not write it without a human approval step in the loop.

Model this exactly like you model authorization in the rest of your system - with explicit, minimal, audited grants rather than broad access.
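In practice that can be as simple as a deny-by-default allow-list consulted before plugins are registered on an agent's kernel. A minimal sketch with a hypothetical class name and API:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical grant registry: each agent gets an explicit allow-list of
// plugins, mirroring how role-based authorization grants are modeled.
public sealed class AgentToolGrants
{
    private readonly Dictionary<string, HashSet<string>> _grants =
        new(StringComparer.OrdinalIgnoreCase);

    public AgentToolGrants Grant(string agentName, params string[] plugins)
    {
        if (!_grants.TryGetValue(agentName, out var set))
            _grants[agentName] = set = new(StringComparer.OrdinalIgnoreCase);
        foreach (var p in plugins) set.Add(p);
        return this;
    }

    // Deny by default: an unknown agent or an ungranted plugin is never allowed.
    public bool IsAllowed(string agentName, string pluginName) =>
        _grants.TryGetValue(agentName, out var set) && set.Contains(pluginName);
}
```

At agent construction, only granted plugins get registered on the cloned kernel; each IsAllowed check is also a natural place to emit an audit log entry, so tool access becomes both minimal and traceable.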


8. Streaming AI Responses with ASP.NET Core

LLM inference is slow. Waiting for a full response before returning anything to the client produces an experience that feels broken - the UI freezes, the user wonders if something went wrong, and the HTTP request may time out entirely on long outputs.

Streaming solves all three problems simultaneously. ASP.NET Core's IAsyncEnumerable<T> support makes streaming AI responses to clients a first-class, clean pattern.

// Minimal API endpoint - streams review chunks to the client as they are produced
app.MapPost("/api/contracts/{contractId}/review", async (
    Guid contractId,
    ReviewContractRequest request,
    ContractReviewAgent agent,
    ClaimsPrincipal user,
    CancellationToken ct) =>
{
    var tenantId = user.GetTenantId();

    var context = new AgentRunContext(
        TenantId:    tenantId,
        UserId:      user.GetUserId(),
        OperationId: Guid.NewGuid());

    // Returning IAsyncEnumerable<T> directly lets ASP.NET Core stream the
    // serialized chunks over chunked transfer encoding as they are produced
    return StreamReviewAsync(contractId, request.Instruction, agent, context, ct);
})
.WithName("ReviewContract")
.WithOpenApi()
.RequireAuthorization("ContractReview")
.RequireRateLimiting("ai-operations");

// Local function returns IAsyncEnumerable for streaming
static async IAsyncEnumerable<ReviewChunk> StreamReviewAsync(
    Guid contractId,
    string instruction,
    ContractReviewAgent agent,
    AgentRunContext context,
    [EnumeratorCancellation] CancellationToken ct)
{
    await foreach (var update in agent.ReviewContractAsync(contractId, instruction, context, ct))
    {
        yield return new ReviewChunk(
            Type:    update.Type.ToString(),
            Content: update.Content,
            At:      DateTimeOffset.UtcNow);
    }
}

public record ReviewChunk(string Type, string Content, DateTimeOffset At);

On the client side, whether you are using Blazor, React, or a mobile application, the experience is a smooth token-by-token stream of text appearing in real time - the same experience users have come to expect from ChatGPT and Claude.

@* Blazor component - consuming the streaming review endpoint *@
@code {
    private readonly StringBuilder _reviewBuffer = new();
    private bool _isStreaming;

    private async Task StartReviewAsync()
    {
        _isStreaming = true;
        _reviewBuffer.Clear();

        // PostAsJsonAsync buffers the entire response body before returning,
        // which defeats streaming - send the request with ResponseHeadersRead
        // so chunks can be read as they arrive.
        using var request = new HttpRequestMessage(
            HttpMethod.Post, $"/api/contracts/{ContractId}/review")
        {
            Content = JsonContent.Create(new { Instruction = UserInstruction })
        };

        using var response = await Http.SendAsync(
            request, HttpCompletionOption.ResponseHeadersRead);

        response.EnsureSuccessStatusCode();

        await foreach (var chunk in response.Content
            .ReadFromJsonAsAsyncEnumerable<ReviewChunk>())
        {
            if (chunk is null) continue;

            _reviewBuffer.Append(chunk.Content);
            StateHasChanged();   // Re-render as each chunk arrives
        }

        _isStreaming = false;
        StateHasChanged();
    }
}

9. Observability for AI Workloads

AI workloads introduce observability challenges that traditional APM tools are not equipped to handle out of the box. Latency, token consumption, prompt content, model version, temperature setting, and tool call sequences all need to be captured to diagnose problems and control costs.

Semantic Kernel ships with OpenTelemetry support built in. The following setup captures the metrics and traces that matter most in production.

// Program.cs - OpenTelemetry for AI workloads
builder.Services.AddOpenTelemetry()
    .WithTracing(tracing => tracing
        .AddSource("Microsoft.SemanticKernel")
        .AddSource("MyApp.AI")
        .AddAspNetCoreInstrumentation()
        .AddHttpClientInstrumentation()
        .AddOtlpExporter())
    .WithMetrics(metrics => metrics
        .AddMeter("Microsoft.SemanticKernel")
        .AddMeter("MyApp.AI.Tokens")
        .AddMeter("MyApp.AI.Latency")
        .AddAspNetCoreInstrumentation()
        .AddOtlpExporter());

// Custom activity source for AI-specific spans
public static class AiTelemetry
{
    public static readonly ActivitySource Source = new("MyApp.AI");

    public static Activity? StartAgentRun(
        string agentName,
        string tenantId,
        string operationId)
    {
        var activity = Source.StartActivity(
            $"agent.{agentName}",
            ActivityKind.Internal);

        activity?.SetTag("ai.agent.name",     agentName);
        activity?.SetTag("ai.tenant.id",      tenantId);
        activity?.SetTag("ai.operation.id",   operationId);
        activity?.SetTag("ai.model",          "gpt-4o");

        return activity;
    }

    public static void RecordTokenUsage(
        string tenantId,
        string operation,
        int promptTokens,
        int completionTokens)
    {
        TokenUsageCounter.Add(
            promptTokens + completionTokens,
            new TagList
            {
                { "tenant.id",  tenantId },
                { "operation",  operation },
                { "token.type", "total" }
            });
    }

    private static readonly Counter<int> TokenUsageCounter =
        new Meter("MyApp.AI.Tokens")
            .CreateCounter<int>(
                "ai.tokens.used",
                unit: "{tokens}",
                description: "Total tokens consumed per operation");
}

The AI observability checklist

Every AI operation in production should produce traces that capture:

  - The agent or chain name
  - The model used and its version
  - The number of prompt tokens and completion tokens consumed
  - The total wall-clock latency
  - Every tool call made and its result
  - Whether the operation succeeded or was cut short by a guardrail
  - The tenant ID, for multi-tenant systems

Without this data, debugging a misbehaving agent is guesswork. With it, you can answer the questions that matter: why did the agent call the wrong tool, where is the latency spike coming from, which tenant is consuming 40% of our token budget, and is the model performing worse on inputs with certain characteristics.


10. Guardrails: Safety, Cost Control, and Responsible AI

Guardrails are the engineering controls you put in place to ensure AI workloads behave within defined boundaries - for safety, cost, performance, and regulatory compliance. They are not optional. They are the difference between a feature that delights users and one that causes a production incident at 2am.

Token budget enforcement

public interface ITokenBudgetService
{
    Task<int>  GetRemainingAsync(Guid tenantId, CancellationToken ct);
    Task       DeductAsync(Guid tenantId, int tokensUsed, CancellationToken ct);
    Task<bool> TryReserveAsync(Guid tenantId, int estimatedTokens, CancellationToken ct);
}

// Infrastructure implementation backed by Redis
public sealed class RedisTokenBudgetService : ITokenBudgetService
{
    private readonly IDatabase _redis;

    public RedisTokenBudgetService(IConnectionMultiplexer redis)
        => _redis = redis.GetDatabase();

    public async Task<int> GetRemainingAsync(Guid tenantId, CancellationToken ct)
    {
        var value = await _redis.StringGetAsync(BudgetKey(tenantId));
        return value.HasValue ? (int)value : GetDefaultMonthlyBudget(tenantId);
    }

    public async Task DeductAsync(Guid tenantId, int tokensUsed, CancellationToken ct)
    {
        var key = BudgetKey(tenantId);

        // Seed the counter on first use so the decrement starts from the plan budget.
        await _redis.StringSetAsync(key, GetDefaultMonthlyBudget(tenantId), when: When.NotExists);
        await _redis.StringDecrementAsync(key, tokensUsed);
    }

    public async Task<bool> TryReserveAsync(
        Guid tenantId, int estimatedTokens, CancellationToken ct)
    {
        // Optimistic check. For strict enforcement under concurrent requests,
        // combine the check and decrement atomically in a Lua script.
        var remaining = await GetRemainingAsync(tenantId, ct);
        return remaining >= estimatedTokens;
    }

    private static string BudgetKey(Guid tenantId)
        => $"token_budget:{tenantId}:{DateTimeOffset.UtcNow:yyyy-MM}";

    private static int GetDefaultMonthlyBudget(Guid tenantId)
        => 500_000; // 500K tokens per tenant per month on starter plan
}
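A hypothetical usage sketch showing where the budget guard sits in a Minimal API endpoint. The route, `TenantContext`, and the token estimate are illustrative assumptions; the actual usage figure would come from your connector's usage metadata:

```csharp
// Fragment of endpoint registration - assumes a configured WebApplication `app`
// and a per-request TenantContext registered in DI.
app.MapPost("/api/customers/{id}/health-analysis", async (
    Guid id,
    TenantContext tenant,               // hypothetical per-request tenant accessor
    ITokenBudgetService budget,
    CancellationToken ct) =>
{
    const int estimatedTokens = 4_000;  // rough prompt + completion estimate

    // Refuse the request up front if the tenant's monthly budget is exhausted.
    if (!await budget.TryReserveAsync(tenant.Id, estimatedTokens, ct))
        return Results.StatusCode(StatusCodes.Status429TooManyRequests);

    // ... run the agent, capture actual token usage from the response ...
    var actualTokens = 3_250;           // placeholder: read from usage metadata

    await budget.DeductAsync(tenant.Id, actualTokens, ct);
    return Results.Ok();
});
```

Deducting the actual figure after the call, rather than the estimate before it, keeps the budget honest over time at the cost of letting a single request slightly overshoot.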

Input and output validation

Every prompt sent to an LLM should be validated before dispatch, and every response received should be validated before being shown to a user or used to trigger a domain action. This is especially important in multi-tenant systems where user-supplied content could contain prompt injection attempts.

public sealed class PromptSafetyMiddleware
{
    private static readonly string[] ForbiddenPatterns =
    [
        "ignore previous instructions",
        "you are now",
        "disregard your",
        "act as if",
        "pretend you are"
    ];

    public static bool IsSafeInput(string userInput)
        // A denylist is a first line of defense, not complete protection -
        // pair it with a dedicated content-safety service in production.
        => !ForbiddenPatterns.Any(pattern =>
            userInput.Contains(pattern, StringComparison.OrdinalIgnoreCase));

    public static bool IsStructuredOutputValid<T>(string llmOutput)
    {
        try
        {
            var result = JsonSerializer.Deserialize<T>(llmOutput);
            return result is not null;
        }
        catch (JsonException)
        {
            return false;
        }
    }
}

The human-in-the-loop pattern

For any AI action that has irreversible consequences - sending an email, updating a contract status, charging a payment method, deleting records - require explicit human approval before execution. The agent can prepare the action and present it for review. The human approves or rejects. Only then does the domain command execute.

public sealed class PendingAiAction
{
    public Guid                Id          { get; init; } = Guid.NewGuid();
    public string              Description { get; init; } = "";
    public string              CommandJson { get; init; } = "";
    public string              CommandType { get; init; } = "";
    public PendingActionStatus Status      { get; private set; } = PendingActionStatus.AwaitingApproval;
    public DateTimeOffset      ExpiresAt   { get; init; } = DateTimeOffset.UtcNow.AddHours(24);
    public UserId?             ResolvedBy  { get; private set; }
    public string?             RejectionReason { get; private set; }

    public void Approve(UserId approver)
    {
        EnsureUnresolved();
        Status     = PendingActionStatus.Approved;
        ResolvedBy = approver;
    }

    public void Reject(UserId rejector, string reason)
    {
        EnsureUnresolved();
        Status          = PendingActionStatus.Rejected;
        ResolvedBy      = rejector;
        RejectionReason = reason;
    }

    private void EnsureUnresolved()
    {
        if (Status != PendingActionStatus.AwaitingApproval)
            throw new DomainException("This action has already been resolved.");
        if (DateTimeOffset.UtcNow > ExpiresAt)
            throw new DomainException("This action request has expired.");
    }
}

11. Real-World Walkthrough: AI-Native CRM Feature

Let us put the entire architecture together by designing one cohesive AI-native feature: a customer health score assistant for a B2B SaaS CRM. The assistant analyzes a customer's usage data, support history, payment history, and NPS responses, and generates a health score with a plain-English explanation and recommended actions for the account manager.

The feature requirements

The assistant must read customer data across four domains: product usage metrics, support ticket history, billing and payment records, and NPS survey responses. It must produce a structured health score from 0 to 100, a one-paragraph plain-English explanation, and up to three prioritized recommended actions. It must stream the response in real time. It must log every inference for audit. It must respect a per-tenant token budget.

The architecture

Account Manager clicks "Analyze Health"
          │
          ▼
POST /api/customers/{id}/health-analysis
          │
          ▼
HealthAnalysisEndpoint
  → Validates token budget
  → Creates AgentRunContext
  → Streams CustomerHealthAgent.AnalyzeAsync()
          │
          ▼
CustomerHealthAgent (Semantic Kernel ChatCompletionAgent)
  → Plugins: UsagePlugin, SupportPlugin, BillingPlugin, NpsPlugin
  → Strategy: Auto function calling
  → Temperature: 0.15 (analytical, low creativity)
          │
     ┌────┴────────────────────────────────────┐
     │ Tool calls (in whatever order the       │
     │ model determines is logical)            │
     ├─────────────────────────────────────────┤
     │ UsagePlugin.GetUsageMetricsAsync()      │
     │   → PostgreSQL (metrics schema)         │
     │ SupportPlugin.GetTicketHistoryAsync()   │
     │   → PostgreSQL (support schema)         │
     │ BillingPlugin.GetPaymentHistoryAsync()  │
     │   → PostgreSQL (billing schema)         │
     │ NpsPlugin.GetNpsResponsesAsync()        │
     │   → PostgreSQL (survey schema)          │
     └────┬────────────────────────────────────┘
          │
          ▼
Structured output: HealthAnalysisResult
{
  "score": 74,
  "trend": "declining",
  "explanation": "...",
  "recommended_actions": [...]
}
          │
          ▼
  → Streamed to client as JSON chunks
  → Persisted to customer record (async)
  → Logged to audit trail
  → Token usage deducted from budget

The output schema

// Force the model to return structured output. [Description] comes from
// System.ComponentModel - it is what the JSON schema generator reads
// when building the response format.
public sealed record HealthAnalysisResult
{
    [JsonPropertyName("score")]
    [Description("Health score from 0 (critical) to 100 (excellent)")]
    public int Score { get; init; }

    [JsonPropertyName("trend")]
    [Description("Score trend: improving, stable, or declining")]
    public string Trend { get; init; } = "";

    [JsonPropertyName("explanation")]
    [Description("One paragraph plain-English explanation for the account manager")]
    public string Explanation { get; init; } = "";

    [JsonPropertyName("recommended_actions")]
    [Description("Up to 3 prioritized actions the account manager should take")]
    public IReadOnlyList<RecommendedAction> RecommendedActions { get; init; } = [];
}

public sealed record RecommendedAction(
    [property: JsonPropertyName("priority")]   int    Priority,
    [property: JsonPropertyName("action")]     string Action,
    [property: JsonPropertyName("rationale")]  string Rationale
);
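To actually enforce this schema, recent versions of the Semantic Kernel OpenAI connector let you pass the record type itself as the response format. This is a sketch; the exact property shape depends on your connector version:

```csharp
// Sketch, assuming Microsoft.SemanticKernel.Connectors.OpenAI with
// structured-output support (ResponseFormat accepting a Type).
using Microsoft.SemanticKernel.Connectors.OpenAI;

var settings = new OpenAIPromptExecutionSettings
{
    ResponseFormat = typeof(HealthAnalysisResult), // JSON schema derived from the record
    Temperature    = 0.15                          // analytical, low creativity
};

// The model is then constrained to emit valid HealthAnalysisResult JSON,
// which you still validate before acting on:
// var result = JsonSerializer.Deserialize<HealthAnalysisResult>(response.Content);
```

Even with schema enforcement, keep the output validation guardrail from section 10 in place - the schema constrains shape, not semantic correctness.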

This walkthrough covers roughly 20% of the full implementation, but it demonstrates the key structural principle: the AI is a coordinator that uses your existing domain repositories through the plugin interface. The repositories do not know they are being called by an AI. The domain does not change. Only the orchestration layer is new.


12. Migration Path: Evolving an Existing .NET System

Adopting an AI-native architecture does not require a rewrite. The most effective migration path is incremental and additive - you add the AI Orchestration Layer on top of your existing Clean Architecture without disturbing the layers beneath it.

Phase 1 - Add the AI infrastructure (Week 1-2)

Install Semantic Kernel, configure your LLM provider and embedding model, and add your vector store of choice. Write the ITokenBudgetService, IAgentAuditLogger, and IContractSemanticSearch interfaces. Implement them in the Infrastructure layer. Register everything in DI. No existing code changes.

Phase 2 - Build the first plugin from an existing service (Week 2-3)

Take one existing domain service - the simplest one whose data would be useful to an AI - and wrap it in a [KernelFunction] plugin. Write clear, natural-language descriptions on every method and parameter. Write integration tests that verify the plugin calls the underlying service correctly.
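A sketch of what Phase 2 produces, assuming a hypothetical `ICustomerService` already living in your Application layer. The `[Description]` attributes are the natural-language contract the model reasons over:

```csharp
using System.ComponentModel;
using Microsoft.SemanticKernel;

public sealed class CustomerPlugin
{
    private readonly ICustomerService _customers; // existing domain service, unchanged

    public CustomerPlugin(ICustomerService customers) => _customers = customers;

    [KernelFunction("get_customer_summary")]
    [Description("Returns a short summary of a customer: plan, seat count, and signup date.")]
    public async Task<string> GetCustomerSummaryAsync(
        [Description("The customer's unique identifier")] Guid customerId,
        CancellationToken ct = default)
    {
        var customer = await _customers.GetByIdAsync(customerId, ct);

        // Return prose, not raw DTOs - the model consumes this text directly.
        return customer is null
            ? "No customer found with that ID."
            : $"{customer.Name}: {customer.Plan} plan, {customer.Seats} seats, " +
              $"customer since {customer.SignedUpOn:yyyy-MM-dd}.";
    }
}
```

Note the plugin returns a sentence rather than a serialized object: the description and the return value are both part of the interface the model sees, so both should read like documentation.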

Phase 3 - Build a single AI endpoint with full observability (Week 3-4)

Build one streaming AI endpoint end to end, with token budget enforcement, audit logging, OpenTelemetry traces, and a rate limiting policy specific to AI operations. This is your reference implementation - every subsequent AI feature follows the same pattern.

Phase 4 - Add semantic indexing to existing entities (Week 4-6)

Identify the entities that would most benefit from semantic search. Add the semantic metadata fields to those entities (AI summary, embedding vector, last indexed at). Build the background indexing job that populates them asynchronously via domain commands. This phase does not change any existing read or write paths.
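A sketch of the Phase 4 metadata fields on an entity. `Contract` is used as the example here; the key design point is that only the background indexing job touches these fields, so existing read and write paths are untouched:

```csharp
public sealed class Contract
{
    public Guid   Id    { get; private set; }
    public string Title { get; private set; } = "";

    // Semantic metadata - populated asynchronously by the indexing job,
    // never on the synchronous write path.
    public string?         AiSummary     { get; private set; }
    public float[]?        Embedding     { get; private set; }
    public DateTimeOffset? LastIndexedAt { get; private set; }

    // Called by the background job via a domain command once the
    // summary and embedding have been generated.
    public void ApplySemanticIndex(string summary, float[] embedding)
    {
        AiSummary     = summary;
        Embedding     = embedding;
        LastIndexedAt = DateTimeOffset.UtcNow;
    }
}
```

With pgvector, `Embedding` maps to a `vector` column; `LastIndexedAt` lets the job skip entities that have not changed since the last pass.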

Phase 5 - Expand plugins and agents gradually (Ongoing)

Add plugins for additional domain areas one at a time. Build agents that combine multiple plugins into coherent workflows. Each agent should be narrowly scoped, fully logged, and subject to the human-in-the-loop pattern for any irreversible actions.
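A sketch of a narrowly scoped Phase 5 agent, assuming the Microsoft.SemanticKernel.Agents package and a kernel that already has the relevant plugins registered:

```csharp
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Agents;

public static class CustomerHealthAgentFactory
{
    // Factory keeps the agent's scope explicit: one name, one instruction set,
    // one kernel whose plugins define the only tools it can reach.
    public static ChatCompletionAgent Create(Kernel kernel) => new()
    {
        Name         = "CustomerHealthAgent",
        Instructions = """
            You analyze B2B customer health. Gather usage, support, billing,
            and NPS data through the available tools before scoring.
            Never invent data. If a tool returns nothing, say so.
            """,
        Kernel    = kernel,
        Arguments = new KernelArguments(new PromptExecutionSettings
        {
            // Let the model choose tools, but only from this kernel's plugins.
            FunctionChoiceBehavior = FunctionChoiceBehavior.Auto()
        })
    };
}
```

Scoping the kernel per agent - rather than handing every agent a kernel with every plugin - is what keeps "narrowly scoped" enforceable rather than aspirational.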

Current state:
  Clean Architecture with DDD + CQRS
  PostgreSQL, Redis, blob storage
  ASP.NET Core Minimal APIs

After Phase 1:
  + Semantic Kernel registered in DI
  + LLM and embedding provider configured
  + Vector store added (pgvector or Qdrant)
  + Token budget service operational
  + Audit logger operational
  + Full OTEL tracing for AI spans

After Phase 2-3:
  + First plugin wrapping existing domain service
  + First streaming AI endpoint live
  + Rate limiting on AI operations
  + Reference implementation established

After Phase 4-5:
  + Key entities semantically indexed
  + Multiple plugins across domain areas
  + Scoped agents for key workflows
  + Human-in-the-loop for irreversible actions
  + AI-native architecture fully operational

Wrapping Up

Modernizing a .NET architecture for AI-native workloads is not a replacement project - it is an evolution. The Clean Architecture principles, domain-driven design, and engineering discipline that make .NET systems reliable and maintainable are exactly the foundation you need to build AI capabilities that are trustworthy in production.

The key insight is that the AI Orchestration Layer is a new architectural concern that sits between your application layer and your AI infrastructure. It does not replace your domain model - it enriches it. It does not replace your repositories - it calls them through plugins. It does not replace your observability stack - it extends it with AI-specific metrics and traces.

The patterns covered in this article - semantic plugins, vector memory, the agent pattern, streaming responses, token budget guardrails, and the human-in-the-loop - are not theoretical. They are the patterns that production AI-native .NET systems are built on today. Each one maps cleanly to the Clean Architecture and DDD vocabulary that .NET engineers already know.

The migration path is incremental. The risk is manageable. The opportunity is extraordinary.

Start with one plugin. Ship one streaming endpoint. Build from there.


Have you already integrated Semantic Kernel or another AI framework into a .NET production system? Share your experience in the comments - the patterns the community is developing in the real world are ahead of any single article.


Resources

Semantic Kernel Documentation · Semantic Kernel GitHub · pgvector for PostgreSQL · Qdrant Vector Database · Azure AI Search Hybrid Search · OpenTelemetry .NET · ASP.NET Core Streaming with IAsyncEnumerable · Microsoft Responsible AI Principles
