Brian Spann

Generative AI in C#: Building Provider-Agnostic LLM Applications with Microsoft.Extensions.AI

Building LLM-powered applications in C# has never been more accessible—or more fragmented. With Azure OpenAI, OpenAI, Anthropic, Google, Ollama, and countless other providers, developers face a familiar problem: vendor lock-in at the SDK level.

You start with OpenAI. Things go well. Then your team decides to try Azure OpenAI for compliance reasons. Or you want to run Llama 3 locally during development to save costs. Suddenly, you're refactoring service classes, updating DI registrations, and maintaining multiple code paths.

This is exactly the problem Microsoft.Extensions.AI was designed to solve.

What is Microsoft.Extensions.AI?

Microsoft.Extensions.AI (introduced in preview in late 2024, with a stable release following in 2025) provides a unified abstraction layer for AI services in .NET. Think of it as doing for AI what ILogger did for logging and IDistributedCache did for caching.

At its core, the library defines two primary interfaces:

  • IChatClient: For conversational AI (chat completions)
  • IEmbeddingGenerator<TInput, TEmbedding>: For generating embeddings

These interfaces are provider-agnostic. Your application code depends on the abstraction, while the concrete implementation can be swapped at configuration time.

// Core shape of the abstraction (simplified; member names have shifted
// between preview and stable releases)
public interface IChatClient : IDisposable
{
    Task<ChatCompletion> CompleteAsync(
        IList<ChatMessage> chatMessages,
        ChatOptions? options = null,
        CancellationToken cancellationToken = default);

    IAsyncEnumerable<StreamingChatCompletionUpdate> CompleteStreamingAsync(
        IList<ChatMessage> chatMessages,
        ChatOptions? options = null,
        CancellationToken cancellationToken = default);

    ChatClientMetadata Metadata { get; }
    TService? GetService<TService>(object? key = null) where TService : class;
}
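The streaming half of the interface is consumed with await foreach. Here is a minimal sketch against the surface above (PrintStreamAsync is a hypothetical helper, not part of the library):

```csharp
// Streams each chunk of generated text to the console as it arrives.
static async Task PrintStreamAsync(IChatClient client, string prompt)
{
    var messages = new List<ChatMessage> { new(ChatRole.User, prompt) };

    await foreach (var update in client.CompleteStreamingAsync(messages))
    {
        // Each update carries the next fragment of the response text
        Console.Write(update.Text);
    }
}
```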

The Problem with Provider-Specific SDKs

Let's look at what happens without an abstraction layer. Here's typical code using the Azure OpenAI SDK directly:

using Azure.AI.OpenAI;
using OpenAI.Chat;

public class ContentService
{
    private readonly AzureOpenAIClient _client;
    private readonly string _deploymentName;

    public ContentService(AzureOpenAIClient client, string deploymentName)
    {
        _client = client;
        _deploymentName = deploymentName;
    }

    public async Task<string> SummarizeAsync(string content)
    {
        var chatClient = _client.GetChatClient(_deploymentName);

        var messages = new List<ChatMessage>
        {
            new SystemChatMessage("You are a helpful summarizer."),
            new UserChatMessage($"Summarize this:\n\n{content}")
        };

        ChatCompletion completion = await chatClient.CompleteChatAsync(messages);
        return completion.Content[0].Text;
    }
}

This works fine—until requirements change:

  1. Testing becomes painful. The AzureOpenAIClient is a concrete class. You need to mock HTTP responses or use integration tests exclusively.
  2. Local development is expensive. Every test run, every debugging session hits the Azure API and costs money.
  3. Switching providers requires code changes. Want to try Claude or Gemini? Time to refactor.

Building with Microsoft.Extensions.AI

Here's the same service written against the abstraction:

using Microsoft.Extensions.AI;

public class ContentService
{
    private readonly IChatClient _chatClient;

    public ContentService(IChatClient chatClient)
    {
        _chatClient = chatClient;
    }

    public async Task<string> SummarizeAsync(string content)
    {
        var messages = new List<ChatMessage>
        {
            new(ChatRole.System, "You are a helpful summarizer."),
            new(ChatRole.User, $"Summarize this:\n\n{content}")
        };

        var result = await _chatClient.CompleteAsync(messages);
        return result.Message.Text ?? string.Empty;
    }
}

Notice what's not in this code:

  • No Azure-specific types
  • No deployment names
  • No provider-specific configuration

The ContentService now depends only on IChatClient. How that interface is implemented is a concern of the composition root—not the service.
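Testability now falls out for free: a hand-rolled fake can stand in for any provider in unit tests. A sketch, assuming the interface shown earlier (FakeChatClient is hypothetical test code; exact constructor shapes vary slightly by package version):

```csharp
public sealed class FakeChatClient : IChatClient
{
    // Always returns the same canned assistant message
    public Task<ChatCompletion> CompleteAsync(
        IList<ChatMessage> chatMessages,
        ChatOptions? options = null,
        CancellationToken cancellationToken = default)
        => Task.FromResult(new ChatCompletion(
            new ChatMessage(ChatRole.Assistant, "canned summary")));

    public IAsyncEnumerable<StreamingChatCompletionUpdate> CompleteStreamingAsync(
        IList<ChatMessage> chatMessages,
        ChatOptions? options = null,
        CancellationToken cancellationToken = default)
        => throw new NotSupportedException();

    public ChatClientMetadata Metadata { get; } = new("fake");
    public TService? GetService<TService>(object? key = null) where TService : class => null;
    public void Dispose() { }
}

// No network, no API keys, no cost:
var service = new ContentService(new FakeChatClient());
var summary = await service.SummarizeAsync("some text"); // "canned summary"
```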

Provider Registration Patterns

The real power emerges in how you configure providers. Let's look at several registration patterns.

Azure OpenAI

using Azure.AI.OpenAI;
using Azure.Identity;
using Microsoft.Extensions.AI;

builder.Services.AddChatClient(sp =>
{
    var config = sp.GetRequiredService<IConfiguration>();

    var client = new AzureOpenAIClient(
        new Uri(config["AzureOpenAI:Endpoint"]!),
        new DefaultAzureCredential());

    return client.AsChatClient(config["AzureOpenAI:DeploymentName"]!);
});

OpenAI

using OpenAI;
using Microsoft.Extensions.AI;

builder.Services.AddChatClient(sp =>
{
    var config = sp.GetRequiredService<IConfiguration>();

    var client = new OpenAIClient(config["OpenAI:ApiKey"]!);
    return client.AsChatClient("gpt-4o");
});

Ollama (Local Development)

using Microsoft.Extensions.AI.Ollama;

builder.Services.AddChatClient(sp =>
    new OllamaChatClient(new Uri("http://localhost:11434"), "llama3.2"));

GitHub Models (Great for Testing)

using System.ClientModel;
using OpenAI;
using Microsoft.Extensions.AI;

builder.Services.AddChatClient(sp =>
{
    var config = sp.GetRequiredService<IConfiguration>();

    // GitHub Models uses OpenAI-compatible API
    var client = new OpenAIClient(
        new ApiKeyCredential(config["GitHub:Token"]!),
        new OpenAIClientOptions 
        { 
            Endpoint = new Uri("https://models.inference.ai.azure.com") 
        });

    return client.AsChatClient("gpt-4o");
});

Configuration-Driven Provider Selection

For maximum flexibility, drive provider selection from configuration:

// appsettings.json
{
  "AI": {
    "Provider": "AzureOpenAI",
    "AzureOpenAI": {
      "Endpoint": "https://myresource.openai.azure.com",
      "DeploymentName": "gpt-4o"
    },
    "OpenAI": {
      "ApiKey": "sk-..." // real keys belong in user secrets or environment variables
    },
    "Ollama": {
      "Endpoint": "http://localhost:11434",
      "ModelId": "llama3.2"
    }
  }
}
builder.Services.AddChatClient(sp =>
{
    var config = sp.GetRequiredService<IConfiguration>();
    var provider = config["AI:Provider"];

    return provider switch
    {
        "AzureOpenAI" => CreateAzureOpenAIClient(config),
        "OpenAI" => CreateOpenAIClient(config),
        "Ollama" => CreateOllamaClient(config),
        "GitHubModels" => CreateGitHubModelsClient(config),
        _ => throw new InvalidOperationException($"Unknown AI provider: {provider}")
    };
});

IChatClient CreateAzureOpenAIClient(IConfiguration config)
{
    var section = config.GetSection("AI:AzureOpenAI");
    var client = new AzureOpenAIClient(
        new Uri(section["Endpoint"]!),
        new DefaultAzureCredential());
    return client.AsChatClient(section["DeploymentName"]!);
}

IChatClient CreateOpenAIClient(IConfiguration config)
{
    var section = config.GetSection("AI:OpenAI");
    var client = new OpenAIClient(section["ApiKey"]!);
    return client.AsChatClient(section["ModelId"] ?? "gpt-4o");
}

IChatClient CreateOllamaClient(IConfiguration config)
{
    var section = config.GetSection("AI:Ollama");
    return new OllamaChatClient(
        new Uri(section["Endpoint"] ?? "http://localhost:11434"),
        section["ModelId"] ?? "llama3.2");
}

IChatClient CreateGitHubModelsClient(IConfiguration config)
{
    // Same OpenAI-compatible endpoint as the GitHub Models example above
    var client = new OpenAIClient(
        new ApiKeyCredential(config["GitHub:Token"]!),
        new OpenAIClientOptions
        {
            Endpoint = new Uri("https://models.inference.ai.azure.com")
        });
    return client.AsChatClient("gpt-4o");
}

This pattern lets you:

  • Use Ollama locally (free, fast, private)
  • Use OpenAI in staging
  • Use Azure OpenAI in production
  • Switch with a single config change
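A concrete example of that last point: a checked-in appsettings.Development.json that overrides only the provider key keeps every developer on the free local path by default, while the rest of the AI section is inherited from appsettings.json (standard ASP.NET Core configuration layering):

```json
{
  "AI": {
    "Provider": "Ollama"
  }
}
```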

Middleware and Decorators

One of the most powerful features of Extensions.AI is the middleware pattern. Middleware wraps your chat client to add cross-cutting concerns without modifying your application code.

Using the Builder Pattern

builder.Services.AddChatClient(sp =>
{
    var config = sp.GetRequiredService<IConfiguration>();

    return new AzureOpenAIClient(
            new Uri(config["AzureOpenAI:Endpoint"]!),
            new DefaultAzureCredential())
        .AsChatClient(config["AzureOpenAI:DeploymentName"]!)
        .AsBuilder()
        .UseLogging(sp.GetRequiredService<ILoggerFactory>())
        .UseDistributedCache(sp.GetRequiredService<IDistributedCache>())
        .Build(sp);
});

Built-in Middleware

Extensions.AI includes several middleware components:

Logging Middleware logs all completions with structured data:

.UseLogging(loggerFactory)

Distributed Caching caches responses to reduce costs for repeated queries:

.UseDistributedCache(distributedCache, options =>
{
    options.CacheExpiration = TimeSpan.FromHours(1);
})

OpenTelemetry adds tracing spans for observability:

.UseOpenTelemetry(loggerFactory, sourceName: "AI.Chat")

Custom Middleware

You can create custom middleware for any cross-cutting concern:

using System.Threading.RateLimiting;

public class RateLimitingMiddleware : DelegatingChatClient
{
    private readonly RateLimiter _limiter;

    public RateLimitingMiddleware(IChatClient inner, RateLimiter limiter) 
        : base(inner)
    {
        _limiter = limiter;
    }

    public override async Task<ChatCompletion> CompleteAsync(
        IList<ChatMessage> chatMessages,
        ChatOptions? options = null,
        CancellationToken cancellationToken = default)
    {
        using var lease = await _limiter.AcquireAsync(1, cancellationToken);

        if (!lease.IsAcquired)
            throw new RateLimitExceededException("Rate limit exceeded"); // app-defined exception

        return await base.CompleteAsync(chatMessages, options, cancellationToken);
    }
}

// Extension method for fluent registration
public static class RateLimitingExtensions
{
    public static ChatClientBuilder UseRateLimiting(
        this ChatClientBuilder builder, 
        RateLimiter limiter)
    {
        return builder.Use((inner, sp) => 
            new RateLimitingMiddleware(inner, limiter));
    }
}
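Wiring the custom middleware into a pipeline is then a single extra call in the builder chain. A sketch using a ConcurrencyLimiter from System.Threading.RateLimiting (any RateLimiter works; the Ollama endpoint and model are illustrative, as earlier):

```csharp
using System.Threading.RateLimiting;

// At most 5 concurrent in-flight completions; up to 100 callers queue behind them
var limiter = new ConcurrencyLimiter(new ConcurrencyLimiterOptions
{
    PermitLimit = 5,
    QueueLimit = 100,
    QueueProcessingOrder = QueueProcessingOrder.OldestFirst
});

builder.Services.AddChatClient(sp =>
    new OllamaChatClient(new Uri("http://localhost:11434"), "llama3.2")
        .AsBuilder()
        .UseRateLimiting(limiter) // the custom extension defined above
        .Build(sp));
```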

Embedding Generation

The same patterns apply to embeddings. Here's how to register an embedding generator:

builder.Services.AddEmbeddingGenerator<string, Embedding<float>>(sp =>
{
    var config = sp.GetRequiredService<IConfiguration>();

    var client = new AzureOpenAIClient(
        new Uri(config["AzureOpenAI:Endpoint"]!),
        new DefaultAzureCredential());

    return client.AsEmbeddingGenerator(config["AzureOpenAI:EmbeddingModel"]!);
});

Usage is straightforward:

public class SearchService
{
    private readonly IEmbeddingGenerator<string, Embedding<float>> _embedder;

    public SearchService(IEmbeddingGenerator<string, Embedding<float>> embedder)
    {
        _embedder = embedder;
    }

    public async Task<float[]> GetEmbeddingAsync(string text)
    {
        var embedding = await _embedder.GenerateAsync(text);
        return embedding.Vector.ToArray();
    }

    public async Task<IReadOnlyList<Embedding<float>>> GetEmbeddingsAsync(
        IEnumerable<string> texts)
    {
        return await _embedder.GenerateAsync(texts.ToList());
    }
}
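Once you have vectors, search is a matter of ranking by similarity. A minimal cosine-similarity helper in plain C# (VectorMath is a hypothetical utility, not part of the library) shows how two embeddings from GetEmbeddingAsync might be compared:

```csharp
public static class VectorMath
{
    // Cosine similarity: dot(a, b) / (|a| * |b|), ranging from -1 to 1
    public static double CosineSimilarity(ReadOnlySpan<float> a, ReadOnlySpan<float> b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same length.");

        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }
}

// Usage: rank a document vector against a query vector
// var queryVec = await searchService.GetEmbeddingAsync("refund policy");
// var score = VectorMath.CosineSimilarity(queryVec, docVec);
```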

When to Use Extensions.AI vs. Raw SDKs

The abstraction isn't always the right choice. Here's a decision framework:

Use Microsoft.Extensions.AI when:

  • You want provider flexibility
  • You're building a library or reusable component
  • You want middleware for logging, caching, rate limiting
  • You prioritize testability
  • Your application may run in multiple environments

Consider raw SDKs when:

  • You need provider-specific features not in the abstraction
  • You're doing low-level optimizations (custom HTTP handlers, etc.)
  • You're prototyping and speed matters more than architecture
  • You're absolutely certain you'll never switch providers

Hybrid approach: You can always escape the abstraction using GetService<T>():

public async Task DoProviderSpecificThingAsync()
{
    // Get the underlying provider if needed
    var azureClient = _chatClient.GetService<AzureOpenAIClient>();

    if (azureClient != null)
    {
        // Do Azure-specific thing
    }
}

NuGet Packages

Here are the packages you'll need:

<!-- Core abstractions -->
<PackageReference Include="Microsoft.Extensions.AI.Abstractions" Version="9.0.0" />
<PackageReference Include="Microsoft.Extensions.AI" Version="9.0.0" />

<!-- Provider implementations -->
<PackageReference Include="Microsoft.Extensions.AI.OpenAI" Version="9.0.0" />
<PackageReference Include="Microsoft.Extensions.AI.Ollama" Version="9.0.0" />

<!-- Azure OpenAI (uses OpenAI implementation) -->
<PackageReference Include="Azure.AI.OpenAI" Version="2.1.0" />

Putting It All Together

Here's a complete Program.cs showing a production-ready setup:

using Azure.AI.OpenAI;
using Azure.Identity;
using Microsoft.Extensions.AI;
using OpenAI;

var builder = WebApplication.CreateBuilder(args);

// Register chat client with middleware stack
builder.Services.AddChatClient(sp =>
{
    var config = sp.GetRequiredService<IConfiguration>();
    var provider = config["AI:Provider"] ?? "Ollama";

    IChatClient baseClient = provider switch
    {
        "AzureOpenAI" => CreateAzureClient(config),
        "OpenAI" => CreateOpenAIClient(config),
        _ => new OllamaChatClient(new Uri("http://localhost:11434"), "llama3.2")
    };

    return baseClient
        .AsBuilder()
        .UseLogging(sp.GetRequiredService<ILoggerFactory>())
        .UseDistributedCache(sp.GetRequiredService<IDistributedCache>())
        .UseOpenTelemetry(sourceName: "App.AI")
        .Build(sp);
});

// Register application services
builder.Services.AddScoped<ContentService>();
builder.Services.AddDistributedMemoryCache();

var app = builder.Build();

// Simple endpoint using the abstraction
app.MapPost("/summarize", async (string content, ContentService svc) =>
{
    var summary = await svc.SummarizeAsync(content);
    return Results.Ok(new { summary });
});

app.Run();

IChatClient CreateAzureClient(IConfiguration config)
{
    var client = new AzureOpenAIClient(
        new Uri(config["AI:AzureOpenAI:Endpoint"]!),
        new DefaultAzureCredential());
    return client.AsChatClient(config["AI:AzureOpenAI:DeploymentName"]!);
}

IChatClient CreateOpenAIClient(IConfiguration config)
{
    var client = new OpenAIClient(config["AI:OpenAI:ApiKey"]!);
    return client.AsChatClient("gpt-4o");
}

What's Next

In Part 2, we'll explore function calling and structured outputs—turning LLMs from text generators into reliable decision engines that return structured data you can actually use.

We'll cover:

  • Defining functions with attributes
  • The function-calling loop
  • JSON mode and schema generation
  • Validation and retry patterns
  • Token-efficient function definitions

This is Part 1 of the "Generative AI Patterns in C#" series. Subscribe to follow along as we build production-ready AI applications.
