Building a C# Agent with Microsoft Agent Framework and Ollama
Part 3 of "Running LLMs & Agents on Azure Container Apps"
We've got Ollama running in Azure Container Apps with persistent storage and secure access. Now let's write an agent that talks to it.
Two weeks ago, Microsoft shipped Agent Framework 1.0 -- the production-ready successor to both Semantic Kernel and AutoGen. Same team, dramatically simpler API. If you've been building agents with Semantic Kernel's ChatCompletionAgent and Kernel objects, the new framework strips away most of that ceremony. You get an agent in three lines of code instead of fifteen.
I rewrote my Ollama agent code the week it shipped. This post walks through what that looks like.
Why Agent Framework Over Semantic Kernel
I used Semantic Kernel for everything up until this month. It's solid, and I still have projects running on it. But Agent Framework fixes the things that always bothered me.
In Semantic Kernel, every agent needs a Kernel instance. You build a kernel, configure providers, register plugins, then pass the kernel to the agent. It's a lot of plumbing for what amounts to "talk to this model and call these functions." Agent Framework collapses that into a single extension method. You take your chat client -- whatever provider -- and call .AsAIAgent(). Done.
Tool registration is the other big improvement. Semantic Kernel requires [KernelFunction] attributes on every method, a plugin class, and a kernel to register it on. Agent Framework uses AIFunctionFactory.Create() to wrap any C# method as a tool. You pass your tools directly when you create the agent. No attributes, no plugin classes, no kernel.
The underlying model abstraction is Microsoft.Extensions.AI, which means any provider that implements IChatClient works. Ollama, Azure OpenAI, OpenAI, Anthropic -- same agent code, different client. That portability is why I chose this stack for the series.
A note on Semantic Kernel: Microsoft will keep maintaining it and fixing bugs, but new features go into Agent Framework. If you're starting fresh, start here. If you have Semantic Kernel code running in production, there's no rush to migrate -- but new projects should use the new framework.
Project Setup
```bash
dotnet new console -n OllamaAgent
cd OllamaAgent
dotnet add package Microsoft.Agents.AI --prerelease
dotnet add package Microsoft.Extensions.AI.Ollama --prerelease
```
Two packages. Microsoft.Agents.AI is the agent framework itself. Microsoft.Extensions.AI.Ollama is the first-party Ollama connector built on the IChatClient abstraction. Both are marked --prerelease because the NuGet packages shipped as 1.0.0-preview while the framework itself is GA. Microsoft does this sometimes. The APIs are stable.
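For reference, the resulting `ItemGroup` in OllamaAgent.csproj looks roughly like this -- the exact preview version numbers are illustrative and will depend on when you run the install:

```xml
<!-- Illustrative only: the preview version suffixes depend on install date -->
<ItemGroup>
  <PackageReference Include="Microsoft.Agents.AI" Version="1.0.0-preview.1" />
  <PackageReference Include="Microsoft.Extensions.AI.Ollama" Version="1.0.0-preview.1" />
</ItemGroup>
```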
Your First Agent
```csharp
using Microsoft.Agents.AI;
using Microsoft.Extensions.AI;

var chatClient = new OllamaChatClient(
    new Uri("https://your-ollama.azurecontainerapps.io"),
    modelId: "llama3:8b");

AIAgent agent = chatClient.AsAIAgent(
    instructions: "You are a helpful assistant running on self-hosted infrastructure.");

Console.WriteLine(await agent.RunAsync("What is Azure Container Apps?"));
```
That's the whole thing. Three meaningful lines: create a client, make it an agent, run it. The endpoint is the internal FQDN of your Ollama container app from Part 2. If your code runs in the same ACA environment (which it will in Part 4), it reaches Ollama directly over the internal network.
Compare that to the Semantic Kernel equivalent, which needs Kernel.CreateBuilder(), AddOllamaChatCompletion(), builder.Build(), then kernel.InvokePromptAsync(). Same result, twice the ceremony.
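For the curious, that Semantic Kernel version looks roughly like this -- a sketch from memory of the `Microsoft.SemanticKernel.Connectors.Ollama` connector, shown only to illustrate the ceremony, so treat the exact parameter names as approximate:

```csharp
// Approximate Semantic Kernel equivalent -- connector signatures from memory,
// shown for comparison only.
using Microsoft.SemanticKernel;

var builder = Kernel.CreateBuilder();
builder.AddOllamaChatCompletion(
    modelId: "llama3:8b",
    endpoint: new Uri("https://your-ollama.azurecontainerapps.io"));
Kernel kernel = builder.Build();

var result = await kernel.InvokePromptAsync("What is Azure Container Apps?");
Console.WriteLine(result);
```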
Swappable Backends
This is the pattern I use on every project. Configure a local backend for development and a cloud backend for production, and a flag decides which one runs.
```csharp
using Microsoft.Agents.AI;
using Microsoft.Extensions.AI;
using Azure.AI.OpenAI;
using Azure;

AIAgent CreateAgent(bool useLocal = false)
{
    IChatClient client;

    if (useLocal)
    {
        client = new OllamaChatClient(
            new Uri(Environment.GetEnvironmentVariable("OLLAMA_URL")!),
            modelId: "llama3:8b");
    }
    else
    {
        // GetChatClient returns the OpenAI SDK's ChatClient type;
        // AsIChatClient() adapts it to Microsoft.Extensions.AI's IChatClient.
        client = new AzureOpenAIClient(
            new Uri(Environment.GetEnvironmentVariable("AZURE_OPENAI_ENDPOINT")!),
            new AzureKeyCredential(
                Environment.GetEnvironmentVariable("AZURE_OPENAI_KEY")!))
            .GetChatClient("gpt-4")
            .AsIChatClient();
    }

    return client.AsAIAgent(
        instructions: "You are a helpful technical assistant.");
}
```
In development, useLocal is true. In production, it's false. Your agent instructions, tools, and orchestration stay identical. You're only changing the inference backend.
This pays off in ways beyond the obvious cost savings. You can run your full test suite against a local model in CI/CD without API charges. You can demo at a conference or customer site without depending on network connectivity. I've done both.
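Wiring the flag up is one line at startup. Here's one way to drive it from an environment variable -- `USE_LOCAL_MODEL` is my own naming convention, not anything the framework defines:

```csharp
// USE_LOCAL_MODEL is an app-level convention, not a framework setting.
bool useLocal = string.Equals(
    Environment.GetEnvironmentVariable("USE_LOCAL_MODEL"),
    "true",
    StringComparison.OrdinalIgnoreCase);

var agent = CreateAgent(useLocal);
Console.WriteLine(await agent.RunAsync("Summarize the deployment plan."));
```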
Multi-Turn Conversations
Agent Framework introduces sessions for managing conversation state. Each session tracks its own message history.
```csharp
var agent = chatClient.AsAIAgent(
    instructions: "You are a technical advisor for Azure deployments.");

// Create a session for a multi-turn conversation
var session = await agent.CreateSessionAsync();

// First turn
var response1 = await agent.RunAsync(
    "I need to deploy a containerized ML model on Azure.", session);
Console.WriteLine(response1);

// Second turn -- the agent remembers the context
var response2 = await agent.RunAsync(
    "What about GPU support?", session);
Console.WriteLine(response2);
```
The session handles all the chat history management. In Semantic Kernel, you'd create a ChatHistory object, manually append messages, and pass it around. Here, the session does that behind the scenes. You can also serialize sessions to JSON for persistence, which is useful when you need conversations that survive container restarts.
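The persistence pattern has a simple shape: serialize the session to JSON, stash it somewhere durable (blob storage, Redis, a database), and rehydrate it on the next request. The method names below are placeholders, not the framework's actual API -- check the current docs for the exact serialization surface:

```csharp
// Sketch only: SerializeAsync / DeserializeSessionAsync are hypothetical
// placeholders for the framework's serialization API -- verify against the docs.
string json = await session.SerializeAsync();
await File.WriteAllTextAsync("session.json", json);

// Later, after a container restart:
string saved = await File.ReadAllTextAsync("session.json");
var restored = await agent.DeserializeSessionAsync(saved);
var reply = await agent.RunAsync("Where were we?", restored);
```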
Adding Tools (Function Calling)
Tools are where agents stop being chatbots and start doing useful work. Agent Framework makes tool registration dead simple compared to Semantic Kernel.
```csharp
using System.ComponentModel;
using Microsoft.Agents.AI;
using Microsoft.Extensions.AI;

// Just a regular C# method -- no [KernelFunction] attribute needed
[Description("Gets the current weather for a location")]
static string GetWeather(string location)
{
    // In production, this calls a weather API
    return $"Weather in {location}: 72°F, Sunny";
}

[Description("Looks up an Azure resource's current status")]
static string CheckResourceStatus(string resourceName)
{
    return $"{resourceName}: Running, 0 errors in last 24h";
}

var agent = chatClient.AsAIAgent(
    instructions: "You are an operations assistant with access to weather and Azure monitoring tools.",
    tools: [
        AIFunctionFactory.Create(GetWeather),
        AIFunctionFactory.Create(CheckResourceStatus)
    ]);

var response = await agent.RunAsync("What's the weather in Seattle and is my ollama-prod app healthy?");
Console.WriteLine(response);
```
Notice what's missing: no plugin class, no kernel, no FunctionChoiceBehavior settings. You pass your tools as a list when you create the agent, and the framework handles the rest. The [Description] attribute is optional but I always include it -- it's what the LLM reads to decide whether to call the function. A good description is the difference between the model calling your function correctly and ignoring it entirely.
In Semantic Kernel, the same setup requires creating a plugin class with [KernelFunction] attributes, building a kernel, registering the plugin on the kernel, configuring FunctionChoiceBehavior.Auto() in execution settings, and then invoking. Agent Framework gets the same result with half the code and no framework-specific attributes on your business logic.
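To make the contrast concrete, here's an approximate sketch of the Semantic Kernel side for the same weather tool -- attribute and registration names from memory, so the point is the ceremony, not the exact signatures:

```csharp
// Approximate Semantic Kernel equivalent, shown only to illustrate the extra
// ceremony: a plugin class, framework attributes, and kernel registration.
using System.ComponentModel;
using Microsoft.SemanticKernel;

public class WeatherPlugin
{
    [KernelFunction]
    [Description("Gets the current weather for a location")]
    public string GetWeather(string location)
        => $"Weather in {location}: 72°F, Sunny";
}

var builder = Kernel.CreateBuilder();
// ...AddOllamaChatCompletion(...) as before...
builder.Plugins.AddFromType<WeatherPlugin>();
Kernel kernel = builder.Build();
// Plus FunctionChoiceBehavior.Auto() in execution settings before invoking.
```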
Function Calling with Local Models: What Actually Works
Function calling with self-hosted models is not as reliable as with GPT-4. It works, but you need to pick the right models. I've burned enough time on this to have opinions.
Llama 3.1 and later have solid function calling support. If you're on Llama 3 (without the .1), function calling will be flaky -- the model wasn't trained for tool use. This is the number one issue I see people hit.
Mistral and Mixtral handle tool use well. They're my go-to when I need function calling on Ollama at a smaller size than Llama 3.1 70B.
Qwen 2.5 is strong on structured output and function calling, especially the 7B and 14B sizes. It's become my default for agents that need reliable tool use on modest hardware.
Practical advice: write an integration test that sends a prompt requiring a function call and verifies the function actually fired. Takes five minutes, saves hours.
```csharp
// Quick smoke test for function calling support
[Description("Returns the current UTC time")]
static string GetTime() => DateTime.UtcNow.ToString("o");

var testAgent = chatClient.AsAIAgent(
    instructions: "Use the GetTime tool to answer time questions.",
    tools: [AIFunctionFactory.Create(GetTime)]);

var result = await testAgent.RunAsync("What time is it?");

// If the response contains a real timestamp, function calling works
Console.WriteLine(result);
```
Run that against each model you're evaluating. If it returns something like "I don't have access to real-time information" instead of an actual timestamp, that model can't do tool use.
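To compare several candidates in one pass, wrap the same check in a loop. The model tags below are examples -- substitute whatever you've actually pulled into Ollama:

```csharp
// Runs the same tool-use smoke test against several Ollama models.
// Model tags are examples; use the ones you've pulled.
string[] candidates = ["llama3.1:8b", "qwen2.5:14b", "mistral:7b"];

foreach (var model in candidates)
{
    var client = new OllamaChatClient(
        new Uri(Environment.GetEnvironmentVariable("OLLAMA_URL")!),
        modelId: model);

    var probe = client.AsAIAgent(
        instructions: "Use the GetTime tool to answer time questions.",
        tools: [AIFunctionFactory.Create(GetTime)]);

    Console.WriteLine($"{model}: {await probe.RunAsync("What time is it?")}");
}
```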
Smart Routing: Right Model for the Job
Once you have both backends available, you can route requests to the model that fits the task.
```csharp
public class SmartRouter
{
    private readonly AIAgent _localAgent;
    private readonly AIAgent _cloudAgent;

    public SmartRouter(string ollamaUrl, string azureEndpoint, string azureKey)
    {
        var localClient = new OllamaChatClient(
            new Uri(ollamaUrl), modelId: "qwen2.5:14b");

        // GetChatClient returns the OpenAI SDK's ChatClient type;
        // AsIChatClient() adapts it to the IChatClient abstraction.
        var cloudClient = new AzureOpenAIClient(
            new Uri(azureEndpoint),
            new AzureKeyCredential(azureKey))
            .GetChatClient("gpt-4")
            .AsIChatClient();

        _localAgent = localClient.AsAIAgent(
            instructions: "You are a data processing assistant.");
        _cloudAgent = cloudClient.AsAIAgent(
            instructions: "You are an expert analyst and writer.");
    }

    public Task<AgentResponse> RouteAsync(string input, string taskType)
    {
        var agent = taskType switch
        {
            "classify" or "extract" or "summarize" => _localAgent,
            "reason" or "analyze" or "generate" => _cloudAgent,
            _ => _localAgent
        };

        return agent.RunAsync(input);
    }
}
```
Local models handle classification, extraction, and summarization almost as well as GPT-4 -- well enough for production. Where GPT-4 still pulls ahead is multi-step reasoning, complex code generation, and text that needs a specific tone. A routing layer like this cuts API costs by 60-80% without a noticeable quality drop.
Complete Example: Document Triage Agent
Here's something closer to what I've built for real teams -- an agent that triages incoming documents, classifies them, extracts key fields, and routes them for review.
```csharp
using System.ComponentModel;
using Microsoft.Agents.AI;
using Microsoft.Extensions.AI;

// Tool functions -- clean C# methods, no framework attributes required
[Description("Classifies a document as: invoice, contract, support-ticket, or other")]
static string ClassifyDocument(string content)
{
    // In production: fine-tuned classifier or pattern matching
    return "invoice";
}

[Description("Extracts vendor name, amount, and due date from an invoice")]
static string ExtractInvoiceFields(string content)
{
    return """{"vendor": "Contoso", "amount": 4250.00, "due": "2026-05-15"}""";
}

[Description("Routes a document to a review queue based on category and priority")]
static string RouteForReview(string category, string priority)
{
    return $"Routed {category} to {priority}-priority queue";
}

// Create the agent with Qwen 2.5 14B -- reliable tool use, runs on CPU
var chatClient = new OllamaChatClient(
    new Uri("https://your-ollama.azurecontainerapps.io"),
    modelId: "qwen2.5:14b");

var triageAgent = chatClient.AsAIAgent(
    instructions: """
        You are a document triage agent. When given a document:
        1. Classify its type
        2. If it's an invoice, extract the key fields
        3. Route it for review based on category and urgency
        Use the available tools for each step.
        """,
    tools: [
        AIFunctionFactory.Create(ClassifyDocument),
        AIFunctionFactory.Create(ExtractInvoiceFields),
        AIFunctionFactory.Create(RouteForReview)
    ]);

var session = await triageAgent.CreateSessionAsync();

var result = await triageAgent.RunAsync(
    "Process this document: Invoice from Contoso for $4,250 due May 15, 2026 for Azure consulting services.",
    session);

Console.WriteLine(result);
```
I'm using qwen2.5:14b because it chains multiple tool calls reliably -- classify, then extract, then route -- without dropping steps. It's small enough to run on CPU without painful latency. Llama 3 can't do this sequence consistently; Qwen 2.5 nails it.
This is a single-agent setup. In Part 4, we'll break this apart -- a classifier agent, an extraction agent, a routing agent -- each running as its own container on ACA, communicating through Dapr, with Dynamic Sessions for sandboxed code execution.
What Changed from Semantic Kernel (Quick Reference)
If you've been following this series and have Semantic Kernel code, here's what moves where:
| Semantic Kernel | Agent Framework |
|---|---|
| `Kernel.CreateBuilder()` | `new OllamaChatClient(...)` |
| `builder.AddOllamaChatCompletion(...)` | (done in client constructor) |
| `kernel.InvokePromptAsync(...)` | `agent.RunAsync(...)` |
| `[KernelFunction]` attribute | `AIFunctionFactory.Create(method)` |
| `builder.Plugins.AddFromType<T>()` | `tools: [...]` parameter |
| `FunctionChoiceBehavior.Auto()` | (automatic -- no config needed) |
| `ChatHistory` | `AgentSession` |
| `Microsoft.SemanticKernel` namespace | `Microsoft.Agents.AI` + `Microsoft.Extensions.AI` |
The Semantic Kernel packages still work. If you have production code on them, there's no fire to put out. But for new projects, Agent Framework is less code, less ceremony, and where Microsoft is putting new features.
Next Up
Part 4 is where the architecture gets interesting: multiple agents running as separate containers on ACA, passing messages through Dapr, with Azure Container Apps Dynamic Sessions for sandboxed code execution. We go from "one agent that triages documents" to "a team of agents that can research, code, and review."
Questions about migrating from Semantic Kernel or getting Ollama working with Agent Framework? Drop them in the comments -- I migrated a project last week and the gotchas are fresh.