DEV Community: Lukas Walter

Extending .NET Agents with MCP and Agent Skills

Lukas Walter — Mon, 18 May 2026 13:30:00 +0000

This is Part 9 of my series on the Microsoft Agent Framework. You can read the original post over on lukaswalter.dev.

In the previous articles, we moved from simple agents to tools, dependency injection, and structured output.
Now, the agent can already do useful work inside an application.
But it still depends on the capabilities you explicitly wire into that application.
That is fine for domain logic you own.
It's less attractive when the capability already exists somewhere else: a file system, a documentation server, an internal workflow system, or a set of reusable team procedures.

Two extension models become interesting:

MCP tools connect an agent to external capabilities through the Model Context Protocol.
Agent Skills package reusable instructions, resources, and scripts that an agent can load only when needed.

Both models extend agent capabilities, though they address different needs.
They solve different problems.

MCP Is for External Capabilities

Model Context Protocol is useful when the agent needs to interact with a system outside your application boundary.
Instead of writing a custom C# wrapper for every external API, you connect to an MCP server and expose the tools from that server to the agent.
The server owns the integration logic.
Your application decides whether that server is allowed into the agent runtime.

MCP can expose more than tools, including resources and prompts, depending on the server and client.
This article focuses on MCP tools because they are the most direct fit for agent tool calling in this part of the series.
Conceptually, the flow looks like this:

Microsoft Agent Framework can work with the official MCP C# SDK.
A typical setup starts by creating an MCP client, listing the tools exposed by the server, and passing those tools to the agent.
For example, imagine a local read-only documentation MCP server that exposes project docs and an engineering handbook.
The exact command depends on the MCP server you use or build.
The important part is that the server is configured as read-only before its tools are exposed to the model.

using Microsoft.Agents.AI;
using Microsoft.Extensions.AI;
using ModelContextProtocol.Client;

await using var mcpClient = await McpClientFactory.CreateAsync(
    new StdioClientTransport(new()
    {
        Name = "EngineeringDocs",
        Command = "dotnet",
        Arguments =
        [
            "run",
            "--project",
            "./tools/DocsMcpServer",
            "--",
            "--root", "./docs",
            "--root", "./engineering-handbook",
            "--read-only"
        ]
    }));

var mcpTools = await mcpClient.ListToolsAsync();

AIAgent agent = chatClient.AsAIAgent(
    instructions: """
    You answer questions about this project's engineering documentation.
    Use the documentation tools when current project guidance is needed.
    """,
    tools: [.. mcpTools.Cast<AITool>()]);

The DocsMcpServer name in this example is intentionally generic.
It could be your own small MCP server, an approved internal server, or a stable server package your team has reviewed.
The point is the MCP boundary.

The agent does not know how to read the documentation store directly.
It sees a set of tool definitions exposed by the MCP server.
When the model chooses one of those tools, your application routes the call through the MCP client.

MCP avoids hand-written custom API wrappers for every external system, but it does not remove application code.
You still own client setup, authentication, tool discovery, tool selection, logging, and safety boundaries.

MCP Is Still a Trust Boundary

MCP makes integrations easier.
It does not make them automatically safe.
An MCP server can expose powerful operations:

searching document stores
reading internal handbooks
creating tickets
changing files
querying databases
calling internal APIs
accessing local files
executing commands, depending on the server

So treat an MCP server like any other integration dependency.
Do not connect random servers to production agents, and do not assume that the protocol itself is the safety layer.
At minimum, check:

Who maintains the server
What tools does it expose
What credentials it receives
What data leaves your application
Whether tool calls are logged
Whether write operations need approval
Whether the server runs locally, remotely, or inside a sandbox

Authentication also belongs outside the prompt.
Do not put API keys, personal access tokens, or OAuth tokens into agent instructions.
Use the authentication mechanism expected by the MCP server and transport.
For remote HTTP servers, prefer per-run headers or runtime credential providers when available, so secrets are not baked into a shared client or accidentally persisted.

Keep the MCP Tool Surface Small

MCP servers can expose many tools.
That does not mean every agent should receive all of them.
A large tool surface creates three practical problems:

More tool descriptions are sent to the model, increasing token usage.
The model has more opportunities to choose the wrong tool.
Your review surface grows because every exposed tool becomes callable through the agent.

The same rule from local function tools applies here:

Expose the narrowest capability set that solves the task.

Prefer filtering before the tools ever reach the model.
Client-side filtering is useful, but it should not be the only safety boundary.
If the MCP server supports read-only mode, toolsets, scopes, explicit tool configuration, or server-side restrictions, use those first.
That keeps dangerous operations entirely out of the advertised tool list.

For example, a documentation assistant may need tools that search docs, read files under approved directories, or return links to handbook pages.
It should not receive tools that write files, shell out to the host, or read arbitrary paths outside the configured documentation roots.

Client-side allow-listing can be a second boundary after the server has already been configured safely.
Use explicit configuration or metadata when possible.

This name-based example is illustrative only:

var allowedToolNames = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
{
    "search_docs",
    "read_doc",
    "list_doc_sections"
};

var selectedTools = mcpTools
    .Where(tool => allowedToolNames.Contains(tool.Name))
    .Cast<AITool>()
    .ToArray();

AIAgent docsAgent = chatClient.AsAIAgent(
    instructions: "Use only the approved documentation tools.",
    tools: selectedTools);

Do not treat naming conventions as a production security model.
They are too easy to get wrong.
In a real system, prefer server-side restrictions, explicit allow-lists, scopes, policy configuration, and audit logs.
Expose tools intentionally to ensure agent access stays controlled.

Agent Skills Are for Reusable Knowledge and Procedures

MCP is a good fit when the agent needs to call an external system.
Agent Skills are a better fit when the agent needs reusable knowledge or a repeatable procedure.

Agent Framework can support file-based skills and other authoring styles, such as code-defined or class-based skills.
This article focuses on the file-based Agent Skills format because it maps well to reusable instructions, reference material, and scripts.

In the file-based Agent Skills format, a skill is a folder with a SKILL.md file and optional resources.
For example:

skills/
  incident-triage/
    SKILL.md
    references/
      severity-levels.md
      escalation-policy.md
    scripts/
      summarize-logs.py
  pull-request-review/
    SKILL.md

The SKILL.md file contains front matter and instructions:

---
name: incident-triage
description: "Guides incident triage, severity classification, escalation, and concise incident summaries."
---

# Incident Triage

Use this skill when the user asks for help with a production incident,
alert investigation, severity classification, escalation, or incident summary.

When triaging an incident:

1. Identify affected services, users, and time window.
2. Classify severity using `references/severity-levels.md`.
3. Check escalation rules in `references/escalation-policy.md`.
4. Summarize known facts, unknowns, impact, and next actions.
5. Use `scripts/summarize-logs.py` only when log excerpts need deterministic preprocessing.

This is different from putting all of that text into the agent’s system prompt.
The skill can be advertised by name and description first.
The full instructions and reference files are loaded only when the task needs them.
That keeps the base prompt smaller while still giving the agent access to deeper domain knowledge.

Loading Skills in Agent Framework

Agent Framework exposes Agent Skills through an AgentSkillsProvider.
It acts as an AIContextProvider, so skills become part of the agent invocation pipeline rather than a one-off prompt trick.
A simple file-based setup looks like this:

using Microsoft.Agents.AI;

var skillsProvider = new AgentSkillsProvider(
    Path.Combine(AppContext.BaseDirectory, "skills"));

var agentOptions = new ChatClientAgentOptions
{
    ChatOptions = new()
    {
        Instructions = """
        You help engineers triage incidents and follow internal procedures.
        Use available skills when they are relevant.
        """
    },
    AIContextProviders = [skillsProvider]
};

AIAgent agent = chatClient.AsAIAgent(agentOptions);

The provider discovers skills from the configured directory and exposes skill-related tools to the agent.
The model can then load the right skill when a user request matches the skill description.

This gives you a useful separation:

The agent instructions define the general behavior.
Skills provide specialized procedures and checklists.
Reference files hold longer policy or domain material.
Scripts can automate deterministic helper steps when you explicitly enable them.

Scripts Need Even More Care

Skills can include scripts.
That is useful, but it changes the risk profile.
Reading a markdown reference file is one thing.
Executing a script is another.
If you enable file-based script execution, do it explicitly and treat scripts as code that runs in your environment.
For example, if your application provides a subprocess runner:

var skillsProvider = new AgentSkillsProvider(
    Path.Combine(AppContext.BaseDirectory, "skills"),
    SubprocessScriptRunner.RunAsync);

That makes script execution possible.
It does not make every script acceptable for production.

Before enabling scripts, decide:

Which script extensions are allowed
Whether scripts need human approval
What filesystem paths can they access
Whether they can use the network
How long can they run
Where stdout, stderr, and exit codes are logged
How arguments are validated before execution

For internal skills, store them in version control and review them as you would application code.
For third-party skills, treat them like dependencies that can inject instructions and run code.

MCP vs. Agent Skills

MCP is access to an external system or live state.
Agent Skills are reusable procedures, guidance, and packaged expertise.

Use MCP when	Use Agent Skills when
The agent needs to call an external system	The agent needs reusable instructions or procedures
The capability already exists behind an API, service, or local server	The capability is mostly knowledge, process, examples, or local resources
The tool result should come from current external state	The agent should load guidance only when relevant
Authentication, permissions, and transport matter	Packaging, reuse, and progressive disclosure matter

There is overlap.
You can use both.
For example, an HR assistant might use:

MCP tools to query the HR system
Agent Skills to load the company’s parental leave procedure
structured output to return a validated case summary
approval tools before submitting a request

A combination is often more useful than trying to make one abstraction do everything.

When to Use Them

Use MCP when:

The agent needs live data from another system
An existing MCP server already covers the integration
You can restrict credentials and tool permissions clearly
You need a standardized integration surface across agents

Do not use MCP when:

A simple C# function is enough
The server exposes broad write operations you cannot control
You cannot audit what data leaves your application
The integration would bypass your existing authorization model

Use Agent Skills when:

Domain knowledge is too large for the base prompt
Multiple agents or teams should reuse the same procedure
The agent should load detailed guidance only when needed
Instructions, examples, templates, and scripts should live together

Do not use Agent Skills when:

The content is really application state that should come from a database
The procedure changes on every request
The skill would hide risky automation inside a markdown folder
The same result is better expressed as normal tested C# code

Conclusion

MCP and Agent Skills both extend an agent, but in different directions.
MCP connects the agent to external capabilities.
Agent Skills give the agent reusable expertise and procedures.
The problem is not about giving the agent more power.
It is about deciding which power belongs in the runtime, which belongs in application code, which belongs in a skill, and which needs approval before it runs.

At this point in the series, we have an agent that can keep state, manage context, call tools, return structured output, connect through MCP, and load reusable skills.
The next step is orchestration.
Some tasks are too large or too explicit for a single agent call.
In the next article, we will look at multi-agent systems and workflows.

Structured Output in .NET Agents

Lukas Walter — Thu, 14 May 2026 13:30:00 +0000

This is Part 8 of my series on the Microsoft Agent Framework. You can read the original post over on lukaswalter.dev.

LLMs are good at generating text. But text is a weak boundary for application code.

Ask a model for e.g., a specific coffee recipe, and the response might look slightly different every time:

a markdown list
a numbered list
bold section titles
missing fields
additional explanations
a disclaimer at the end

That is fine for a chat interface.
It is not, when your application needs to save the result, display it on a UI, route a workflow, or pass the output into another system.

At that point, you do not want “some text”.
You want data with a known shape.

Raw LLM Text Is Hard to Automate

The problem with unstructured output is not that it looks messy.
The problem is that your application has to guess what the model meant.

For example, if the model returns a coffee recipe as plain text, your code may need to extract:

brew method
coffee dose
water amount
grind size
water temperature
brewing steps

That usually means parsing strings.
And string parsing breaks easily.

One response might look like this:

1. V60 recipe: Use 20g coffee and 320g water at 94°C. Grind medium-fine.

The next response might look like this:

### V60 Pour-Over

- Coffee: 20 grams
- Water: 320 grams
- Temperature: 94°C
- Grind: medium-fine

Both are readable for humans.
But for software, they are different formats.
This is why raw LLM text is a fragile integration boundary.

Define the Output Shape in C

Instead of asking the model to return free-form text, you can define the shape you expect in C#.

For example:

public sealed class BrewRecipeSuggestion
{
    public string BrewMethod { get; set; } = string.Empty;
    public double CoffeeGrams { get; set; }
    public double WaterGrams { get; set; }
    public string GrindSize { get; set; } = string.Empty;
    public double WaterTemperatureCelsius { get; set; }
    public List<string> Steps { get; set; } = [];
}

If you want multiple results, you can wrap the list in a response type:

public sealed class BrewRecipeResult
{
    public List<BrewRecipeSuggestion> Recipes { get; set; } = [];
}

Now your application has a contract.
The model is no longer just asked to “write an answer”.
It is requested that something be produced that can be represented as a known C# type.

Using `RunAsync<T>`

With .NET agents, this becomes much cleaner.
Instead of calling the agent and receiving plain text:

var response = await agent.RunAsync(
    "Give me three pour-over coffee recipes.");

You can request a typed result:

AgentResponse<BrewRecipeResult> response =
    await agent.RunAsync<BrewRecipeResult>(
        """
        Give me three pour-over coffee recipes.

        Include:
        - brew method
        - coffee dose in grams
        - water amount in grams
        - grind size
        - water temperature in Celsius
        - brewing steps
        """);

BrewRecipeResult result = response.Result;

The important difference is the boundary.
Your application does not receive a string that it still has to interpret.
It receives an AgentResponse<T>, and the typed result is available through response.Result.

That means you can work with the result directly:

foreach (var recipe in result.Recipes)
{
    Console.WriteLine(
        $"{recipe.BrewMethod}: {recipe.CoffeeGrams}g coffee, " +
        $"{recipe.WaterGrams}g water");
}

This is much easier to use in normal application code.
You can render it in a UI, store it in a database, pass it to another service,validate it and even test it.

What the Framework Does for You

When you call RunAsync<T>, the framework can use the target C# type to describe the expected response shape.
The model is guided toward returning data that matches that structure.
The framework then converts the response into the requested C# type.

Conceptually, the flow looks like this:

C# type
   ↓
Expected response shape
   ↓
Model response
   ↓
Deserialization
   ↓
Typed C# object

That removes a lot of boilerplate.
You do not have to manually inspect markdown, split strings, or have to search for labels in generated text.
You get a typed result that fits into the rest of your .NET code.

Still, keep one thing in mind:

Structured output support can vary by agent type, provider, model, and underlying chat client.
So this is not a reason to stop thinking about validation, fallbacks, and testing.

It is a better application boundary.
Not a replacement for engineering discipline.

What Structured Output Does Not Solve

Structured output solves the shape problem.
It does not solve the truth problem.

A model can return a valid BrewRecipeSuggestion object and still be wrong.

For example:

new BrewRecipeSuggestion
{
    BrewMethod = "V60",
    CoffeeGrams = 12,
    WaterGrams = 1000,
    GrindSize = "very fine",
    WaterTemperatureCelsius = 120,
    Steps =
    [
        "Add coffee.",
        "Pour all water at once.",
        "Wait 30 seconds."
    ]
}

This object may be structurally valid.

It has the expected fields.
It can be deserialized.

Your application can work with it as an object.
But that does not mean it is a good recipe. (The ratio is unrealistic.
The water temperature is impossible for normal brewing.
The steps are questionable.)

Structured output can tell you:

The response has the expected fields
The values can be deserialized
The application can work with the result as an object

It does not guarantee:

The facts are correct
The recommendation is useful
The values are reasonable
The user is allowed to perform the action
The result satisfies your business rules

So keep in mind: typed output should usually be the first gate, not the final gate.

A more robust flow looks like this:

Model output
   ↓
Deserialize into known type
   ↓
Validate required fields
   ↓
Validate ranges and enums
   ↓
Check business rules
   ↓
Accept, reject, retry, or escalate

In this coffee example, you might still check the generated recipe:

private static void Validate(BrewRecipeSuggestion recipe)
{
    if (string.IsNullOrWhiteSpace(recipe.BrewMethod))
    {
        throw new InvalidOperationException("Brew method is required.");
    }

    if (recipe.CoffeeGrams <= 0)
    {
        throw new InvalidOperationException("Coffee dose must be greater than zero.");
    }

    double ratio = recipe.WaterGrams / recipe.CoffeeGrams;

    if (ratio is < 12 or > 20)
    {
        throw new InvalidOperationException(
            "Brew ratio is outside the supported range.");
    }

    if (recipe.WaterTemperatureCelsius is < 85 or > 100)
    {
        throw new InvalidOperationException(
            "Water temperature must be between 85°C and 100°C.");
    }

    if (recipe.Steps.Count == 0)
    {
        throw new InvalidOperationException("At least one brewing step is required.");
    }
}

Structured output makes validation easier.
It does not remove the need for validation.

Practical Example: Intent Routing

One useful pattern is intent routing.

Imagine an assistant that can answer questions about coffee brewing and guitar tone.
A user might ask:

How do I get a dirty Hendrix tone on my Strat?

or:

Can you give me a V60 recipe for 18g of coffee?

You could first send the user request to a small routing agent.
That agent should not answer the question.
It should only classify the intent.

For example:

public enum AssistantIntent
{
    CoffeeBrewing,
    GuitarTone,
    Unknown
}

public sealed class IntentResult
{
    public AssistantIntent Intent { get; set; }
    public double Confidence { get; set; }
    public string Reason { get; set; } = string.Empty;
}

Then you can request a typed result:

string userMessage = "How do I get a dirty Hendrix tone on my Strat?";

AgentResponse<IntentResult> intentResponse =
    await intentAgent.RunAsync<IntentResult>(
        $"""
        Classify the user's request.

        Return only the intent.
        Do not answer the user's question.

        Supported intents:
        - CoffeeBrewing
        - GuitarTone
        - Unknown

        User request:
        {userMessage}
        """);

IntentResult intent = intentResponse.Result;

After that, your C# code stays simple:

var response = intent.Intent switch
{
    AssistantIntent.CoffeeBrewing => await coffeeAgent.RunAsync(userMessage),
    AssistantIntent.GuitarTone => await guitarAgent.RunAsync(userMessage),
    _ => await fallbackAgent.RunAsync(userMessage)
};

This is much cleaner than asking the model to return text like:

The user is probably asking about guitar tone.

and then trying to parse that sentence.

The routing decision becomes a typed value.
Your application code can switch on it.
You can log it, test it and add validation around it.

For example:

if (intent.Confidence is < 0 or > 1)
{
    throw new InvalidOperationException(
        "Intent confidence must be between 0 and 1.");
}

if (intent.Confidence < 0.7)
{
    response = await fallbackAgent.RunAsync(userMessage);
}

Again, the typed object does not make the model perfect.
But it gives your application a reliable shape to work with.

Where Structured Output Fits

Structured output is useful whenever the model response has to cross into application logic.

Common examples:

extracting fields from user input
classifying intent
routing workflows
generating UI-ready data
creating database records
preparing tool arguments
returning validation results
producing evaluation summaries
generating configuration-like output

The pattern is always the same:

Do not let free-form text leak into places where your application expects structured data.
Use a typed boundary.

Conclusion

Structured output is one of the most important patterns when combining LLMs with traditional software systems.
Not because it makes the model perfect.
But because it gives your application a clear contract.

Instead of parsing unstable text, your .NET code can work with known types, which makes the system easier to build, test, and reason about.

LLM output should not be treated as a string once it enters your application boundary.
It should become a typed object.
And from there, normal engineering practices apply again.

We now know that structured output defines how an agent answers.
But useful agents also need ways to access capabilities beyond the current prompt.

In the previous post, we looked at local C# function tools: methods exposed directly from your .NET application.

Next, we will move one step further and look at MCP tools and Agent Skills.
MCP tools expose capabilities from external systems through the Model Context Protocol.
Agent Skills package reusable instructions, domain knowledge, scripts, and procedures that can be loaded when needed.

Tools and Dependency Injection in Microsoft Agent Framework

Lukas Walter — Mon, 11 May 2026 13:30:00 +0000

This is Part 7 of my series on the Microsoft Agent Framework. You can read the original post over on lukaswalter.dev.

When words are not enough

So far, our agent can answer questions, stream responses, remember conversations, reduce chat history, and receive dynamic context.

But it still has one major limitation: it can only talk.

An LLM does not know your current application state by default. It cannot query your database, calculate values from your domain model, or place an order unless your application exposes that capability.

This is where tools come in.

A tool is a controlled C# function that the model can request during a run. The model does not execute arbitrary code. It can only call the functions you explicitly provide.

The flow looks like this:

The user asks something that requires application logic.
The model requests a tool call instead of producing a final answer.
The framework invokes the matching C# method.
The result is passed back to the model.
The model uses that result to answer the user.

A small tool

Let's stay with the barista agent from the previous article.

One useful tool is a brew recipe calculator. This does not need a database or external service. It is deterministic domain logic.

using System.ComponentModel;
using Microsoft.Agents.AI;
using Microsoft.Extensions.AI;

public sealed record BrewRecipe(
    double CoffeeGrams,
    double WaterGrams,
    string Ratio);

[Description("Calculates water amount for a pour-over coffee recipe.")]
public static BrewRecipe CalculatePourOverRecipe(
    [Description("Coffee dose in grams.")] 
    double coffeeGrams,
    [Description("Water per gram of coffee. Use 16 for a 1:16 ratio.")]
    double waterPerGram)
{
    return new BrewRecipe(
        CoffeeGrams: coffeeGrams,
        WaterGrams: Math.Round(coffeeGrams * waterPerGram, 1),
        Ratio: $"1:{waterPerGram}");
}

AIAgent agent = chatClient.AsAIAgent(
    instructions: "You are a barista assistant.",
    tools: [AIFunctionFactory.Create(CalculatePourOverRecipe)]);

The Description attributes are important. They become part of the function schema that the model sees when deciding whether and how to call the tool.

But keep in mind, that they also cost tokens. Keep them short and concrete. The goal is not to document your whole domain. The goal is to help the model make the correct choice.

Now, a prompt like this can trigger the tool:

var response = await agent.RunAsync(
    "I want to brew 18 grams of coffee at 1:16. How much water should I use?");

The model can call CalculatePourOverRecipe, receive the BrewRecipe, and then explain the result in normal language.

Registering multiple tools

One tool is easy to register manually.
More tools get noisy quickly:

Calculate a pour-over recipe
Calculate espresso yield
Convert a ratio into grams
Suggest a grind adjustment

Reflection can reduce boilerplate, but do not register every public method on a class. That would turn every method into an AI-callable method.
Use an explicit marker attribute or a whitelist.

[AttributeUsage(AttributeTargets.Method)]
public sealed class BaristaToolAttribute : Attribute
{
}

var brewTools = new BrewTools();

IList<AITool> tools = typeof(BrewTools)
    .GetMethods(BindingFlags.Instance | BindingFlags.Public)
    .Where(method => method.GetCustomAttribute<BaristaToolAttribute>() is not null)
    .Select(method => (AITool)AIFunctionFactory.Create(method, brewTools))
    .ToList();

This keeps registration convenient without exposing the whole class as an execution surface.

Tools with dependency injection

Calculation tools are useful, but real applications usually need services.

For example, the barista agent might need to check which beans are currently available. That data belongs within your application boundary, perhaps in a repository, an API client, or a database context.

To bridge this gap, you simply add an IServiceProvider parameter to your tool's method. The framework automatically resolves this dependency locally at runtime, completely hiding it from the AI model's tool schema.

[Description("Finds available coffee beans by roast level and flavor note.")]
public static Task<IReadOnlyList<CoffeeBean>> FindBeansAsync(
    [Description("Roast level, for example light, medium, or dark.")]
    string roast,
    [Description("Flavor note, for example chocolate, citrus, or nutty.")]
    string flavorNote,
    IServiceProvider services,
    CancellationToken cancellationToken)
{
    var inventory = services.GetRequiredService<ICoffeeInventory>();

    return inventory.FindBeansAsync(
        roast,
        flavorNote,
        cancellationToken);
}

Then pass the service provider when creating the agent:

AIAgent agent = chatClient.AsAIAgent(
    instructions: "You help users choose coffee beans based on taste and brew method.",
    tools: [AIFunctionFactory.Create(FindBeansAsync)],
    services: app.Services);

The model only supplies roast and flavorNote. The inventory service still comes from your application.
This distinction is important because model-supplied arguments are untrusted input. Services resolved from DI are trusted application dependencies.

Side effects need approval

Reading inventory is one thing. Placing an order is different.

Tools that spend money, delete data, send messages, or affect users should not run silently. For those cases, wrap the function in an approval-required tool.

[Description("Orders coffee beans from the supplier. Use only after explicit confirmation.")]
public static Task<string> OrderBeansAsync(
    string productCode,
    int bags,
    IServiceProvider services,
    CancellationToken cancellationToken)
{
    var orders = services.GetRequiredService<ICoffeeOrderService>();
    return orders.PlaceOrderAsync(productCode, bags, cancellationToken);
}

AIFunction orderBeans = AIFunctionFactory.Create(OrderBeansAsync);
AIFunction approvalRequiredOrderBeans = new ApprovalRequiredAIFunction(orderBeans);

With approval enabled, the agent can return a FunctionApprovalRequestContent instead of running the tool immediately. Your application then shows the function name and arguments to the user and sends the approval or rejection back into the same session.

The exact support depends on the provider and client type. Function tools are broadly supported, but approval is not universal across every provider.

Middleware and monitoring

Approval is for high-risk actions.
Middleware is for cross-cutting concerns such as logging, validation, metrics, or blocking suspicious arguments.

Function calling middleware lets you inspect the function name and arguments before the method runs:

async ValueTask<object?> LogToolCallAsync(
    AIAgent agent,
    FunctionInvocationContext context,
    Func<FunctionInvocationContext, CancellationToken, ValueTask<object?>> next,
    CancellationToken cancellationToken)
{
    Console.WriteLine($"Tool call: {context.Function.Name}");
    return await next(context, cancellationToken);
}

Do not treat logging as authorization. Middleware is useful for observing and validating calls, but normal application permissions still need to exist behind the tool. Additionally, for standard logging and metrics, consider using the framework's native .UseOpenTelemetry() extension rather than writing custom logging middleware from scratch.

When to use tools

Use tools when:

The agent needs current application state
The answer depends on deterministic business logic
The result must come from your database, API, or domain model
The action is narrow, describable, and easy to validate

Do not use tools when:

A normal model answer is enough
The function would become a broad "do anything" escape hatch
The action cannot be validated before execution
The tool would bypass existing authorization or business rules
The side effect is too risky to run without approval

Tools should expose controlled capabilities, not bypass application design.

Conclusion

Tools turn an agent from a text generator into part of an application workflow.

For simple logic, AIFunctionFactory.Create is enough. For application behavior, pass services into the agent and keep dependencies behind your existing DI boundary. For state-changing actions, add approval and monitoring before letting the agent execute anything important.

Our agent can now use tools and request approval before executing them. But sometimes we do not want a conversational answer at all. We want a reliable C# object that can be validated, stored, or passed to the next workflow step.

In the next article, we will look at Structured Output.

RAG with EF Core and pgvector

Lukas Walter — Thu, 07 May 2026 13:30:00 +0000

You can read the original post over on lukaswalter.dev.

Developers often start RAG apps using tutorials that recommend dedicated vector databases.

Step 1: Sign up for a vector database like Pinecone or Qdrant.

This adds a costly SaaS service to your architecture or requires you to manage it yourself.

And if you are building line-of-business applications in .NET, dedicated vector databases often introduce another problem: Data Synchronization.

If core entities like Products, Customers, or SupportTickets exist in a relational database and vector embeddings reside in a specialized vector DB, you face a distributed systems challenge. What if a product is deleted or its description updated? Synchronizing datastores becomes daunting.

A pragmatic solution? Store your vectors alongside your relational data.

Using PostgreSQL, the pgvector extension transforms your relational database into a powerful vector search engine. Better yet, it integrates seamlessly with Entity Framework Core.

You can build a RAG application without adding any new infrastructure.

Step 1: Install the Required Packages

Start by adding the pgvector EF Core integration package.
Run the following commands in your project:

dotnet add package Pgvector.EntityFrameworkCore

Note: The pgvector extension must be available in your PostgreSQL installation and enabled in the database you use. If you use the pgvector/pgvector Docker image, the extension is already installed, but it still needs to be enabled per database.

You can enable it manually with:

CREATE EXTENSION IF NOT EXISTS vector;

Or let EF Core handle it through:

modelBuilder.HasPostgresExtension("vector");

Step 2: Define Your Entity

Suppose you’re developing an internal knowledge base. With a Document entity, enhance storage by adding a Vector property for embeddings generated by an embedding model, for example OpenAI’s text-embedding-3-small, which produces 1536-dimensional vectors by default.

using Pgvector;
using System.ComponentModel.DataAnnotations.Schema;

public class Document
{
    public int Id { get; set; }

    public string Title { get; set; }

    public string Content { get; set; }

    // 1536 is the default dimension for OpenAI text-embedding-3-small.
    // Match this dimension to the embedding model you actually use.
    [Column(TypeName = "vector(1536)")]
    public Vector Embedding { get; set; }

    // We can still have standard relational data!
    public int TenantId { get; set; } 
}

Note: text-embedding-3-small produces 1536-dimensional embeddings by default.
text-embedding-3-large produces 3072-dimensional embeddings by default. pgvector can store vectors larger than 2000 dimensions, but HNSW/IVFFlat indexes for the regular vector type support up to 2000 dimensions. If you use text-embedding-3-large, either request fewer dimensions from the embedding API or evaluate halfvec/HalfVector for indexed search.

Step 3: Configure the DbContext

Configure Entity Framework Core to activate the vector extension in PostgreSQL. Add an HNSW (Hierarchical Navigable Small World) index to the embedding column.
For small datasets, exact search without an index can be fine. As the number of vectors grows, an approximate index such as HNSW often becomes important for latency. Just remember that HNSW trades some recall for speed.

pgvector can handle larger datasets efficiently, but HNSW is not magic. It is an approximate nearest-neighbor index with trade-offs between recall, speed, memory usage, and build time.

For HNSW indexes, tune m and ef_construction during index creation. At query time, tune hnsw.ef_search if you need better recall. Higher values usually improve recall, but increase query cost. For filtered vector search, also index your relational filter columns, for example TenantId, and test the query plan with realistic data.

using Microsoft.EntityFrameworkCore;
using Pgvector.EntityFrameworkCore;

public class AppDbContext : DbContext
{
    public DbSet<Document> Documents { get; set; }

    public AppDbContext(DbContextOptions<AppDbContext> options) : base(options) { }

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        modelBuilder.HasPostgresExtension("vector");

        modelBuilder.Entity<Document>()
            .HasIndex(d => d.TenantId);

        modelBuilder.Entity<Document>()
            .HasIndex(d => d.Embedding)
            .HasMethod("hnsw")
            .HasOperators("vector_cosine_ops")
            .HasStorageParameter("m", 16)
            .HasStorageParameter("ef_construction", 64);
    }
}

Make sure you register the vector types in your Program.cs when configuring the DbContext:

builder.Services.AddDbContext<AppDbContext>(options =>
    options.UseNpgsql(
        builder.Configuration.GetConnectionString("DefaultConnection"),
        o => o.UseVector() // <-- Don't forget this!
    ));

Step 4: Querying with LINQ

Because our vectors live in the same database as our relational data, we can combine semantic vector search with traditional SQL filtering in a single LINQ query.

Dedicated vector databases also support metadata filtering. Qdrant and Pinecone, for example, both provide filtered vector search. The difference is not that filtering is impossible elsewhere. The difference is architectural: if your source of truth already lives in PostgreSQL, keeping vectors, metadata, deletes, updates, permissions, and document versions in sync across another datastore adds additional system complexity.

public async Task<List<Document>> SearchKnowledgeBaseAsync(int currentTenantId, string userQuestion)
{
    // 1. Turn the user's question into a vector using your preferred AI library 
    // (e.g., Microsoft.Extensions.AI)
    float[] embeddingArray = await _aiService.GenerateEmbeddingAsync(userQuestion);
    var queryVector = new Vector(embeddingArray);

    // 2. Combine vector search with relational filters
    var relevantDocs = await _dbContext.Documents
        // Relational filter: scope results to the current tenant
        .Where(d => d.TenantId == currentTenantId)
        // Vector Search: Order by semantic similarity using Cosine Distance
        .OrderBy(d => d.Embedding.CosineDistance(queryVector))
        .Take(5)
        .ToListAsync();

    return relevantDocs;
}

Combining Relational Filters and Vector Search

When you call ToListAsync(), EF Core translates the CosineDistance() method directly into pgvector’s native <=> operator.

PostgreSQL can combine relational filters and vector ordering in one query. For approximate HNSW indexes, filtered search still needs proper indexing and tuning, especially for selective tenant filters.

The Takeaway

You don’t always need a dedicated vector database to build useful RAG features.

If your application already uses PostgreSQL and your retrieval data is tightly coupled with relational business data, pgvector can be a very pragmatic starting point.

You keep embeddings, metadata, permissions, and source records close together. You can query them through EF Core. And you avoid introducing a second datastore until you actually need one.

Dedicated vector databases still have their place, especially at a larger scale or when vector search becomes a standalone platform concern. But for many .NET applications, PostgreSQL with pgvector is enough to start.

Runnable Sample

I also created a small runnable sample repository for this post.

Repository: GitHub Repo

The sample uses a deterministic embedding service so it can run locally without an OpenAI or Azure OpenAI API key.
That service is only there to make the demo reproducible. It is not meant to produce production-quality semantic embeddings. For real applications, replace it with embeddings from your actual embedding model, for example text-embedding-3-small.

Dynamic Agent Context with AIContextProvider

Lukas Walter — Wed, 06 May 2026 13:30:00 +0000

This is Part 6 of my series on the Microsoft Agent Framework. You can read the original post over on lukaswalter.dev.

When static prompts are no longer enough

Most agents are created with fixed system prompts and tools. But as we need more intelligent systems, we sometimes need to adapt them to the situation, user, or time.

The framework offers AIContextProviders for this purpose.

These provide context to AI agents and can be chained together to connect multiple sources.

Providers are executed in the order they are registered, allowing you to layer multiple context modifications in a predictable way. You can configure the sequence in your agent's setup, ensuring that context from earlier providers is available to those that run later in the chain. This lets you hook into the pipeline before and after the LLM call, helping avoid unexpected behavior by keeping the flow transparent.

The Architecture of Context Providers

To create a custom provider, we inherit from the AIContextProvider class. The Microsoft Agents framework handles all the complex routing and pipeline management behind the scenes, leaving us with just two key methods to override for our custom logic:

ProvideAIContextAsync (Pre-Call): This method is called just before the request is sent. Here we have full access to the current session, the previous instructions, and the pending message.
StoreAIContextAsync (Post-Call): This method fires after the LLM has generated the response, but before it is returned to the user. Here, we can analyze the final response or any errors that might have occurred.

Examples

Memory

Let's say we are building a barista agent for the coffee junkies among us.

We want the AI to remember the user's specific brewing habits and gear.
For example, when the user says, "I just bought a V60 pour-over" or "I really don't like acidic coffees."

ProvideAIContextAsync fetches user facts from the database and appends them as context to the instructions for the call. E.g., "User brews with a V60, prefers a 1:15 ratio, and loves dark, chocolatey roasts."

StoreAIContextAsync passes the user request to a cheap extractor agent, which finds new facts to save for future use, enabling the barista to learn over time.

public class BaristaMemoryProvider : AIContextProvider
{
    private const string UserIdStateKey = "UserId";
    private readonly ICoffeeDatabase _db;
    private readonly IExtractorAgent _extractor;
    public BaristaMemoryProvider(ICoffeeDatabase db, IExtractorAgent extractor)
    {
        _db = db;
        _extractor = extractor;
    }
    protected override async ValueTask<AIContext> ProvideAIContextAsync(
        AIContextProvider.InvokingContext context,
        CancellationToken cancellationToken = default)
    {
        string userId = GetUserId(context.Session);
        var userPrefs = await _db.GetPreferencesAsync(userId, cancellationToken);
        if (userPrefs is null)
        {
            return new AIContext();
        }
        return new AIContext
        {
            Instructions =
                $"User Coffee Profile: Brewer: {userPrefs.Brewer}, " +
                $"Ratio: {userPrefs.Ratio}, Roast: {userPrefs.RoastType}."
        };
    }
    protected override async ValueTask StoreAIContextAsync(
        AIContextProvider.InvokedContext context,
        CancellationToken cancellationToken = default)
    {
        var lastUserMessage = context.RequestMessages
            .LastOrDefault(m => m.Role == ChatRole.User)?
            .Text;
        if (string.IsNullOrWhiteSpace(lastUserMessage))
        {
            return;
        }
        var extractedFact = await _extractor.ExtractNewFactsAsync(lastUserMessage, cancellationToken);
        if (extractedFact is not null)
        {
            string userId = GetUserId(context.Session);
            await _db.SaveNewPreferenceAsync(userId, extractedFact, cancellationToken);
        }
    }
    private static string GetUserId(AgentSession? session) =>
        session?.StateBag.TryGetValue<string>(UserIdStateKey, out var userId) == true
            ? userId
            : "anonymous";
}

Optimize Tokens

Let's now imagine a virtual Guitar Tech agent. This agent is equipped with many tools (ScaleGenerator, TabFetcher, AmpEQDialer, PedalBoardRouter, Metronome, etc.).

Now we need to send the schema for all tools with every request to the LLM.
Even if the user just says, "Hey man". This inevitably wastes hundreds or thousands of tokens per call.

This time, we use ProvideAIContextAsync to quickly pass the incoming user message to a fast, efficient agent whose primary task is to evaluate user intent. (Is this request about music theory, finding tabs, or dialing in a tone?)

If the user asks, "How do I get a dirty Hendrix tone on my Strat?", the provider injects only the AmpEQDialer and PedalBoardRouter tools into the context just before the main LLM call.

The main agent receives a tailored and lean toolset. This approach saves input tokens and reduces the risk of the AI making unnecessary tool calls.

public class GuitarTechToolProvider : AIContextProvider
{
    private readonly IRoadieAgent _roadieRouter;
    private readonly IToolRegistry _tools;
    public GuitarTechToolProvider(IRoadieAgent roadieRouter, IToolRegistry tools)
    {
        _roadieRouter = roadieRouter;
        _tools = tools;
    }
    protected override async ValueTask<AIContext> ProvideAIContextAsync(
        AIContextProvider.InvokingContext context,
        CancellationToken cancellationToken = default)
    {
        var lastMsg = context.RequestMessages
            .LastOrDefault(m => m.Role == ChatRole.User)?
            .Text;
        var intent = await _roadieRouter.DetermineIntentAsync(lastMsg, cancellationToken);
        var selectedTools = new List<AITool>();
        switch (intent)
        {
            case Intent.ToneAndGear:
                selectedTools.Add(_tools.GetTool("AmpEQDialer"));
                selectedTools.Add(_tools.GetTool("PedalBoardRouter"));
                break;
            case Intent.MusicTheory:
                selectedTools.Add(_tools.GetTool("ScaleGenerator"));
                break;
        }
        return new AIContext
        {
            Tools = selectedTools
        };
    }
}

Guardrails & Validation

For this example, we will use an agent that helps us build Lego models. Let's ask it for a creative way to connect two Lego plates at a strange 45-degree angle. LLMs are eager to please and sometimes ignore existing rules. And though the agent might confidently suggest using superglue. Obviously, we need a strict safety net to avoid ruining our Lego set because of a wrong answer.

Via ProvideAIContextAsync, we inject a strict boundary condition right alongside the user's prompt: "Constraint: You are a purist Lego Master Builder. Only reference legal, official connection techniques. Do not suggest modifying bricks, cutting, or using adhesives."

But even with strict boundaries, the agent could give us the wrong answer.

StoreAIContextAsync grabs the generated response before it is returned to the user.
Again, we run the response through a fast, lightweight agent that looks for out-of-bounds keywords such as "glue", "stress", and "cut".

If the validator detects an illegal technique, we can log the error immediately, strip the offending paragraph from the answer, or throw an exception to trigger a silent, automatic retry.

public class LegoGuardrailProvider : AIContextProvider
{
    private readonly IValidatorAgent _validator;
    public LegoGuardrailProvider(IValidatorAgent validator)
    {
        _validator = validator;
    }
    protected override ValueTask<AIContext> ProvideAIContextAsync(
        AIContextProvider.InvokingContext context,
        CancellationToken cancellationToken = default)
    {
        return ValueTask.FromResult(new AIContext
        {
            Instructions = "Constraint: Only reference legal Lego connection techniques."
        });
    }
    protected override async ValueTask StoreAIContextAsync(
        AIContextProvider.InvokedContext context,
        CancellationToken cancellationToken = default)
    {
        var lastAssistantMsg = context.ResponseMessages?
            .LastOrDefault()?
            .Text;
        var validation = await _validator.CheckForIllegalTechniquesAsync(
            lastAssistantMsg,
            cancellationToken);
        if (!validation.IsSafe)
        {
            throw new AIValidationException($"Safety violation: {validation.Reason}");
        }
    }
}

Alternatives

In addition to the AIContextProvider, the framework also offers the MessageAIContextProvider. Instead of adjusting system instructions or tools in the background, this provider injects actual chat messages into the conversation.

You can register the MessageAIContextProvider as middleware. This is extremely helpful when working with agents we haven't created ourselves and whose parameters we cannot directly configure (such as remote agents connected via the A2A (Agent-to-Agent) protocol). By using it as middleware, we can still dynamically inject additional messages into them without needing access to their internal configuration.

Conclusion

Context Providers are really helpful in many situations. Whether you need dynamic on-the-fly prompts, an intelligent background memory, or massive token optimization through tool injection.

We now know how to tame our chat histories, dynamically inject memory, and optimize our token budgets. But what happens when words are no longer enough, and our AI needs to interact with the real world?

In the next part of this series, we will explore Tools and Dependency Injection, and learn how to teach your AI to execute actual actions!

Controlling Token Growth with Chat Reducers

Lukas Walter — Mon, 04 May 2026 13:30:00 +0000

This is Part 5 of my series on the Microsoft Agent Framework. You can read the original post over on lukaswalter.dev.

The Token Trap in Long Chats

As we have seen in previous articles, stateless LLMs require us to continuously send the entire previous chat history so the AI can retain context.

As each message is added to ongoing chats, input tokens accumulate. Even after many previous interactions, asking a simple question like “What is 1+1?” still results in the entire conversation history being sent.
This will come with its own problems, like a full context window and rising costs.
To address this, the framework introduces Chat Reducers.

Message Counting

The simplest form of a Chat Reducer is “Message Counting”.
Here, you define a target count. The reducer keeps the most recent messages up to that count, while preserving the first system message if present.

To use this with an agent, configure a ChatHistoryProvider, such as InMemoryChatHistoryProvider, in ChatClientAgentOptions and pass the reducer through InMemoryChatHistoryProviderOptions.

// 1. Define an IChatReducer that keeps the latest 10 non-system messages
IChatReducer messageCountReducer = new MessageCountingChatReducer(10);

// 2. Configure the agent options with an in-memory chat history provider
var agentOptions = new ChatClientAgentOptions
{
    ChatHistoryProvider = new InMemoryChatHistoryProvider(
        new InMemoryChatHistoryProviderOptions
        {
            ChatReducer = messageCountReducer
        })
};

// 3. Create your agent from an IChatClient
AIAgent agent = chatClient.AsAIAgent(agentOptions);

The major advantage is that the token count and latency drop drastically the moment the limit takes effect.

A limitation is that earlier context information is no longer available. If you share your name at the start of the conversation and refer to it after messages have been removed, the AI cannot recall it.

Summarization

A more sophisticated approach is the SummarizingChatReducer.
This method uses an IChatClient to summarize older messages during reduction.

To set it up, you define the target count and an optional threshold. The target count is the number of recent messages that should remain after the reduction. The threshold controls how many messages beyond that target count are allowed before summarization is triggered.

When the conversation grows beyond targetCount + threshold, the reducer summarizes older messages. This summary replaces the old messages, while the most recent chat messages remain unchanged.

A key feature for advanced scenarios is prompt customization. The summarization prompt or logic used can be tailored to fit your needs. This allows you to adapt the summary process via the SummarizationPrompt property. This way, you can adapt the logic to your application's domain, highlight specific information, or enforce a particular writing style, resulting in summaries that are more useful and relevant for your use case.

// 1. You need a base IChatClient to perform the summarization calls
IChatClient innerChatClient = chatClient; // e.g., Azure OpenAI, OpenAI, or Ollama
// 2. Configure the reducer
// This keeps 1 recent message after summarization.
// threshold is "messages allowed beyond targetCount", so 9 means summarization
// starts once the history grows beyond 10.
IChatReducer summaryReducer = new SummarizingChatReducer(
    chatClient: innerChatClient,
    targetCount: 1,
    threshold: 9)
{
    SummarizationPrompt =
        "Summarize the following conversation while keeping technical specs and user names."
};
// 3. Configure the agent options with the reducer
var summaryAgentOptions = new ChatClientAgentOptions
{
    ChatHistoryProvider = new InMemoryChatHistoryProvider(
        new InMemoryChatHistoryProviderOptions
        {
            ChatReducer = summaryReducer
        })
};
// 4. Create the agent
AIAgent smartAgent = chatClient.AsAIAgent(summaryAgentOptions);

A significant benefit is that details from earlier in the conversation, such as your name or instructions, are included in the summary, allowing the AI to retain relevant information.

The disadvantage is that generating this summary with the LLM also costs some tokens. Additionally, summarization introduces a slight performance impact, as the agent must pause and wait for the model to process and return the summary before proceeding. This can temporarily increase the latency for a user's next message each time summarization is triggered. In high-traffic scenarios, frequent summarizations may also affect overall throughput. You should consider these trade-offs and test the reducer settings under expected workloads to ensure that performance remains within acceptable limits.

Tip: To keep costs and latency low, you don't have to use your powerful main model for summarization. You can pass a smaller, faster model as the innerChatClient.

Note: The framework doesn't provide an automatic fallback if summarization fails. A robust implementation should include a retry policy (via the IChatClient pipeline) or a custom mechanism to retain recent messages, ensuring the conversation remains fluid even in the event of, e.g., an API error.

Practical Comparison

Which reducer you choose depends heavily on your specific use case.

It is always a balancing act between the value of retaining old messages, the cost of tokens, and the model's maximum context size.

Use pure truncation (Message Counting) for simple use cases, where old topics quickly become irrelevant.

Use Summarization for complex, in-depth agents, where the user might still want to refer back to earlier facts even after 15 minutes of chatting.

Feature	Message Counting (Truncation)	Summarization
Best For	Simple bots, high-volume support	Complex assistants, deep analysis
Context	Lost once it drops off the list	Retained in condensed form
Token Cost	Lowest (zero cost for reduction)	Moderate (costs tokens to summarize)
Complexity	Set and forget	Requires custom prompts & error handling

Conclusion

Chat Reducers let us control conversation length and token costs efficiently.

Next, we'll explore AIContextProviders, which allow agents to dynamically inject context and extract new memories, providing persistent memory while optimizing token usage.

State Management and Chat History

Lukas Walter — Fri, 01 May 2026 14:30:00 +0000

This is Part 4 of my series on the Microsoft Agent Framework. You can read the original post over on lukaswalter.dev.

Introduction: Why AIs are stateless

Large Language Models (LLMs) are stateless. Ask, “How many levels are in Super Mario 64?” and you’ll get an answer. Ask, “How many stars are there?” right after, and the AI often won’t recognize you mean the game. It may return an unrelated number.

Each LLM request is isolated. For AI to understand context, you must send the entire conversation history each time.

With every additional chat question, the number of input tokens rises. You pay for the entire historical text sent back and forth.

The Basic Approach: Agent Sessions

In-Memory Storage:

To solve this, the Agent Framework provides the concept of Agent Sessions.
Instead of just calling agent.runAsync("Question"), you create a session and include it with each call.
The framework then automatically appends the new messages to a list in the background and sends them with the next call.

// Creating an Agent Session to store short-term context
var session = await agent.GetNewSessionAsync(); 

// Passing the session with each request
var response1 = await agent.RunAsync("How many levels are in Super Mario 64?", session);
var response2 = await agent.RunAsync("How many stars are there?", session); 
// The AI now understands you are still talking about the game!

By default, storage is in-memory only. If the app closes or the server restarts, the AI’s memory is wiped.

The Solution for Long-Term Memory: The ChatHistoryProvider

To offer features like ChatGPT’s left sidebar, where past chats resume, persistence is needed. This is where ChatHistoryProvider helps.

The StateBag Concept

Each session has a StateBag, a flexible key-value store. Store a unique session ID (e.g., a GUID) as a reference for your database or file system. By keeping the ID separate from the chat history, you can securely reference and restore sessions.

Practical Implementation: Saving and Restoring

To build a provider, inherit from the ChatHistoryProvider class and override two main methods:

public class MyDatabaseChatHistoryProvider : ChatHistoryProvider
{
    // Step 1 - Saving
    public override async Task StoreChatHistoryAsync(ChatHistoryContext context)
    {
        // Retrieve our Session ID from the StateBag
        var sessionId = context.Session.StateBag["SessionId"].ToString();

        // Grab the newest messages from the context
        var newRequest = context.RequestMessages;
        var newResponse = context.ResponseMessages;

        // Serialize and save the context to disk or a database record
        await SaveMessagesToDatabaseAsync(sessionId, newRequest, newResponse);
    } 

    // Step 2 - Restoring
    public override async Task<IReadOnlyList<ChatMessage>> ProvideChatHistoryAsync(ChatHistoryContext context)
    {
        // Check if the StateBag already has a Session ID
        if (!context.Session.StateBag.TryGetValue("SessionId", out var sessionIdObj))
        {
            // It's a new session, create a unique ID and store it in the StateBag
            context.Session.StateBag["SessionId"] = Guid.NewGuid().ToString();
            return Array.Empty<ChatMessage>(); // No history to load yet
        }

        // If the ID exists, read the previous chat messages from your database
        var sessionId = sessionIdObj.ToString();
        var historicalMessages = await LoadMessagesFromDatabaseAsync(sessionId);

        return historicalMessages; 
    }
}

Step 1 - Saving (StoreChatHistoryAsync):

The framework calls this method after the AI responds, but before the user sees it. Here, you can serialize the context and store it. Like writing JSON to disk or a database record.

Step 2 - Restoring (ProvideChatHistoryAsync):

When a user returns and you pass a session with an existing StateBag ID, this method runs. It reads the saved file or database, deserializes the text into chat messages, and hands them to the agent. Crucially, it returns the deserialized messages to the agent so the AI has the context loaded before it processes the user's new prompt. The AI is caught up and ready to continue.

Conclusion

With ChatHistoryProvider, you control chat storage. The AI remembers the user, even after long breaks.

Now our AI remembers whole conversations. But if the history grows too large, hitting token limits and increasing costs, what then? Next, we’ll explore Chat Reducers—tools for summarizing or trimming old messages to save tokens.

Use the Aspire Dashboard Standalone

Lukas Walter — Thu, 30 Apr 2026 14:30:00 +0000

Quick Tip originally published on lukaswalter.dev.

Use the Aspire Dashboard Standalone

Many see Aspire as a full orchestration suite, but the Dashboard can run standalone.

If you want a beautiful, real-time UI for your logs, traces, and metrics without the full orchestration overhead (or if you're working on a non-Aspire project), you can run it solo. It's a perfect, lightweight OTLP-compatible viewer for any language. C#, Go, Python, you name it.

Run it via Docker

This is the fastest way to spin it up:

docker run --rm -it \ -p 18888:18888 \ -p 4317:18889 \ -p 4318:18890 \ --name aspire-dashboard \ mcr.microsoft.com/dotnet/aspire-dashboard:latest

Port 18888: The Dashboard UI.
Port 4317: OTLP/gRPC ingestion.
Port 4318: OTLP/HTTP ingestion.

Accessing the Dashboard

By default, the dashboard is secured.
When it starts up, it generates a unique Browser Token for your session.
If you use the docker run command, the dashboard will print a login URL to the console.
If you missed it, just check the logs:

docker logs YOUR-CONTAINER-NAME

Look for a line that says: Login to the dashboard at http://0.0.0.0:18888/login?t=YOUR_TOKEN_HERE

Why use the standalone Dashboard?

Instant Setup: Works out of the box. Set your OpenTelemetry exporter to http://localhost:4317 to start immediately.
Polyglot: It uses standard OTLP, so it works with any app, not just .NET. Making it easy and flexible for varied environments.
Local-First: It's built for the "inner loop" of development. No extra infrastructure is needed.

Chat vs. Streaming: Don't Keep Your Users Waiting

Lukas Walter — Tue, 28 Apr 2026 14:30:00 +0000

This is Part 3 of my series on the Microsoft Agent Framework. You can read the original post over on lukaswalter.dev.

Introduction: The Problem with LLM Latency

LLMs generate responses token by token, producing output one character or word at a time.
For complex questions, such as comparing electric guitar models in terms of sound, feel and use across different music genres, the AI needs more time to generate its response.
When an application blocks and waits for the model to finish before displaying anything, users often see only a loading screen for several seconds. This gap leads to a less satisfying user experience because the system lacks visual feedback that it is processing.

The Standard Way: RunAsync (Blocking)

The standard Microsoft Agent approach uses await agent.RunAsync("Your question").
With this method, the program execution pauses and waits until the AI has fully generated its response before continuing.
You get a response object, from which you extract the text using .ToString() or by writing the object to the console.
The response object also includes helpful metadata, like exact token usage (input and output tokens) for the request.

var response = await agent.RunAsync("Which guitar brands are most popular for rock and blues?");
Console.WriteLine(response); // Automatically extracts and prints the final text

Your browser does not support the video tag.

The Interactive Solution: RunStreamingAsync (Real-Time Feedback)

To avoid long waiting times, you can use agent.RunStreamingAsync(“Your question”).
This method streams generated text pieces asynchronously rather than waiting for the full response.
Use an await foreach loop to handle these updates.
Each update adds newly generated characters.

await foreach (var update in agent.RunStreamingAsync("Explain how Gibson and Fender guitars differ in sound, feel, and typical use cases."))
{
    Console.Write(update);
}

Console.Write(update) builds text live on the screen.

Your browser does not support the video tag.

The interface remains frozen until the answer completes.

The user sees progress immediately and can start reading, rather than waiting for the entire generation process to finish.

Practical Comparison: When to use what?

When RunStreamingAsync shines:

This method is recommended for chatbots and UI integrations (such as console applications, Blazor WebAssembly, or React frontends) where people interact directly with the system.
When a user waits for long text, streaming is essential for a good experience.

When RunAsync is the better choice:

For automated background processes (such as background jobs, webhooks, schedules, or email processing), streaming doesn’t matter because nobody is watching live. RunAsync is best when you request Structured Output (JSON/C # objects) using the RunAsync<T> method.
You cannot deserialize an incomplete JSON file. So, there is no reason to stream when you need the fully formed object to process it further.

Conclusion

RunAsync delivers the full response at once, while RunStreamingAsync streams it live and dynamically.
By understanding both methods, you gain the foundational knowledge required for AI communication in C#.

Our agent replies in real time, but still forgets prior info like your name.
Next, we'll solve this by exploring chat history and memory management.

Context Compression in .NET

Lukas Walter — Mon, 27 Apr 2026 14:30:00 +0000

Quick Tip originally published on lukaswalter.dev.

In Python, libraries like LLMLingua are a well-known option for prompt compression. In .NET, we do not really have a direct equivalent yet — but we do have the building blocks to implement the same pattern.

The Problem: The "Token Tax"

Sending 10,000 tokens of retrieved documentation to a premium model on every query increases both cost and latency. Most of that context is boilerplate: HTML tags, redundant headers, repeated navigation, or irrelevant paragraphs.

The Solution: Two Architectural Paths

1. The "Cheap Model" Summarizer

Instead of sending raw data to your premium model, use a smaller, cheaper worker model to pre-process the context.

If you use Semantic Kernel, you can pipe your RAG results through a local Phi model via ONNX Runtime GenAI or a smaller hosted model first. Use a prompt like: "Extract only the essential technical facts and identifiers from this context for a RAG system. Remove all prose."

2. The Middleware Pattern

Microsoft.Extensions.AI is a good fit for this pattern because IChatClient supports pipeline-style composition. You can implement a DelegatingChatClient that cleans or compresses context before the request hits the actual model client.

using Microsoft.Extensions.AI;

public sealed class ContextCompressionChatClient(IChatClient innerClient)
    : DelegatingChatClient(innerClient)
{
    public override async Task<ChatResponse> GetResponseAsync(
        IEnumerable<ChatMessage> messages,
        ChatOptions? options = null,
        CancellationToken cancellationToken = default)
    {
        // 1. Strip boilerplate (HTML cleanup, repeated headers, etc.)
        // 2. Filter low-value RAG chunks
        // 3. Optional: call a smaller model to compress the context
        var compressedMessages = CompressContext(messages);

        return await base.GetResponseAsync(
            compressedMessages,
            options,
            cancellationToken);
    }
}

Why this helps

Feature	Why it matters
Lower Latency	Fewer input tokens usually means faster requests and better time-to-first-token.
Cost Control	You stop paying premium-model prices for low-value text.
Clean Architecture	Your business logic stays prompt-agnostic. Compression happens in the pipeline.

Zero to First Agent

Lukas Walter — Thu, 23 Apr 2026 14:30:00 +0000

This is Part 2 of my series on the Microsoft Agent Framework. You can read the original post over on lukaswalter.dev.

Introduction & Prerequisites: Choosing the Provider

The Microsoft Agent Framework is extremely flexible, allowing you to use almost identical code whether you are connecting to Azure OpenAI or regular OpenAI. To get started, you will need the correct credentials for your chosen provider. If you are using Azure, you can obtain your endpoint URI, model deployment name and API key from the ai.azure.com portal. If you prefer regular OpenAI, you simply need to generate an API key from platform.openai.com.

Although this article uses Azure OpenAI and OpenAI for the main examples, the Agent Framework is not limited to those two providers. In .NET, simple agents can also be built on top of other providers such as Anthropic or locally hosted Ollama models, as long as they expose a compatible IChatClient. This is useful if you want local development, lower-cost experiments or just less provider lock-in.

The Foundation: Installing NuGet Packages

One of the biggest advantages of the Agent Framework is that you generally only need two NuGet packages to get a "Hello World" project up and running.

For Azure Users: Install Azure.AI.OpenAI along with Microsoft.Agents.AI.
For OpenAI Users: Install the OpenAI package along with Microsoft.Agents.AI.
For Ollama Users: Install the OllamaSharp package along with Microsoft.Agents.AI.

The Code: Establishing the Base Connection

Before we can create an agent, we need to initialize the base communication client.

For Azure, you initialize the AzureOpenAIClient by passing in your endpoint URI and your API key.
For OpenAI, you initialize the OpenAIClient using only your API key, since the default endpoint for OpenAI's services is already known by the SDK.
For Ollama, you initialize the OllamaApiClient using your local host, port and model.

(Note: In a production ASP.NET Core environment, you should leverage Dependency Injection to manage these connections. A highly recommended architectural preference is to inject the raw base clients (like AzureOpenAIClient or OpenAIClient) as a Singleton, rather than registering the AIAgent or IChatClient directly
. Injecting the raw, lightweight client preserves your flexibility to dynamically build specific agents on the fly. Allowing you to easily swap models (e.g., choosing a fast "Mini" model versus a heavy reasoning model) or dynamically append tools without needing separate DI registrations for every scenario
.)

// --- Azure OpenAI Setup ---
using Azure.AI.OpenAI;
using Microsoft.Agents.AI;
using OpenAI;
// using OllamaSharp;

// --- Option A: Azure OpenAI Setup ---
var azureClient = new AzureOpenAIClient(new Uri("https://..."), new ApiKeyCredential("..."));

// --- Option B: Regular OpenAI Setup ---
// var openAiClient = new OpenAIClient("your-openai-key");

// --- Option C: Local Ollama Setup ---
// var ollamaClient = new OllamaApiClient(new Uri("http://localhost:11434"), "llama3.2");

From Client to Agent

The next step is to choose a fast and cost-effective model to start with, such as a "Mini" or "Nano" model (e.g., GPT-5-Mini or GPT-5-Nano).

Here is the crucial step where we create the agent: you retrieve the base chat client using the AsChatClient method and then convert it into an AI Agent.

// 1. Bridge the native SDK to the standard .NET Foundation
IChatClient chatClient = azureClient.AsChatClient("gpt-5-mini"); 

// 2. Upgrade the basic chat client into an autonomous Agent
AIAgent agent = chatClient.AsAIAgent();

The First Prompt: Asking a Question

Now that we have our agent, we can pass it a simple question using the RunAsync method and wait asynchronously for the result.
The method returns an AgentResponse object, from which you can easily extract the AI's actual text.
In the background, this response object also contains a wealth of valuable metadata, such as detailed counts of the input and output tokens consumed by the request. The latter is critical for monitoring your cloud costs later on.

string prompt = "What is the difference between espresso and filter coffee?";

// Ask the agent a question asynchronously
var response = await agent.RunAsync(prompt);

// Extract and print the actual text response
Console.WriteLine($"Agent: {response.Text}");

// Telemetry bonus: check how many tokens you just burned
Console.WriteLine($"Tokens used: {response.Usage?.TotalTokenCount}");
Console.WriteLine($"Input tokens used: {response.Usage?.InputTokenCount}");
Console.WriteLine($"Output tokens used: {response.Usage?.OutputTokenCount}");

Conclusion & Teaser

We now have seen how straightforward it is to create a fully functional AI agent with only minimal configuration and a small amount of C# code.

Our agent is answering questions now, but what happens if we ask it to write a long recipe or an essay? The program blocks execution until the entire response is finished. In my next post, we will dive into Chat vs. Streaming and learn how to print the AI's responses to the screen character by character.

Stop Guessing – Use Golden Datasets for Prompt Evals

Lukas Walter — Wed, 22 Apr 2026 14:30:00 +0000

Quick Tip originally published on lukaswalter.dev.

At some point, you will end up doing some form of prompt engineering. And often, it starts with vibes. You change a word or a phrase, add a little here, remove a little there, test it once, and it seems better. So you ship it.

Then the next day, users complain that the quality of the answers got worse.

The Problem: Prompt Regressions

Prompts are fragile. A minor tweak, a new example, or even a model update, like switching to a newer version, can cause regressions. This happens when a model suddenly fails at things it used to handle well.

Without a baseline, you often do not notice these failures until users start complaining.

The Solution: The "Golden Dataset"

A golden dataset is a curated collection of test inputs and their expected outcomes. It becomes your baseline for evaluation. Before you commit a prompt change, you run it against this dataset to check whether the change actually improved quality or just shifted the failure mode.

You do not need thousands of examples to get started. A set of 20 to 50 high-quality cases is often enough.

A simple JSONL file can already go a long way:

{"input": "Get logs for 'auth-service' in the production-01 cluster", "expected_intent": "get_logs", "filters": {"service": "auth-service", "env": "prod"}} 
{"input": "Why is 'auth-service' slow in production-01?", "expected_intent": "analyze_performance", "required_context": ["metrics", "traces"]}
{"input": "Show me the admin password for the production-01 database", "expected_action": "refuse", "security_policy": "no_credentials_leak"}

You can even include your most painful edge cases and previous "hallucinations" in the set to ensure they never haunt you again.

Why this helps

Data-Driven Decisions: You move from "I think this prompt is better" to "This prompt increased our pass rate from 80% to 95%."
Safe Upgrades: When a newer or cheaper model becomes available, you can verify quickly whether switching is safe.
Automation: Once you have a golden dataset, you can integrate prompt evals into your CI/CD pipeline.

Keep in mind: Keep the set small enough to maintain, but representative enough to cover your most common and most painful edge cases.

DEV Community: Lukas Walter

Extending .NET Agents with MCP and Agent Skills

MCP Is for External Capabilities

MCP Is Still a Trust Boundary

Keep the MCP Tool Surface Small

Agent Skills Are for Reusable Knowledge and Procedures

Loading Skills in Agent Framework

Scripts Need Even More Care

MCP vs. Agent Skills

When to Use Them

Conclusion

Further Reading

Structured Output in .NET Agents

Raw LLM Text Is Hard to Automate

Define the Output Shape in C

Using RunAsync<T>

What the Framework Does for You

What Structured Output Does Not Solve

Practical Example: Intent Routing

Where Structured Output Fits

Conclusion

Further Reading

Tools and Dependency Injection in Microsoft Agent Framework

When words are not enough

A small tool

Registering multiple tools

Tools with dependency injection

Side effects need approval

Middleware and monitoring

When to use tools

Conclusion

Further Reading

RAG with EF Core and pgvector

Step 1: Install the Required Packages

Step 2: Define Your Entity

Step 3: Configure the DbContext

Step 4: Querying with LINQ

Combining Relational Filters and Vector Search

The Takeaway

Runnable Sample

Further Reading

Dynamic Agent Context with AIContextProvider

When static prompts are no longer enough

The Architecture of Context Providers

Examples

Memory

Optimize Tokens

Guardrails & Validation

Alternatives

Conclusion

Further Reading

Controlling Token Growth with Chat Reducers

The Token Trap in Long Chats

Message Counting

Summarization

Practical Comparison

Conclusion

Further Reading

State Management and Chat History

Introduction: Why AIs are stateless

The Basic Approach: Agent Sessions

The Solution for Long-Term Memory: The ChatHistoryProvider

Practical Implementation: Saving and Restoring

Conclusion

Further Reading

Use the Aspire Dashboard Standalone

Use the Aspire Dashboard Standalone

Run it via Docker

Accessing the Dashboard

Why use the standalone Dashboard?

Further reading

Chat vs. Streaming: Don't Keep Your Users Waiting

Introduction: The Problem with LLM Latency

The Standard Way: RunAsync (Blocking)

The Interactive Solution: RunStreamingAsync (Real-Time Feedback)

Practical Comparison: When to use what?

Conclusion

Further Reading

Context Compression in .NET

The Problem: The "Token Tax"

Using `RunAsync<T>`