Thang Chung
Microsoft Agent Framework with Foundry Local in .NET/C#

Introduction

Last month, I had the honour of speaking at .NET Conf 2025 Vietnam, where I presented on “Multi-Agent Workflows in .NET with Microsoft Agent Framework, A2A, MCP, and AG-UI.”
For those who couldn’t attend the session live, you can watch it here:

At the same event, I also had the chance to watch an excellent talk by Phi Huynh on "Self-Sovereign AI with Azure AI Foundry Local." The recording is available here:

In his session, Phi introduced Foundry Local and several LLMs that can run locally, such as Phi, Qwen, and GPT-OSS. What immediately sparked my curiosity was this question:
Can we combine Foundry Local with the Microsoft Agent Framework (AgentFx) in .NET/C# to build fully local, self-sovereign agentic AI systems?

If the answer is yes, we could run both data and LLMs entirely on local machines, which is especially compelling for sensitive domains such as banking or insurance, where data must never be exposed to public LLM services.

Motivated by this idea, I started experimenting — and quickly discovered a number of interesting challenges and insights along the way. This post captures those findings, based on my hands-on experience trying to make Microsoft AgentFx work seamlessly with Foundry Local. It also highlights the gaps that exist today and where the Microsoft AgentFx team may eventually enable native support within the framework itself.

Exploring Foundry Local

At the beginning, I installed Foundry Local on my machine by following the docs at https://learn.microsoft.com/en-us/azure/ai-foundry/foundry-local/get-started?view=foundry-classic.

I used winget:

winget install Microsoft.FoundryLocal

After that, I typed:

foundry service start
🟢 Service is Started on http://127.0.0.1:57591/, PID 35620!

Then I checked the service status:

foundry service status
🟢 Model management service is running on http://127.0.0.1:57591/openai/status
EP autoregistration status: Successfully downloaded and registered the following EPs: NvTensorRTRTXExecutionProvider, CUDAExecutionProvider.
Valid EPs: CPUExecutionProvider, WebGpuExecutionProvider, NvTensorRTRTXExecutionProvider, CUDAExecutionProvider

Make sure you can open http://127.0.0.1:57591/openai/status and see something like the screenshot below.

foundry-local-status
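
Note that the client code later in this post assumes a model has already been pulled down. If you haven't downloaded one yet, the Foundry Local CLI can do it; the alias below is an assumption on my side, so run foundry model list first to see what is available on your machine:

foundry model list
foundry model run qwen2.5-14b

The first run downloads the model and loads it into the service; afterwards it is served on the OpenAI-compatible /v1 endpoint under its full id (in my case, qwen2.5-14b-instruct-generic-cpu:4).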

Making Microsoft AgentFx work with Foundry Local

The next logical step was to make this work with Microsoft AgentFx, right? After spending half an hour or so reviewing the code and documentation in the Microsoft AgentFx repository (https://github.com/microsoft/agent-framework),
I discovered that native support for Foundry Local currently exists only in the Python implementation (see https://github.com/microsoft/agent-framework/pull/2915). Unfortunately, what I was really looking for was a .NET/C# equivalent.

So, I decided to prototype my own solution in .NET/C#, taking the Python implementation as a reference. The results so far have been very promising. The setup works smoothly using the OpenAIClient SDK, and the LLM model I’m running locally is qwen2.5-14b-instruct-generic-cpu:4.

The code looks like this:

using System.ClientModel;
using OpenAI;
using OpenAI.Chat;

var openAiClient = new OpenAIClient(
    new ApiKeyCredential("not-needed"),
    new OpenAIClientOptions { Endpoint = new Uri("http://127.0.0.1:57591/v1") });

var chatClient = openAiClient.GetChatClient("qwen2.5-14b-instruct-generic-cpu:4");

var openAiMessages = new List<OpenAI.Chat.ChatMessage>
{
    new SystemChatMessage("You are a helpful assistant."),
    new UserChatMessage("Write a poem about .NET Conf 2025 in Vietnam")
};

var response = await chatClient.CompleteChatAsync(openAiMessages);
Console.WriteLine(response.Value.Content[0].Text);

Next, I figured we could do function calling with the OpenAI client using the standard function-calling format. When I tried it in the prototype, though, it didn't work. After digging through the Foundry Local documentation for a while, I found out that Foundry Local uses a different format than OpenAI does:

Foundry Local uses a non-standard function calling format. Instead of returning function calls in the standard OpenAI tool_calls field, Qwen models return the function call as JSON text in the response content.

For example, when you ask about the weather, instead of:

# Standard OpenAI format
message.tool_calls = [{"name": "get_weather", "arguments": {"location": "Birmingham"}}]

You get:

# Foundry Local format
message.content = '{"name": "get_weather", "arguments": {"location": "Birmingham"}}'

See the article from Microsoft's Foundry Local team: https://techcommunity.microsoft.com/blog/educatordeveloperblog/function-calling-with-small-language-models/4472720
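
Since the tool call comes back as plain JSON text in message.content, we can recover it with a small parser. Below is a minimal sketch of the idea; the ToolCallParser used later in my code is more robust, and the type and method names here are illustrative only (this version handles just a single, well-formed JSON object):

using System.Collections.Generic;
using System.Text.Json;

public sealed record ParsedToolCall(string Name, JsonElement Arguments);

public static class SimpleToolCallParser
{
    public static List<ParsedToolCall> Parse(string content)
    {
        var calls = new List<ParsedToolCall>();
        var trimmed = content.Trim();

        // Ordinary text answers don't start with '{', so treat them as "no tool call".
        if (!trimmed.StartsWith('{'))
            return calls;

        try
        {
            using var doc = JsonDocument.Parse(trimmed);
            if (doc.RootElement.TryGetProperty("name", out var name) &&
                doc.RootElement.TryGetProperty("arguments", out var args))
            {
                // Clone the arguments so they outlive the parsed document.
                calls.Add(new ParsedToolCall(name.GetString()!, args.Clone()));
            }
        }
        catch (JsonException)
        {
            // Not valid JSON; the model answered with normal text.
        }

        return calls;
    }
}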

With a parser like that in hand, the hack is: parse the tool call returned by the first LLM call, invoke the tools, gather all the results, and send them back to the LLM for the final generation. The overall workflow is exactly the same as the one in the OpenAI docs:

openai-tool-calling

Here is what I did:

var openAiClient = new OpenAIClient(
    new ApiKeyCredential("not-needed"),
    new OpenAIClientOptions { Endpoint = new Uri(_foundryEndpoint) });

var chatClient = openAiClient.GetChatClient(_model);

// Build system prompt with tool definitions
var toolDefs = BuildToolDefs();
var jsonExample = @"{""name"": ""tool_name"", ""arguments"": {""param"": ""value""}}";

var systemPrompt = _mcpTools.Count > 0
    ? $"""
        {_instructions ?? "You are a helpful assistant with access to tools."}

        When you need to use a tool, respond ONLY with a JSON object in this exact format:
        {jsonExample}

        Available tools:
        {toolDefs}

        If you don't need a tool, respond normally with text.
        """
    : _instructions ?? "You are a helpful assistant.";

// Convert messages to OpenAI format
var openAiMessages = new List<OpenAI.Chat.ChatMessage>
{
    new SystemChatMessage(systemPrompt)
};

foreach (var msg in chatMessages)
{
    if (msg.Role == ChatRole.User)
        openAiMessages.Add(new UserChatMessage(msg.Text ?? ""));
    else if (msg.Role == ChatRole.Assistant)
        openAiMessages.Add(new AssistantChatMessage(msg.Text ?? ""));
}

_logger?.LogDebug("[FoundryLocalAgent] Sending request to model...");

var response = await chatClient.CompleteChatAsync(openAiMessages, cancellationToken: cancellationToken);
var content = response.Value.Content.FirstOrDefault()?.Text ?? "";

_logger?.LogDebug("[FoundryLocalAgent] Model response: {Content}", content);

// Parse tool calls from content (Foundry Local format)
var toolCalls = ToolCallParser.Parse(content);
var validToolCalls = toolCalls
    .Where(tc => _mcpTools.Any(t => t.Name == tc.Name))
    .ToList();

if (validToolCalls.Count == 0)
{
    // No tool calls, return the response as-is
    return content;
}

_logger?.LogInformation("[FoundryLocalAgent] Executing {Count} tool(s): {Tools}",
    validToolCalls.Count,
    string.Join(", ", validToolCalls.Select(tc => tc.Name)));

// Execute tools in parallel
var toolTasks = validToolCalls.Select(async toolCall =>
{
    using var toolActivity = GenAITracing.StartToolSpan(
        toolName: toolCall.Name,
        toolCallId: Guid.NewGuid().ToString("N")[..12],
        arguments: toolCall.Arguments);

    try
    {
        var result = await CallMcpToolAsync(toolCall.Name, toolCall.Arguments, cancellationToken);
        return new { toolCall.Name, Result = result ?? "", Error = (string?)null };
    }
    catch (Exception ex)
    {
        return new { toolCall.Name, Result = "", Error = (string?)ex.Message };
    }
});

var toolResults = await Task.WhenAll(toolTasks);

// Build tool results message
var toolResultsText = string.Join("\n\n", toolResults.Select(r =>
    r.Error is null
        ? $"Tool '{r.Name}' result:\n{r.Result}"
        : $"Tool '{r.Name}' error: {r.Error}"));

// Add tool results to the conversation and get the final response
openAiMessages.Add(new AssistantChatMessage(content));
openAiMessages.Add(new UserChatMessage($"Tool execution results:\n{toolResultsText}\n\nPlease provide your final response based on these results."));

var finalResponse = await chatClient.CompleteChatAsync(openAiMessages, cancellationToken: cancellationToken);
var finalResult = finalResponse.Value.Content.FirstOrDefault()?.Text ?? toolResultsText;

The full source code version (Microsoft AgentFx + Foundry Local) can be found at FoundryLocalAgent.cs
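
One helper from the snippet above worth spelling out is BuildToolDefs, which renders the MCP tools into the system prompt. A minimal sketch, assuming _mcpTools holds the AIFunction-based tools returned by the MCP C# client (property names may differ in your SDK version):

// Render each MCP tool as text the model can read in the system prompt.
// Assumes each tool exposes Name, Description, and a JSON Schema for its parameters.
private string BuildToolDefs() =>
    string.Join("\n\n", _mcpTools.Select(t =>
        $"- {t.Name}: {t.Description}\n  Parameters (JSON Schema): {t.JsonSchema}"));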

Some results

I set up 2 MCP tools: one that looks up the current time and one that gets the weather (see the sketch below).
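
The real definitions live in the repo; as a rough illustration, with the official ModelContextProtocol C# SDK the tools could be declared like this (the method names and bodies are my own placeholders, not the actual implementation):

using System.ComponentModel;
using ModelContextProtocol.Server;

[McpServerToolType]
public static class DemoTools
{
    [McpServerTool, Description("Gets the current time for an IANA timezone, e.g. Europe/London.")]
    public static string GetCurrentTime(string timezone)
    {
        var tz = TimeZoneInfo.FindSystemTimeZoneById(timezone);
        return TimeZoneInfo.ConvertTime(DateTimeOffset.UtcNow, tz).ToString("F");
    }

    [McpServerTool, Description("Gets a weather report for a location (placeholder data for the demo).")]
    public static string GetWeather(string location) =>
        $"The weather in {location} is sunny, 18°C.";
}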

When I send a query to the chat API, such as "London time, and weather in New York after 2PM?", it responds with:

{
  "response": "According to the London time (Europe/London), it is currently Sunday, December 28, 2025, 2:53:55 PM. Since you asked about the weather",
  "threadId": "ca0e12c91b02478482d71356bc8069e9"
}

See the Scalar UI below.

tool-calling-scalar-ui

And with Aspire, we can see even more details of how the workflow is running:

aspire-agentfx-foundry-local-tracing

Local LLM models and the disadvantages of my implementation

  1. At the time of writing this blog, I have tested with Qwen and Phi models (mainly qwen2.5-14b-instruct-generic-cpu:4, used throughout this post).
  2. The current FoundryLocalAgent implementation only works correctly with RunAsync. RunStreamingAsync does not yet stream effectively (it falls back to calling RunAsync instead of implementing a true async-stream version). I have posted a comment on https://github.com/microsoft/agent-framework/issues/2963 and hope the Microsoft AgentFx and Foundry Local teams will support this natively.

Conclusion

I’ve walked you through my journey of getting Foundry Local (with Qwen and Phi models) working with Microsoft AgentFx — highlighting what works today, what doesn’t yet, and the workaround I used to bridge the gap. While this solution is admittedly a hack rather than a perfect implementation, it’s fully functional — and honestly, it feels pretty great to see it running end to end.

Looking ahead, I’m optimistic that the Microsoft AgentFx and Foundry Local teams will collaborate to deliver native Foundry Local support in AgentFx as part of an upcoming GA release (.NET/C# version), which I believe is coming very soon.

I hope you find this post useful.
The full source code is available here:
https://github.com/thangchung/agent-engineering-experiment/tree/main/foundry-local-agent-fx
