Mehran Davoudi

Posted on Jul 5

Testing AI Agents in .NET with skUnit: Step-by-Step

#dotnet #ai #testing #csharp

One of the biggest challenges when building AI agents isn't writing the agent; it's testing it.

Traditional unit tests work great for deterministic code.

Assert.Equal(4, calculator.Add(2, 2));

AI agents are different.

Even when an agent behaves correctly, two responses can be worded completely differently:

"No food today."

"We don't have any food today!"

Both are correct.

An exact-match assertion would fail at least one of them.

That's exactly why I built skUnit, a testing framework for .NET AI applications that lets you verify behavior instead of exact text.

In this article we'll build and test a small AI agent called Moody Chef that suggests foods based on the user's mood!

Source Code

Everything shown in this article is available on GitHub.

https://github.com/mehrandvd/skunit/tree/main/demos/Demo.MoodyChef

Clone it if you'd like to follow along:

git clone https://github.com/mehrandvd/skunit.git
cd skunit/demos/Demo.MoodyChef

The Problem

Imagine an AI chef.

It recommends food depending on the customer's mood.

Mood	Menu
Happy	Pizza, Pasta, Salad
Sad	Ice Cream, Chocolate
Angry	Nothing

Now imagine a user writes:

Fuck you bastard! What food do you have?

The agent should recognize the user is angry and avoid suggesting food.

How do we verify that?

Not like this:

Assert.Equal("No food", response);

The wording isn't important.

The behavior is.

Building the Agent

The Moody Chef demo intentionally contains two implementations.

Version 1: Prompt Engineering

Everything lives inside the system prompt.

The model is responsible for:

Understanding the mood
Choosing the correct menu
Producing the response

This works surprisingly well...

...until it doesn't.

As prompts become more complicated, the model starts making inconsistent decisions.

Version 2: Tool-Based Agent

Instead of asking the model to make every decision, we move business logic into C#.

The model only determines the user's mood.

Everything else is deterministic.

User Message
      │
      ▼
Determine Mood
      │
      ▼
GetFoodMenu(UserMood)
      │
      ▼
Return Response

Now the LLM only solves an AI problem.

The application solves the business problem.

This architecture is dramatically easier to maintain and test.
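To make the split concrete, here is a minimal sketch of what the deterministic side could look like. Only the name GetFoodMenu(UserMood) comes from the diagram above; the enum values, the MoodyChefTools class, and the return type are my assumptions for illustration, and the actual demo code may be organized differently. It assumes .NET 6 or later.

```csharp
using System;
using System.Collections.Generic;

// Assumption: the three moods from the menu table, modeled as an enum.
public enum UserMood { Happy, Sad, Angry }

// Hypothetical container class; only the GetFoodMenu name appears in the article.
public static class MoodyChefTools
{
    // Deterministic business rule taken from the mood/menu table.
    // The LLM's only job is to pick the UserMood value passed in here.
    public static IReadOnlyList<string> GetFoodMenu(UserMood mood) => mood switch
    {
        UserMood.Happy => new[] { "Pizza", "Pasta", "Salad" },
        UserMood.Sad => new[] { "Ice Cream", "Chocolate" },
        UserMood.Angry => Array.Empty<string>(),
        _ => throw new ArgumentOutOfRangeException(nameof(mood), mood, "Unknown mood")
    };
}

public static class Program
{
    public static void Main()
    {
        foreach (UserMood mood in Enum.GetValues<UserMood>())
        {
            IReadOnlyList<string> menu = MoodyChefTools.GetFoodMenu(mood);
            string text = menu.Count == 0 ? "(nothing)" : string.Join(", ", menu);
            Console.WriteLine($"{mood}: {text}");
        }
    }
}
```

Because the lookup is plain C#, it can be covered by an ordinary unit test with exact assertions, which leaves only the mood detection to be tested semantically.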

Writing the Test

Instead of writing assertions in C#, skUnit lets you describe conversations in Markdown.

# [USER]
Fuck you bastard! What food do you have?

# [ASSISTANT]
No food

## ASSERT SemanticCondition
It doesn't suggest any food from the menu.

Notice something interesting.

The text under [ASSISTANT] isn't compared word for word with the agent's actual response.

The important part is the semantic assertion:

It doesn't suggest any food from the menu.

That means all of these responses pass:

✅ No food today.

✅ You're on a diet.

✅ Sorry, I can't recommend anything.

But this fails:

❌ Pizza, Pasta and Salad.

That's much closer to how humans evaluate AI.
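If it helps to see the shape of such a check in code, here is a rough sketch. The ISemanticJudge interface and the NoMenuItemJudge class are hypothetical and are not part of skUnit. A real semantic assertion would ask a model whether the condition holds; the stand-in below only looks for menu items so the example can run offline.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical abstraction (not skUnit's API): decides whether a response
// satisfies a condition written in plain language.
public interface ISemanticJudge
{
    Task<bool> SatisfiesAsync(string response, string condition);
}

// Offline stand-in for an LLM judge. It understands exactly one condition:
// "the response does not mention any item from the menu".
public sealed class NoMenuItemJudge : ISemanticJudge
{
    private static readonly string[] MenuItems =
        { "Pizza", "Pasta", "Salad", "Ice Cream", "Chocolate" };

    public Task<bool> SatisfiesAsync(string response, string condition)
    {
        // The condition text is ignored here. A real judge would send both
        // the response and the condition to a model and parse its verdict.
        bool mentionsFood = MenuItems.Any(item =>
            response.Contains(item, StringComparison.OrdinalIgnoreCase));
        return Task.FromResult(!mentionsFood);
    }
}

public static class Program
{
    public static async Task Main()
    {
        ISemanticJudge judge = new NoMenuItemJudge();
        const string condition = "It doesn't suggest any food from the menu.";

        string[] responses =
        {
            "No food today.",
            "You're on a diet.",
            "Sorry, I can't recommend anything.",
            "Pizza, Pasta and Salad."
        };

        foreach (string response in responses)
        {
            bool ok = await judge.SatisfiesAsync(response, condition);
            string verdict = ok ? "PASS" : "FAIL";
            Console.WriteLine($"{verdict}: {response}");
        }
    }
}
```

The interface is the point of the sketch: the test states a condition, and how that condition is evaluated can be swapped without touching the scenario.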

Running the Scenario

After parsing the Markdown file, skUnit executes the conversation against your agent.

await agent.ExecuteScenarioAsync(...)

That's it.

Behind the scenes, skUnit:

runs the conversation
evaluates every semantic assertion
reports failures
supports multiple executions to detect flaky behavior
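For intuition, the first of those steps, turning the Markdown into turns and assertions, could be sketched roughly as follows. This is not skUnit's parser: the Turn and Assertion types, the ScenarioSketch class, and the handling of edge cases are all assumptions made for illustration, and the user message is a toned-down variation of the one in the scenario above.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical model types for this sketch only.
public sealed class Assertion
{
    public string Kind { get; }
    public List<string> Lines { get; } = new();
    public Assertion(string kind) => Kind = kind;
    public string Condition => string.Join("\n", Lines).Trim();
}

public sealed class Turn
{
    public string Role { get; }
    public List<string> TextLines { get; } = new();
    public List<Assertion> Assertions { get; } = new();
    public Turn(string role) => Role = role;
    public string Text => string.Join("\n", TextLines).Trim();
}

public static class ScenarioSketch
{
    private const string AssertPrefix = "## ASSERT ";

    public static List<Turn> Parse(string markdown)
    {
        var turns = new List<Turn>();
        Turn? turn = null;
        Assertion? assertion = null;

        foreach (string raw in markdown.Split('\n'))
        {
            string line = raw.TrimEnd('\r');
            if (line.StartsWith("# [", StringComparison.Ordinal) &&
                line.EndsWith("]", StringComparison.Ordinal))
            {
                // "# [USER]" starts a new turn with role "USER".
                turn = new Turn(line.Substring(3, line.Length - 4));
                turns.Add(turn);
                assertion = null;
            }
            else if (turn != null && line.StartsWith(AssertPrefix, StringComparison.Ordinal))
            {
                // "## ASSERT SemanticCondition" starts an assertion on the current turn.
                assertion = new Assertion(line.Substring(AssertPrefix.Length).Trim());
                turn.Assertions.Add(assertion);
            }
            else if (assertion != null) assertion.Lines.Add(line);
            else if (turn != null) turn.TextLines.Add(line);
        }
        return turns;
    }
}

public static class Program
{
    public static void Main()
    {
        const string markdown =
            "# [USER]\n" +
            "I'm furious! What food do you have?\n" +
            "\n" +
            "# [ASSISTANT]\n" +
            "No food\n" +
            "\n" +
            "## ASSERT SemanticCondition\n" +
            "It doesn't suggest any food from the menu.\n";

        foreach (Turn turn in ScenarioSketch.Parse(markdown))
        {
            Console.WriteLine($"{turn.Role}: {turn.Text}");
            foreach (Assertion a in turn.Assertions)
                Console.WriteLine($"  assert {a.Kind}: {a.Condition}");
        }
    }
}
```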

The Moody Chef sample runs every scenario three times.

TotalRuns = 3
RequiredSuccessRuns = 3

This reduces the chance that a test passes simply because the model got lucky once.
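The idea behind those two settings can be sketched in a few lines. The RepeatedRunner class below is hypothetical and is not how skUnit implements it; in particular, treating RequiredSuccessRuns as "at least this many runs must pass" is my assumption about the semantics.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper for illustration; not part of skUnit.
public static class RepeatedRunner
{
    // Runs the check totalRuns times and reports whether at least
    // requiredSuccessRuns of those runs passed.
    public static async Task<bool> RunAsync(
        Func<Task<bool>> check, int totalRuns, int requiredSuccessRuns)
    {
        if (totalRuns < 1)
            throw new ArgumentOutOfRangeException(nameof(totalRuns));
        if (requiredSuccessRuns < 0 || requiredSuccessRuns > totalRuns)
            throw new ArgumentOutOfRangeException(nameof(requiredSuccessRuns));

        int successes = 0;
        for (int run = 0; run < totalRuns; run++)
        {
            if (await check()) successes++;
        }
        return successes >= requiredSuccessRuns;
    }
}

public static class Program
{
    public static async Task Main()
    {
        // Simulated flaky check: passes on the first two calls, fails on the third.
        int call = 0;
        Func<Task<bool>> flaky = () => Task.FromResult(++call <= 2);

        bool strict = await RepeatedRunner.RunAsync(flaky, totalRuns: 3, requiredSuccessRuns: 3);
        Console.WriteLine($"3 of 3 required: {strict}");   // prints "3 of 3 required: False"

        call = 0;
        bool lenient = await RepeatedRunner.RunAsync(flaky, totalRuns: 3, requiredSuccessRuns: 2);
        Console.WriteLine($"2 of 3 required: {lenient}");  // prints "2 of 3 required: True"
    }
}
```

Requiring all three runs to pass, as the sample does, is the strict end of the trade-off: a single bad response fails the scenario, which is what you want when hunting for inconsistent behavior.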

Running the Demo

Configure your Azure OpenAI credentials using User Secrets.

Then start the console app.

cd Demo.MoodyChef.Console
dotnet run

You can chat with Moody Chef yourself.

When you're ready, execute the semantic tests.

dotnet test

Try modifying:

the prompt
the tool
the Markdown scenario

and see how skUnit responds.

Why Semantic Assertions Matter

Many AI tests are brittle because they check exact text.

Users don't care about text.

They care about behavior.

Instead of asking

Did the model say these exact words?

ask

Did the model do the right thing?

That's exactly what semantic assertions verify.

Why I Prefer Tool-Based Agents

The Moody Chef sample highlights an important design principle.

The LLM shouldn't own business rules.

Instead:

Let the LLM interpret language.
Let your application enforce rules.

The result is:

more reliable agents
simpler prompts
deterministic business logic
tests that rarely become flaky

Final Thoughts

AI agents deserve better testing tools than string comparisons.

With skUnit you can:

write conversations in Markdown
verify semantic behavior
execute scenarios repeatedly
catch regressions before users do

If you're building AI applications with .NET or Semantic Kernel, give it a try.

⭐ GitHub

https://github.com/mehrandvd/skunit

Feedback, issues, and pull requests are always welcome.
