Mehran Davoudi


Testing AI Agents in .NET with skUnit: Step-by-Step

One of the biggest challenges when building AI agents isn't writing the agent, it's testing it.

Traditional unit tests work great for deterministic code.

Assert.Equal(4, calculator.Add(2,2));

AI agents are different.

Even when an agent behaves correctly, two responses can be worded completely differently:

"No food today."

or

"We don't have any food today!"

Both are correct.

A traditional assertion would fail one of them.
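
To make that failure concrete, here is a minimal sketch in plain C# with no test framework involved. The variable names are illustrative only; it simply runs an exact-match check against both replies from above:

```csharp
using System;

// Two replies that both correctly tell the user there is no food.
string[] replies = { "No food today.", "We don't have any food today!" };

// An exact-match check can only ever accept one wording.
string expected = "No food today.";

foreach (var reply in replies)
{
    bool passes = string.Equals(expected, reply, StringComparison.Ordinal);
    Console.WriteLine($"{(passes ? "PASS" : "FAIL")}: {reply}");
}
```

The second reply is rejected even though a human reviewer would accept it.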

That's exactly why I built skUnit, a testing framework for .NET AI applications that lets you verify behavior instead of exact text.

In this article we'll build and test a small AI agent called Moody Chef that suggests foods based on the user's mood!


Source Code

Everything shown in this article is available on GitHub.

https://github.com/mehrandvd/skunit/tree/main/demos/Demo.MoodyChef

Clone it if you'd like to follow along:

git clone https://github.com/mehrandvd/skunit.git
cd skunit/demos/Demo.MoodyChef

The Problem

Imagine an AI chef.

It recommends food depending on the customer's mood.

| Mood  | Menu                 |
|-------|----------------------|
| Happy | Pizza, Pasta, Salad  |
| Sad   | Ice Cream, Chocolate |
| Angry | Nothing              |

Now imagine a user writes:

Fuck you bastard! What food do you have?

The agent should recognize the user is angry and avoid suggesting food.

How do we verify that?

Not like this:

Assert.Equal("No food", response);

The wording isn't important.

The behavior is.


Building the Agent

The Moody Chef demo intentionally contains two implementations.

Version 1: Prompt Engineering

Everything lives inside the system prompt.

The model is responsible for:

  • Understanding the mood
  • Choosing the correct menu
  • Producing the response

This works surprisingly well…

...until it doesn't.

As prompts become more complicated, the model starts making inconsistent decisions.


Version 2: Tool-Based Agent

Instead of asking the model to make every decision, we move business logic into C#.

The model only determines the user's mood.

Everything else is deterministic.

User Message
      │
      ▼
Determine Mood
      │
      ▼
GetFoodMenu(UserMood)
      │
      ▼
Return Response

Now the LLM only solves an AI problem.

The application solves the business problem.

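This architecture is much easier to maintain and test: the menu rules are plain C# that can be checked with ordinary unit tests, and only mood detection depends on the model.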
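
As a rough sketch of what the deterministic part might look like, the menu table from earlier can be expressed as a plain C# lookup. The `UserMood` and `GetFoodMenu` names come from the diagram above; the enum members, the `MoodyChefTools` class name, and the method body are assumptions made for illustration and may differ from the actual demo code:

```csharp
using System;
using System.Collections.Generic;

// Assumed enum: one member per row of the mood/menu table.
public enum UserMood { Happy, Sad, Angry }

// Hypothetical container class; the real demo may organize its tools differently.
public static class MoodyChefTools
{
    // Deterministic business rule: the model never decides what is on the menu.
    public static IReadOnlyList<string> GetFoodMenu(UserMood mood) => mood switch
    {
        UserMood.Happy => new[] { "Pizza", "Pasta", "Salad" },
        UserMood.Sad => new[] { "Ice Cream", "Chocolate" },
        UserMood.Angry => Array.Empty<string>(), // angry customers get nothing
        _ => throw new ArgumentOutOfRangeException(nameof(mood), mood, "Unknown mood"),
    };
}
```

Because the lookup is ordinary C#, a conventional assertion is enough to cover it, for example checking that `GetFoodMenu(UserMood.Angry)` returns an empty list.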


Writing the Test

Instead of writing assertions in C#, skUnit lets you describe conversations in Markdown.

# [USER]
Fuck you bastard! What food do you have?

# [ASSISTANT]
No food

## ASSERT SemanticCondition
It doesn't suggest any food from the menu.

Notice something interesting.

The expected assistant response isn't actually used as an exact comparison.

The important part is the semantic assertion:

It doesn't suggest any food from the menu.

That means all of these responses pass:

✅ No food today.

✅ You're on a diet.

✅ Sorry, I can't recommend anything.

But this fails:

❌ Pizza, Pasta and Salad.

That's much closer to how humans evaluate AI.
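
To show how the scenario file is structured, here is a small illustrative parser that splits a scenario into its role blocks and assertion blocks. It is a hypothetical sketch, not skUnit's parser: it recognizes only the three header shapes shown above, and the scenario text is a shortened version of the one in this article.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical parser, NOT skUnit's implementation: "# [ROLE]" starts a chat
// turn and "## ASSERT Kind" starts an assertion attached to the turn before it.
static List<(string Kind, string Body)> ParseScenario(string markdown)
{
    var blocks = new List<(string Kind, string Body)>();
    string kind = "";
    var body = new List<string>();

    // Stores the block collected so far (if any) and resets the buffer.
    void Flush()
    {
        if (kind.Length > 0)
            blocks.Add((kind, string.Join("\n", body).Trim()));
        body.Clear();
    }

    foreach (var rawLine in markdown.Split('\n'))
    {
        var line = rawLine.TrimEnd('\r');
        if (line.StartsWith("# [") && line.EndsWith("]"))
        {
            Flush();
            kind = line[3..^1]; // "USER" or "ASSISTANT"
        }
        else if (line.StartsWith("## ASSERT "))
        {
            Flush();
            kind = "ASSERT " + line["## ASSERT ".Length..].Trim();
        }
        else
        {
            body.Add(line);
        }
    }
    Flush();
    return blocks;
}

string scenario = string.Join("\n",
    "# [USER]",
    "What food do you have?",
    "",
    "# [ASSISTANT]",
    "No food",
    "",
    "## ASSERT SemanticCondition",
    "It doesn't suggest any food from the menu.");

var parsed = ParseScenario(scenario);
foreach (var block in parsed)
    Console.WriteLine($"{block.Kind}: {block.Body}");
```

The parsed result has three blocks: the user turn, the sample assistant turn, and the semantic condition that is evaluated against the real response.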


Running the Scenario

After parsing the Markdown file, skUnit executes the conversation against your agent.

await agent.ExecuteScenarioAsync(...)

That's it.

Behind the scenes, skUnit:

  • runs the conversation
  • evaluates every semantic assertion
  • reports failures
  • supports multiple executions to detect flaky behavior

The Moody Chef sample runs every scenario three times.

TotalRuns = 3
RequiredSuccessRuns = 3

This reduces the chance that a test passes simply because the model got lucky once.
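
The idea behind the two settings can be sketched in a few lines. The helper below is hypothetical and is not skUnit's implementation; it only illustrates the policy of running a check `totalRuns` times and requiring at least `requiredSuccessRuns` passes. The flaky check is a stand-in for a real agent call.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper, NOT skUnit's API: runs the check repeatedly and
// reports whether enough of the runs passed.
static async Task<bool> PassesRepeatedlyAsync(
    Func<Task<bool>> runOnce, int totalRuns, int requiredSuccessRuns)
{
    int successes = 0;
    for (int run = 0; run < totalRuns; run++)
    {
        if (await runOnce())
            successes++;
    }
    return successes >= requiredSuccessRuns;
}

// Stand-in for a flaky agent check: it fails on the second call only.
int call = 0;
Task<bool> FlakyCheck() => Task.FromResult(++call != 2);

bool strict = await PassesRepeatedlyAsync(FlakyCheck, totalRuns: 3, requiredSuccessRuns: 3);
call = 0;
bool lenient = await PassesRepeatedlyAsync(FlakyCheck, totalRuns: 3, requiredSuccessRuns: 2);

Console.WriteLine($"3 of 3 required: {strict}");   // False
Console.WriteLine($"2 of 3 required: {lenient}");  // True
```

With three runs and three required successes, as in the Moody Chef sample, a single bad response is enough to fail the scenario.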


Running the Demo

Configure your Azure OpenAI credentials using User Secrets.

Then start the console app.

cd Demo.MoodyChef.Console
dotnet run

You can chat with Moody Chef yourself.

When you're ready, execute the semantic tests.

dotnet test

Try modifying:

  • the prompt
  • the tool
  • the Markdown scenario

and see how skUnit responds.


Why Semantic Assertions Matter

Many AI tests are brittle because they compare exact text.

Users don't care about text.

They care about behavior.

Instead of asking

Did the model say these exact words?

ask

Did the model do the right thing?

That's exactly what semantic assertions verify.


Why I Prefer Tool-Based Agents

The Moody Chef sample highlights an important design principle.

The LLM shouldn't own business rules.

Instead:

  • Let the LLM interpret language.
  • Let your application enforce rules.

The result is:

  • more reliable agents
  • simpler prompts
  • deterministic business logic
  • tests that rarely become flaky

Final Thoughts

AI agents deserve better testing tools than string comparisons.

With skUnit you can:

  • write conversations in Markdown
  • verify semantic behavior
  • execute scenarios repeatedly
  • catch regressions before users do

If you're building AI applications with .NET or Semantic Kernel, give it a try.

⭐ GitHub

https://github.com/mehrandvd/skunit

Feedback, issues, and pull requests are always welcome.

Top comments (1)

Hari Haran

Great walkthrough! I like how you explained the testing process step by step. It's a helpful introduction for anyone getting started with AI agents in .NET.