Building Autonomous AI Agents in C#: Tips from Real-World Applications
When I first started building autonomous AI agents, I made the classic mistake: I thought the challenge was just picking the right LLM. Three production deployments later, I learned the hard truth—orchestration, memory management, and failure recovery are what separate demos from systems that actually ship.
After working with several C# AI agent implementations in production, I've seen what works and what doesn't. Let me share some patterns that might save you the debugging nightmares I went through.
The Real Challenge: Orchestration, Not Just Inference
Companies like H&M and JPMorgan Chase have deployed autonomous AI agents to improve efficiency in customer service and contract reviews. But here's what the case studies don't tell you: the agents that succeed in production aren't just fancy chatbots—they're carefully orchestrated workflows.
Recent research on AI agent development shows a clear trend toward modular, multi-agent architectures. The days of monolithic AI systems are over. Modern autonomous agents need to coordinate multiple specialized sub-agents, each handling specific tasks while maintaining a coherent overall workflow.
For C# developers, the main options are LlmTornado, Semantic Kernel, and LangChain. I've used all three in production, and they each have their strengths. LlmTornado stands out for its built-in orchestration capabilities and clean integration with 100+ API providers, while Semantic Kernel offers tight Microsoft ecosystem integration.
Installation and Setup
Before diving into code, let's get the essentials installed:
dotnet add package LlmTornado
dotnet add package LlmTornado.Agents
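Before going further, it's worth a quick smoke test. This is a minimal sketch, assuming an OPENAI_API_KEY environment variable and the same constructor and model identifiers used in the patterns later in this post:
using LlmTornado;
using LlmTornado.Agents;
using LlmTornado.Chat.Models;
using LlmTornado.Code;

// Minimal smoke test: one agent, one prompt, result printed to the console.
var apiKey = Environment.GetEnvironmentVariable("OPENAI_API_KEY")
    ?? throw new InvalidOperationException("Set OPENAI_API_KEY first");
var client = new TornadoApi(apiKey, LLmProviders.OpenAi);
var agent = new TornadoAgent(
    client: client,
    model: ChatModel.OpenAi.Gpt5.V5Mini,
    name: "SmokeTestAgent",
    instructions: "Answer in one short sentence."
);
var result = await agent.Run("Confirm you can hear me.");
Console.WriteLine(result.Messages.Last().Content);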
Pattern 1: The Research Agent - Parallel Execution Done Right
I spent three days debugging a research agent that would randomly hang. The issue? I was running web searches sequentially. When one search timed out, the whole system froze.
Here's a pattern that actually works in production—parallel execution with proper semaphore control:
using LlmTornado;
using LlmTornado.Agents;
using LlmTornado.Chat;
using LlmTornado.Chat.Models;
using LlmTornado.Code;
using LlmTornado.Responses;
public class ResearchOrchestrator
{
private readonly TornadoApi _client;
private readonly int _maxParallelism = 4;
public ResearchOrchestrator(string apiKey)
{
_client = new TornadoApi(apiKey, LLmProviders.OpenAi);
}
public async Task<string> ExecuteResearch(string userQuery)
{
// Step 1: Planning agent generates search queries
var planner = new TornadoAgent(
client: _client,
model: ChatModel.OpenAi.Gpt5.V5Mini,
name: "PlannerAgent",
instructions: "Generate 3-5 focused web search queries for this topic.",
outputSchema: typeof(WebSearchPlan)
);
var planResult = await planner.Run(userQuery);
var plan = planResult.Messages.Last().Content.JsonDecode<WebSearchPlan>();
// Step 2: Execute searches in parallel with controlled concurrency
var semaphore = new SemaphoreSlim(_maxParallelism);
var searchTasks = plan.Queries.Select(async query =>
{
await semaphore.WaitAsync();
try
{
return await RunSearchAgent(query);
}
finally
{
semaphore.Release();
}
});
var results = await Task.WhenAll(searchTasks);
// Step 3: Synthesize findings
var reporter = new TornadoAgent(
client: _client,
model: ChatModel.OpenAi.Gpt5.V5,
name: "ReportAgent",
instructions: "Synthesize research into a coherent report (250+ words)."
);
reporter.ResponseOptions = new ResponseRequest
{
Tools = new[] { new ResponseWebSearchTool() }
};
var report = await reporter.Run(
input: $"User query: {userQuery}\n\nResearch findings:\n{string.Join("\n\n", results)}"
);
return report.Messages.Last().Content;
}
private async Task<string> RunSearchAgent(string query)
{
var searcher = new TornadoAgent(
client: _client,
model: ChatModel.OpenAi.Gpt5.V5Mini,
name: "SearchAgent",
instructions: "Search and summarize in 2-3 paragraphs, <300 words."
);
searcher.ResponseOptions = new ResponseRequest
{
Tools = new[] { new ResponseWebSearchTool() }
};
var result = await searcher.Run(query);
return result.Messages.Last().Content ?? string.Empty;
}
}
public struct WebSearchPlan
{
public string[] Queries { get; set; }
}
Key lesson: Control your parallelism. Don't let 50 concurrent API calls crash your system or drain your token budget. The semaphore pattern saved me during a demo where a client asked about a topic that generated 20+ search queries.
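One refinement I'd also suggest, sketched here rather than copied from the class above: give each search its own deadline so a single hung call can't stall Task.WhenAll. RunSearchAgentWithTimeout is a name I'm introducing for illustration; it wraps the orchestrator's RunSearchAgent with standard .NET primitives only.
private async Task<string> RunSearchAgentWithTimeout(string query, TimeSpan timeout)
{
    // Race the search against a delay; a hung search yields a placeholder
    // instead of freezing the whole batch.
    Task<string> searchTask = RunSearchAgent(query);
    Task completed = await Task.WhenAny(searchTask, Task.Delay(timeout));
    if (completed != searchTask)
    {
        // Note: the abandoned search keeps running in the background; pass a
        // CancellationToken into the agent run if you need a hard stop.
        return $"[Search for '{query}' timed out after {timeout.TotalSeconds:N0}s]";
    }
    return await searchTask;
}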
Pattern 2: Memory-Augmented Chatbots - Getting Context Right
Here's what I wish someone had told me: conversation memory isn't just about saving messages to a file. You need multiple memory layers—short-term (conversation history), long-term (vector embeddings), and entity memory (facts about people, places, things).
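The chatbot below covers the first two layers. Entity memory can start as small as a typed record you upsert as facts surface in conversation; here's a minimal sketch that isn't tied to any particular store:
// Stable facts keyed by entity name, e.g. "Alice" -> ("Role", "Account manager").
public record EntityFact(string Entity, string Attribute, string Value, DateTime ObservedAt);

public class EntityMemory
{
    private readonly Dictionary<string, List<EntityFact>> _facts = new();

    public void Upsert(EntityFact fact)
    {
        if (!_facts.TryGetValue(fact.Entity, out var list))
        {
            _facts[fact.Entity] = list = new List<EntityFact>();
        }
        // Keep only the newest observation per attribute.
        list.RemoveAll(f => f.Attribute == fact.Attribute);
        list.Add(fact);
    }

    public IReadOnlyList<EntityFact> Recall(string entity) =>
        _facts.TryGetValue(entity, out var list)
            ? (IReadOnlyList<EntityFact>)list
            : Array.Empty<EntityFact>();
}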
Most published guidance on reliable AI agents focuses on orchestration frameworks such as Semantic Kernel and Azure AI Foundry. Fair enough, but in my experience the implementation details matter far more than which framework you pick.
Here's a production-ready chatbot architecture:
using LlmTornado;
using LlmTornado.Agents;
using LlmTornado.Chat;
using LlmTornado.Chat.Models;
using LlmTornado.Code;
using LlmTornado.Moderation;
using LlmTornado.VectorDatabases;
using LlmTornado.Embedding;
using LlmTornado.Embedding.Models;
public class ProductionChatbot
{
private readonly TornadoApi _client;
private readonly string _conversationFile;
private readonly IVectorDatabase _vectorDb;
public ProductionChatbot(string apiKey, string conversationFile, string chromaDbUri)
{
_client = new TornadoApi(apiKey, LLmProviders.OpenAi);
_conversationFile = conversationFile;
_vectorDb = new TornadoChromaDB(chromaDbUri);
}
public async Task<string> Chat(string userInput)
{
// Step 1: Safety first - moderate input
var modResult = await _client.Moderation.CreateModeration(userInput);
if (modResult.Results.FirstOrDefault()?.Flagged == true)
{
throw new InvalidOperationException("Input flagged by moderation");
}
// Step 2: Load conversation history
var messages = await LoadConversationHistory(_conversationFile);
// Step 3: Retrieve relevant context from vector memory
var contextAgent = new TornadoAgent(
client: _client,
model: ChatModel.OpenAi.Gpt5.V5Mini,
name: "ContextAgent",
instructions: "Generate 2-3 search queries for retrieving relevant conversation history.",
outputSchema: typeof(SearchQueries)
);
var queryResult = await contextAgent.Run(userInput);
var queries = queryResult.Messages.Last().Content.JsonDecode<SearchQueries>();
var relevantContext = await RetrieveVectorContext(queries.Queries);
// Step 4: Generate response with full context
var chatAgent = new TornadoAgent(
client: _client,
model: ChatModel.OpenAi.Gpt5.V5Mini,
name: "ChatAgent",
instructions: $"You are a helpful assistant. Context from past conversations:\n{relevantContext}",
streaming: true
);
messages.Add(new ChatMessage(ChatMessageRoles.User, userInput));
var response = await chatAgent.Run(
appendMessages: messages,
streaming: true,
onAgentRunnerEvent: async (evt) =>
{
if (evt is AgentRunnerStreamingEvent streamEvt &&
streamEvt.ModelStreamingEvent is ModelStreamingOutputTextDeltaEvent delta)
{
Console.Write(delta.DeltaText);
}
}
);
// Step 5: Save conversation and update vector memory (async)
_ = Task.Run(async () =>
{
await SaveConversationHistory(response.Messages, _conversationFile);
await UpdateVectorMemory(userInput, response.Messages.Last().Content);
});
return response.Messages.Last().Content;
}
private async Task<string> RetrieveVectorContext(string[] queries)
{
var embeddingProvider = new TornadoEmbeddingProvider(
_client,
EmbeddingModel.OpenAi.Gen3.Small
);
var allDocs = new List<VectorDocument>();
foreach (var query in queries)
{
var embedding = await embeddingProvider.Invoke(query);
var docs = await _vectorDb.QueryByEmbeddingAsync(embedding, topK: 3);
allDocs.AddRange(docs);
}
return string.Join("\n", allDocs.Select(d => d.Content).Distinct());
}
    private async Task UpdateVectorMemory(string userMsg, string assistantMsg)
    {
        var embeddingProvider = new TornadoEmbeddingProvider(
            _client,
            EmbeddingModel.OpenAi.Gen3.Small
        );
        // Index both sides of the exchange so later retrievals can surface the
        // assistant's answers as well as the user's questions.
        var docs = new[]
        {
            new VectorDocument(
                id: Guid.NewGuid().ToString(),
                content: userMsg,
                metadata: new Dictionary<string, object>
                {
                    { "Role", "User" },
                    { "Timestamp", DateTime.UtcNow }
                },
                embedding: await embeddingProvider.Invoke(userMsg)
            ),
            new VectorDocument(
                id: Guid.NewGuid().ToString(),
                content: assistantMsg,
                metadata: new Dictionary<string, object>
                {
                    { "Role", "Assistant" },
                    { "Timestamp", DateTime.UtcNow }
                },
                embedding: await embeddingProvider.Invoke(assistantMsg)
            )
        };
        await _vectorDb.AddDocumentsAsync(docs);
    }
private async Task<List<ChatMessage>> LoadConversationHistory(string file)
{
if (!File.Exists(file)) return new List<ChatMessage>();
var messages = new List<ChatMessage>();
await messages.LoadMessagesAsync(file);
return messages;
}
    private Task SaveConversationHistory(List<ChatMessage> messages, string file)
    {
        // SaveConversation is synchronous; return a completed task so the async call sites stay uniform.
        messages.SaveConversation(file);
        return Task.CompletedTask;
    }
}
public struct SearchQueries
{
public string[] Queries { get; set; }
}
Critical insight: Don't block the main thread waiting for vector updates. Fire-and-forget background tasks keep your chatbot responsive. I learned this when users complained about 3-second delays after every message.
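One caveat with fire-and-forget: an exception in that background task disappears silently. A small guard keeps the pattern responsive without swallowing failures. This is a sketch of Step 5 above; swap the Console.Error call for whatever logging you already use:
// Fire-and-forget, but never let a failed memory update vanish silently.
_ = Task.Run(async () =>
{
    try
    {
        await SaveConversationHistory(response.Messages, _conversationFile);
        await UpdateVectorMemory(userInput, response.Messages.Last().Content);
    }
    catch (Exception ex)
    {
        Console.Error.WriteLine($"Background memory update failed: {ex.Message}");
    }
});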
Pattern 3: Tool Calling with Approval Gates
In a production environment for a financial services client, we couldn't let agents execute arbitrary code or make API calls without human approval. Here's the pattern that worked:
using LlmTornado;
using LlmTornado.Agents;
using LlmTornado.Chat;
using LlmTornado.Chat.Models;
using LlmTornado.Code;
using System.ComponentModel;
public class ApprovedToolAgent
{
private readonly TornadoApi _client;
public ApprovedToolAgent(string apiKey)
{
_client = new TornadoApi(apiKey, LLmProviders.OpenAi);
}
public async Task<string> RunWithApproval(string userQuery)
{
var agent = new TornadoAgent(
client: _client,
model: ChatModel.OpenAi.Gpt41.V41Mini,
name: "ControlledAgent",
instructions: "Use available tools to answer questions.",
tools: new List<Delegate> { GetFinancialData },
toolPermissionRequired: new Dictionary<string, bool>
{
{ "GetFinancialData", true } // Requires approval
}
);
var result = await agent.Run(
input: userQuery,
toolPermissionHandle: async (toolRequest) =>
{
Console.WriteLine($"\nAgent wants to call: {toolRequest}");
Console.Write("Approve? (y/n): ");
var approval = Console.ReadLine();
return approval?.ToLower().StartsWith('y') ?? false;
}
);
return result.Messages.Last().Content;
}
[Description("Retrieves sensitive financial data")]
private static string GetFinancialData(
[Description("Account identifier")] string accountId,
[Description("Date range in YYYY-MM-DD format")] string dateRange)
{
// In production, this would call your actual financial API
return $"Financial data for {accountId} in range {dateRange}";
}
}
Lesson learned: Permission gates aren't just for security—they're for compliance. When we added approval workflows, our audit team actually thanked us.
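If compliance is the driver, log every decision alongside the request. Here's a variant of the handler above as a sketch; the tool-approvals.log path is just an example, and in a real deployment you'd write to your audit store instead:
toolPermissionHandle: async (toolRequest) =>
{
    Console.WriteLine($"\nAgent wants to call: {toolRequest}");
    Console.Write("Approve? (y/n): ");
    bool approved = Console.ReadLine()?.ToLower().StartsWith('y') ?? false;

    // Append an audit record for every decision, granted or denied.
    await File.AppendAllTextAsync(
        "tool-approvals.log",
        $"{DateTime.UtcNow:O}\t{toolRequest}\t{(approved ? "APPROVED" : "DENIED")}\n"
    );
    return approved;
}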
Common Pitfalls and Solutions
Problem 1: Token Budget Explosions
I once deployed an agent that cost $200 in a single afternoon because it was including the entire conversation history (300+ messages) in every request.
Solution: Implement sliding window context:
var recentMessages = allMessages.TakeLast(10).ToList();
var contextSummary = await SummarizeOlderMessages(
allMessages.Take(allMessages.Count - 10).ToList()
);
recentMessages.Insert(0, new ChatMessage(
ChatMessageRoles.System,
$"Previous conversation summary: {contextSummary}"
));
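SummarizeOlderMessages isn't a library call; it's a helper you write yourself. A rough sketch using the same TornadoAgent API as the rest of this post, assuming ChatMessage exposes the Role and Content members used earlier:
private async Task<string> SummarizeOlderMessages(List<ChatMessage> olderMessages)
{
    // Collapse older turns into a few sentences that fit in a single system message.
    var summarizer = new TornadoAgent(
        client: _client,
        model: ChatModel.OpenAi.Gpt5.V5Mini,
        name: "SummaryAgent",
        instructions: "Summarize this conversation in under 150 words. Keep names, decisions, and open questions."
    );
    var transcript = string.Join("\n", olderMessages.Select(m => $"{m.Role}: {m.Content}"));
    var result = await summarizer.Run(transcript);
    return result.Messages.Last().Content ?? string.Empty;
}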
Problem 2: Streaming Events Not Displaying
Streaming was "working" but users saw nothing until the entire response completed. The issue? I wasn't handling the event types correctly.
Solution: Check the event type hierarchy:
onAgentRunnerEvent: async (evt) =>
{
if (evt.EventType == AgentRunnerEventTypes.Streaming &&
evt is AgentRunnerStreamingEvent streamingEvent &&
streamingEvent.ModelStreamingEvent is ModelStreamingOutputTextDeltaEvent deltaEvent)
{
Console.Write(deltaEvent.DeltaText);
await Console.Out.FlushAsync(); // Critical for real-time display
}
}
Problem 3: Dead Agents (No Error, Just Silence)
Agents would occasionally stop responding with no error messages. After adding proper cancellation handling, I discovered they were stuck waiting for user input during automated tests.
Solution: Always use cancellation tokens:
var cts = new CancellationTokenSource(TimeSpan.FromMinutes(5));
try
{
var result = await agent.Run(
input: userQuery,
cancellationToken: cts.Token
);
}
catch (OperationCanceledException)
{
Console.WriteLine("Agent timed out - check for infinite loops or stuck tools");
}
Decision Matrix: When to Use Which Pattern
| Use Case | Pattern | Why |
|---|---|---|
| Research/Analysis | Parallel Multi-Agent | Faster results, better coverage |
| Customer Support | Memory-Augmented Single Agent | Personalization, context retention |
| Sensitive Operations | Tool Approval Gates | Compliance, security |
| Long-Running Tasks | Async Background Processing | User experience, responsiveness |
Troubleshooting Guide
Error: "Tool X not found in tools list"
- Check that you're calling AddTornadoTool() or AddAgentTool(), not just adding to Options.Tools directly
- Verify delegate method signatures match expected patterns
Error: Conversation history not persisting
- Ensure you're calling SaveConversation() after each interaction
- Check that file paths are absolute or properly relative to the working directory
Error: Vector search returns irrelevant results
- Verify embedding model matches between indexing and querying
- Try increasing the chunk size (I found 250-500 tokens works best; see the chunking sketch after this list)
- Add metadata filters to narrow search scope
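On chunk size: a simple word-count splitter is enough to experiment with the 250-500 token range. This is a plain C# sketch that approximates tokens as roughly 0.75 words each, so tune the numbers against your own data:
public static IEnumerable<string> ChunkText(string text, int targetTokens = 400)
{
    // Rough heuristic: 1 token is about 0.75 words, so 400 tokens is about 300 words.
    int wordsPerChunk = (int)(targetTokens * 0.75);
    string[] words = text.Split(
        new[] { ' ', '\n', '\r', '\t' },
        StringSplitOptions.RemoveEmptyEntries
    );
    for (int i = 0; i < words.Length; i += wordsPerChunk)
    {
        yield return string.Join(" ", words.Skip(i).Take(wordsPerChunk));
    }
}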
Structured Output: The Game Changer
Microsoft Semantic Kernel gets most of the attention for autonomous coding agents, but structured outputs pay off in any framework. Instead of parsing free-form text, define schemas:
[Description("Analysis of customer sentiment")]
public struct SentimentAnalysis
{
[Description("Overall sentiment: Positive, Negative, or Neutral")]
public string Sentiment { get; set; }
[Description("Confidence score 0-1")]
public float Confidence { get; set; }
[Description("Key phrases that influenced the sentiment")]
public string[] KeyPhrases { get; set; }
}
var agent = new TornadoAgent(
client: api,
model: ChatModel.OpenAi.Gpt41.V41Mini,
instructions: "Analyze customer feedback for sentiment.",
outputSchema: typeof(SentimentAnalysis)
);
var result = await agent.Run(customerFeedback);
var analysis = result.Messages.Last().Content.JsonDecode<SentimentAnalysis>();
Console.WriteLine($"Sentiment: {analysis.Sentiment} ({analysis.Confidence:P0})");
This eliminated 90% of our parsing errors. The LLM returns valid JSON that deserializes cleanly, no regex hacks required.
What's Next for Me
I'm currently exploring agent-to-agent communication patterns where specialized agents coordinate without a central orchestrator. Early results suggest it scales better than hub-and-spoke architectures, but debugging is... interesting.
For more examples and production patterns, check the LlmTornado repository where you'll find complete sample implementations including coding agents, multi-agent orchestration, and MCP tool integrations.
The future of autonomous agents isn't about building one super-intelligent AI—it's about orchestrating specialized agents that work together reliably. Focus on modularity, failure recovery, and observability. Those are the skills that'll matter when you're debugging at 2 AM because an agent is doing something unexpected in production.
And trust me, it will do something unexpected in production.