Building Autonomous AI Agents in C#: Tips from Real-World Applications
When I first started building autonomous AI agents, I made the classic mistake: I thought the challenge was just picking the right LLM. Three production deployments later, I learned the hard truth—orchestration, memory management, and failure recovery are what separate demos from systems that actually ship.
After working with several C# AI agent implementations in production, I've seen what works and what doesn't. Let me share some patterns that might save you the debugging nightmares I went through.
The Real Challenge: Orchestration, Not Just Inference
Companies like H&M and JPMorgan Chase have deployed autonomous AI agents to improve efficiency in customer service and contract reviews. But here's what the case studies don't tell you: the agents that succeed in production aren't just fancy chatbots—they're carefully orchestrated workflows.
Recent research on AI agent development shows a clear trend toward modular, multi-agent architectures. The days of monolithic AI systems are over. Modern autonomous agents need to coordinate multiple specialized sub-agents, each handling specific tasks while maintaining a coherent overall workflow.
For C# developers, the main options are LlmTornado, Semantic Kernel, and LangChain. I've used all three in production, and they each have their strengths. LlmTornado stands out for its built-in orchestration capabilities and clean integration with 100+ API providers, while Semantic Kernel offers tight Microsoft ecosystem integration.
Installation and Setup
Before diving into code, let's get the essentials installed:
dotnet add package LlmTornado
dotnet add package LlmTornado.Agents
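Before going further, it's worth a quick smoke test. This is a minimal sketch, assuming an OPENAI_API_KEY environment variable and the same constructor and model identifiers used in the patterns later in this post:
using LlmTornado;
using LlmTornado.Agents;
using LlmTornado.Chat.Models;
using LlmTornado.Code;

// Minimal smoke test: one agent, one prompt, result printed to the console.
var apiKey = Environment.GetEnvironmentVariable("OPENAI_API_KEY")
    ?? throw new InvalidOperationException("Set OPENAI_API_KEY first");
var client = new TornadoApi(apiKey, LLmProviders.OpenAi);
var agent = new TornadoAgent(
    client: client,
    model: ChatModel.OpenAi.Gpt5.V5Mini,
    name: "SmokeTestAgent",
    instructions: "Answer in one short sentence."
);
var result = await agent.Run("Confirm you can hear me.");
Console.WriteLine(result.Messages.Last().Content);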
Pattern 1: The Research Agent - Parallel Execution Done Right
I spent three days debugging a research agent that would randomly hang. The issue? I was running web searches sequentially. When one search timed out, the whole system froze.
Here's a pattern that actually works in production—parallel execution with proper semaphore control:
using LlmTornado;
using LlmTornado.Agents;
using LlmTornado.Chat;
using LlmTornado.Chat.Models;
using LlmTornado.Code;
using LlmTornado.Responses;
public class ResearchOrchestrator
{
private readonly TornadoApi _client;
private readonly int _maxParallelism = 4;
public ResearchOrchestrator(string apiKey)
{
_client = new TornadoApi(apiKey, LLmProviders.OpenAi);
}
public async Task<string> ExecuteResearch(string userQuery)
{
// Step 1: Planning agent generates search queries
var planner = new TornadoAgent(
client: _client,
model: ChatModel.OpenAi.Gpt5.V5Mini,
name: "PlannerAgent",
instructions: "Generate 3-5 focused web search queries for this topic.",
outputSchema: typeof(WebSearchPlan)
);
var planResult = await planner.Run(userQuery);
var plan = planResult.Messages.Last().Content.JsonDecode<WebSearchPlan>();
// Step 2: Execute searches in parallel with controlled concurrency
var semaphore = new SemaphoreSlim(_maxParallelism);
var searchTasks = plan.Queries.Select(async query =>
{
await semaphore.WaitAsync();
try
{
return await RunSearchAgent(query);
}
finally
{
semaphore.Release();
}
});
var results = await Task.WhenAll(searchTasks);
// Step 3: Synthesize findings
var reporter = new TornadoAgent(
client: _client,
model: ChatModel.OpenAi.Gpt5.V5,
name: "ReportAgent",
instructions: "Synthesize research into a coherent report (250+ words)."
);
reporter.ResponseOptions = new ResponseRequest
{
Tools = new[] { new ResponseWebSearchTool() }
};
var report = await reporter.Run(
input: $"User query: {userQuery}\n\nResearch findings:\n{string.Join("\n\n", results)}"
);
return report.Messages.Last().Content;
}
private async Task<string> RunSearchAgent(string query)
{
var searcher = new TornadoAgent(
client: _client,
model: ChatModel.OpenAi.Gpt5.V5Mini,
name: "SearchAgent",
instructions: "Search and summarize in 2-3 paragraphs, <300 words."
);
searcher.ResponseOptions = new ResponseRequest
{
Tools = new[] { new ResponseWebSearchTool() }
};
var result = await searcher.Run(query);
return result.Messages.Last().Content ?? string.Empty;
}
}
public struct WebSearchPlan
{
public string[] Queries { get; set; }
}
Key lesson: Control your parallelism. Don't let 50 concurrent API calls crash your system or drain your token budget. The semaphore pattern saved me during a demo where a client asked about a topic that generated 20+ search queries.
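One refinement I'd also suggest, sketched here rather than copied from the class above: give each search its own deadline so a single hung call can't stall Task.WhenAll. RunSearchAgentWithTimeout is a name I'm introducing for illustration; it wraps the orchestrator's RunSearchAgent with standard .NET primitives only.
private async Task<string> RunSearchAgentWithTimeout(string query, TimeSpan timeout)
{
    // Race the search against a delay; a hung search yields a placeholder
    // instead of freezing the whole batch.
    Task<string> searchTask = RunSearchAgent(query);
    Task completed = await Task.WhenAny(searchTask, Task.Delay(timeout));
    if (completed != searchTask)
    {
        // Note: the abandoned search keeps running in the background; pass a
        // CancellationToken into the agent run if you need a hard stop.
        return $"[Search for '{query}' timed out after {timeout.TotalSeconds:N0}s]";
    }
    return await searchTask;
}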
Pattern 2: Memory-Augmented Chatbots - Getting Context Right
Here's what I wish someone had told me: conversation memory isn't just about saving messages to a file. You need multiple memory layers—short-term (conversation history), long-term (vector embeddings), and entity memory (facts about people, places, things).
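The chatbot below covers the first two layers. Entity memory can start as small as a typed record you upsert as facts surface in conversation; here's a minimal sketch that isn't tied to any particular store:
// Stable facts keyed by entity name, e.g. "Alice" -> ("Role", "Account manager").
public record EntityFact(string Entity, string Attribute, string Value, DateTime ObservedAt);

public class EntityMemory
{
    private readonly Dictionary<string, List<EntityFact>> _facts = new();

    public void Upsert(EntityFact fact)
    {
        if (!_facts.TryGetValue(fact.Entity, out var list))
        {
            _facts[fact.Entity] = list = new List<EntityFact>();
        }
        // Keep only the newest observation per attribute.
        list.RemoveAll(f => f.Attribute == fact.Attribute);
        list.Add(fact);
    }

    public IReadOnlyList<EntityFact> Recall(string entity) =>
        _facts.TryGetValue(entity, out var list)
            ? (IReadOnlyList<EntityFact>)list
            : Array.Empty<EntityFact>();
}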
Most published guidance on reliable AI agents focuses on orchestration frameworks such as Semantic Kernel and Azure AI Foundry. Fair enough, but in my experience the implementation details matter far more than which framework you pick.
Here's a production-ready chatbot architecture:
using LlmTornado;
using LlmTornado.Agents;
using LlmTornado.Chat;
using LlmTornado.Chat.Models;
using LlmTornado.Code;
using LlmTornado.Moderation;
using LlmTornado.VectorDatabases;
using LlmTornado.Embedding;
using LlmTornado.Embedding.Models;
public class ProductionChatbot
{
private readonly TornadoApi _client;
private readonly string _conversationFile;
private readonly IVectorDatabase _vectorDb;
public ProductionChatbot(string apiKey, string conversationFile, string chromaDbUri)
{
_client = new TornadoApi(apiKey, LLmProviders.OpenAi);
_conversationFile = conversationFile;
_vectorDb = new TornadoChromaDB(chromaDbUri);
}
public async Task<string> Chat(string userInput)
{
// Step 1: Safety first - moderate input
var modResult = await _client.Moderation.CreateModeration(userInput);
if (modResult.Results.FirstOrDefault()?.Flagged == true)
{
throw new InvalidOperationException("Input flagged by moderation");
}
// Step 2: Load conversation history
var messages = await LoadConversationHistory(_conversationFile);
// Step 3: Retrieve relevant context from vector memory
var contextAgent = new TornadoAgent(
client: _client,
model: ChatModel.OpenAi.Gpt5.V5Mini,
name: "ContextAgent",
instructions: "Generate 2-3 search queries for retrieving relevant conversation history.",
outputSchema: typeof(SearchQueries)
);
var queryResult = await contextAgent.Run(userInput);
var queries = queryResult.Messages.Last().Content.JsonDecode<SearchQueries>();
var relevantContext = await RetrieveVectorContext(queries.Queries);
// Step 4: Generate response with full context
var chatAgent = new TornadoAgent(
client: _client,
model: ChatModel.OpenAi.Gpt5.V5Mini,
name: "ChatAgent",
instructions: $"You are a helpful assistant. Context from past conversations:\n{relevantContext}",
streaming: true
);
messages.Add(new ChatMessage(ChatMessageRoles.User, userInput));
var response = await chatAgent.Run(
appendMessages: messages,
streaming: true,
onAgentRunnerEvent: async (evt) =>
{
if (evt is AgentRunnerStreamingEvent streamEvt &&
streamEvt.ModelStreamingEvent is ModelStreamingOutputTextDeltaEvent delta)
{
Console.Write(delta.DeltaText);
}
}
);
// Step 5: Save conversation and update vector memory (async)
_ = Task.Run(async () =>
{
await SaveConversationHistory(response.Messages, _conversationFile);
await UpdateVectorMemory(userInput, response.Messages.Last().Content);
});
return response.Messages.Last().Content;
}
private async Task<string> RetrieveVectorContext(string[] queries)
{
var embeddingProvider = new TornadoEmbeddingProvider(
_client,
EmbeddingModel.OpenAi.Gen3.Small
);
var allDocs = new List<VectorDocument>();
foreach (var query in queries)
{
var embedding = await embeddingProvider.Invoke(query);
var docs = await _vectorDb.QueryByEmbeddingAsync(embedding, topK: 3);
allDocs.AddRange(docs);
}
return string.Join("\n", allDocs.Select(d => d.Content).Distinct());
}
    private async Task UpdateVectorMemory(string userMsg, string assistantMsg)
    {
        var embeddingProvider = new TornadoEmbeddingProvider(
            _client,
            EmbeddingModel.OpenAi.Gen3.Small
        );
        // Index both sides of the exchange so later retrievals can surface the
        // assistant's answers as well as the user's questions.
        var docs = new[]
        {
            new VectorDocument(
                id: Guid.NewGuid().ToString(),
                content: userMsg,
                metadata: new Dictionary<string, object>
                {
                    { "Role", "User" },
                    { "Timestamp", DateTime.UtcNow }
                },
                embedding: await embeddingProvider.Invoke(userMsg)
            ),
            new VectorDocument(
                id: Guid.NewGuid().ToString(),
                content: assistantMsg,
                metadata: new Dictionary<string, object>
                {
                    { "Role", "Assistant" },
                    { "Timestamp", DateTime.UtcNow }
                },
                embedding: await embeddingProvider.Invoke(assistantMsg)
            )
        };
        await _vectorDb.AddDocumentsAsync(docs);
    }
private async Task<List<ChatMessage>> LoadConversationHistory(string file)
{
if (!File.Exists(file)) return new List<ChatMessage>();
var messages = new List<ChatMessage>();
await messages.LoadMessagesAsync(file);
return messages;
}
    private Task SaveConversationHistory(List<ChatMessage> messages, string file)
    {
        // SaveConversation is synchronous; return a completed task so the async call sites stay uniform.
        messages.SaveConversation(file);
        return Task.CompletedTask;
    }
}
public struct SearchQueries
{
public string[] Queries { get; set; }
}
Critical insight: Don't block the main thread waiting for vector updates. Fire-and-forget background tasks keep your chatbot responsive. I learned this when users complained about 3-second delays after every message.
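One caveat with fire-and-forget: an exception in that background task disappears silently. A small guard keeps the pattern responsive without swallowing failures. This is a sketch of Step 5 above; swap the Console.Error call for whatever logging you already use:
// Fire-and-forget, but never let a failed memory update vanish silently.
_ = Task.Run(async () =>
{
    try
    {
        await SaveConversationHistory(response.Messages, _conversationFile);
        await UpdateVectorMemory(userInput, response.Messages.Last().Content);
    }
    catch (Exception ex)
    {
        Console.Error.WriteLine($"Background memory update failed: {ex.Message}");
    }
});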
Pattern 3: Tool Calling with Approval Gates
In a production environment for a financial services client, we couldn't let agents execute arbitrary code or make API calls without human approval. Here's the pattern that worked:
using LlmTornado;
using LlmTornado.Agents;
using LlmTornado.Chat;
using LlmTornado.Chat.Models;
using LlmTornado.Code;
using System.ComponentModel;
public class ApprovedToolAgent
{
private readonly TornadoApi _client;
public ApprovedToolAgent(string apiKey)
{
_client = new TornadoApi(apiKey, LLmProviders.OpenAi);
}
public async Task<string> RunWithApproval(string userQuery)
{
var agent = new TornadoAgent(
client: _client,
model: ChatModel.OpenAi.Gpt41.V41Mini,
name: "ControlledAgent",
instructions: "Use available tools to answer questions.",
tools: new List<Delegate> { GetFinancialData },
toolPermissionRequired: new Dictionary<string, bool>
{
{ "GetFinancialData", true } // Requires approval
}
);
var result = await agent.Run(
input: userQuery,
toolPermissionHandle: async (toolRequest) =>
{
Console.WriteLine($"\nAgent wants to call: {toolRequest}");
Console.Write("Approve? (y/n): ");
var approval = Console.ReadLine();
return approval?.ToLower().StartsWith('y') ?? false;
}
);
return result.Messages.Last().Content;
}
[Description("Retrieves sensitive financial data")]
private static string GetFinancialData(
[Description("Account identifier")] string accountId,
[Description("Date range in YYYY-MM-DD format")] string dateRange)
{
// In production, this would call your actual financial API
return $"Financial data for {accountId} in range {dateRange}";
}
}
Lesson learned: Permission gates aren't just for security—they're for compliance. When we added approval workflows, our audit team actually thanked us.
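If compliance is the driver, log every decision alongside the request. Here's a variant of the handler above as a sketch; the tool-approvals.log path is just an example, and in a real deployment you'd write to your audit store instead:
toolPermissionHandle: async (toolRequest) =>
{
    Console.WriteLine($"\nAgent wants to call: {toolRequest}");
    Console.Write("Approve? (y/n): ");
    bool approved = Console.ReadLine()?.ToLower().StartsWith('y') ?? false;

    // Append an audit record for every decision, granted or denied.
    await File.AppendAllTextAsync(
        "tool-approvals.log",
        $"{DateTime.UtcNow:O}\t{toolRequest}\t{(approved ? "APPROVED" : "DENIED")}\n"
    );
    return approved;
}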
Common Pitfalls and Solutions
Problem 1: Token Budget Explosions
I once deployed an agent that cost $200 in a single afternoon because it was including the entire conversation history (300+ messages) in every request.
Solution: Implement sliding window context:
var recentMessages = allMessages.TakeLast(10).ToList();
var contextSummary = await SummarizeOlderMessages(
allMessages.Take(allMessages.Count - 10).ToList()
);
recentMessages.Insert(0, new ChatMessage(
ChatMessageRoles.System,
$"Previous conversation summary: {contextSummary}"
));
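SummarizeOlderMessages isn't a library call; it's a helper you write yourself. A rough sketch using the same TornadoAgent API as the rest of this post, assuming ChatMessage exposes the Role and Content members used earlier:
private async Task<string> SummarizeOlderMessages(List<ChatMessage> olderMessages)
{
    // Collapse older turns into a few sentences that fit in a single system message.
    var summarizer = new TornadoAgent(
        client: _client,
        model: ChatModel.OpenAi.Gpt5.V5Mini,
        name: "SummaryAgent",
        instructions: "Summarize this conversation in under 150 words. Keep names, decisions, and open questions."
    );
    var transcript = string.Join("\n", olderMessages.Select(m => $"{m.Role}: {m.Content}"));
    var result = await summarizer.Run(transcript);
    return result.Messages.Last().Content ?? string.Empty;
}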
Problem 2: Streaming Events Not Displaying
Streaming was "working" but users saw nothing until the entire response completed. The issue? I wasn't handling the event types correctly.
Solution: Check the event type hierarchy:
onAgentRunnerEvent: async (evt) =>
{
if (evt.EventType == AgentRunnerEventTypes.Streaming &&
evt is AgentRunnerStreamingEvent streamingEvent &&
streamingEvent.ModelStreamingEvent is ModelStreamingOutputTextDeltaEvent deltaEvent)
{
Console.Write(deltaEvent.DeltaText);
await Console.Out.FlushAsync(); // Critical for real-time display
}
}
Problem 3: Dead Agents (No Error, Just Silence)
Agents would occasionally stop responding with no error messages. After adding proper cancellation handling, I discovered they were stuck waiting for user input during automated tests.
Solution: Always use cancellation tokens:
var cts = new CancellationTokenSource(TimeSpan.FromMinutes(5));
try
{
var result = await agent.Run(
input: userQuery,
cancellationToken: cts.Token
);
}
catch (OperationCanceledException)
{
Console.WriteLine("Agent timed out - check for infinite loops or stuck tools");
}
Decision Matrix: When to Use Which Pattern
| Use Case | Pattern | Why |
|---|---|---|
| Research/Analysis | Parallel Multi-Agent | Faster results, better coverage |
| Customer Support | Memory-Augmented Single Agent | Personalization, context retention |
| Sensitive Operations | Tool Approval Gates | Compliance, security |
| Long-Running Tasks | Async Background Processing | User experience, responsiveness |
Troubleshooting Guide
Error: "Tool X not found in tools list"
- Check that you're calling AddTornadoTool() or AddAgentTool(), not just adding to Options.Tools directly
- Verify delegate method signatures match expected patterns
Error: Conversation history not persisting
- Ensure you're calling SaveConversation() after each interaction
- Check that file paths are absolute or properly relative to the working directory
Error: Vector search returns irrelevant results
- Verify embedding model matches between indexing and querying
- Try increasing the chunk size (I found 250-500 tokens works best; see the chunking sketch after this list)
- Add metadata filters to narrow search scope
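On chunk size: a simple word-count splitter is enough to experiment with the 250-500 token range. This is a plain C# sketch that approximates tokens as roughly 0.75 words each, so tune the numbers against your own data:
public static IEnumerable<string> ChunkText(string text, int targetTokens = 400)
{
    // Rough heuristic: 1 token is about 0.75 words, so 400 tokens is about 300 words.
    int wordsPerChunk = (int)(targetTokens * 0.75);
    string[] words = text.Split(
        new[] { ' ', '\n', '\r', '\t' },
        StringSplitOptions.RemoveEmptyEntries
    );
    for (int i = 0; i < words.Length; i += wordsPerChunk)
    {
        yield return string.Join(" ", words.Skip(i).Take(wordsPerChunk));
    }
}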
Structured Output: The Game Changer
Microsoft Semantic Kernel gets most of the attention for autonomous coding agents, but structured outputs pay off in any framework. Instead of parsing free-form text, define schemas:
[Description("Analysis of customer sentiment")]
public struct SentimentAnalysis
{
[Description("Overall sentiment: Positive, Negative, or Neutral")]
public string Sentiment { get; set; }
[Description("Confidence score 0-1")]
public float Confidence { get; set; }
[Description("Key phrases that influenced the sentiment")]
public string[] KeyPhrases { get; set; }
}
var agent = new TornadoAgent(
client: api,
model: ChatModel.OpenAi.Gpt41.V41Mini,
instructions: "Analyze customer feedback for sentiment.",
outputSchema: typeof(SentimentAnalysis)
);
var result = await agent.Run(customerFeedback);
var analysis = result.Messages.Last().Content.JsonDecode<SentimentAnalysis>();
Console.WriteLine($"Sentiment: {analysis.Sentiment} ({analysis.Confidence:P0})");
This eliminated 90% of our parsing errors. The LLM returns valid JSON that deserializes cleanly, no regex hacks required.
What's Next for Me
I'm currently exploring agent-to-agent communication patterns where specialized agents coordinate without a central orchestrator. Early results suggest it scales better than hub-and-spoke architectures, but debugging is... interesting.
For more examples and production patterns, check the LlmTornado repository where you'll find complete sample implementations including coding agents, multi-agent orchestration, and MCP tool integrations.
The future of autonomous agents isn't about building one super-intelligent AI—it's about orchestrating specialized agents that work together reliably. Focus on modularity, failure recovery, and observability. Those are the skills that'll matter when you're debugging at 2 AM because an agent is doing something unexpected in production.
And trust me, it will do something unexpected in production.