You've built agents. You've orchestrated workflows. You've integrated MCP tools. Now comes the part that separates demos from production systems: deployment.
In Part 3, we explored MCP and universal tool interoperability. This final part covers everything you need to run Agent Framework in production—Azure AI Foundry deployment, observability, session persistence, scaling, security, and cost management.
The Production Gap
Here's what usually happens: a developer builds an impressive agent demo, shows it to stakeholders, gets approval, and then... reality hits.
- "How do we debug when something goes wrong?"
- "Why did our Azure bill triple?"
- "The agent forgot the conversation after the server restarted."
- "It works, but it's slow under load."
Production AI systems require the same rigor as any production software—plus new considerations around token costs, model latency, and non-deterministic behavior.
Let's close that gap.
Agent Framework in ASP.NET Core
First, let's structure our agent application properly for production hosting.
Project Setup
dotnet new webapi -n AgentService
cd AgentService
dotnet add package Microsoft.Agents.AI --prerelease
dotnet add package Microsoft.Extensions.AI.AzureAIInference
dotnet add package Azure.Identity
dotnet add package OpenTelemetry.Extensions.Hosting
dotnet add package Azure.Monitor.OpenTelemetry.Exporter
Service Registration
// Program.cs
using Microsoft.Agents.AI;
using Microsoft.Extensions.AI;
using Azure.AI.Inference;
using Azure.Identity;
var builder = WebApplication.CreateBuilder(args);
// Register the chat client (Azure OpenAI)
builder.Services.AddSingleton<IChatClient>(sp =>
{
var config = sp.GetRequiredService<IConfiguration>();
return new ChatCompletionsClient(
new Uri(config["AzureOpenAI:Endpoint"]!),
new DefaultAzureCredential()
).AsChatClient(config["AzureOpenAI:DeploymentName"]!);
});
// Register agents
builder.Services.AddSingleton<CustomerSupportAgent>();
builder.Services.AddSingleton<OrderLookupAgent>();
builder.Services.AddSingleton<EscalationAgent>();
// Register the agent runtime
builder.Services.AddAgentRuntime(options =>
{
options.DefaultAgent = "CustomerSupport";
options.SessionTimeout = TimeSpan.FromMinutes(30);
});
var app = builder.Build();
Defining Production Agents
public class CustomerSupportAgent : ChatClientAgent
{
private readonly IOrderService _orderService;
private readonly ILogger<CustomerSupportAgent> _logger;
public CustomerSupportAgent(
IChatClient chatClient,
IOrderService orderService,
ILogger<CustomerSupportAgent> logger)
: base(chatClient, new ChatClientAgentOptions
{
Name = "CustomerSupport",
Instructions = """
You are a customer support agent for an e-commerce company.
Guidelines:
- Always greet customers warmly
- Use the order lookup tool for order-related queries
- Escalate to human support if the customer is frustrated
- Never share internal system details
- Keep responses concise but helpful
"""
})
{
_orderService = orderService;
_logger = logger;
AddTools(this);
}
[AgentTool, Description("Look up an order by order ID or customer email")]
public async Task<OrderInfo> LookupOrder(
[Description("Order ID (e.g., ORD-12345)")] string? orderId = null,
[Description("Customer email address")] string? email = null)
{
_logger.LogInformation("Order lookup requested: {OrderId}, {Email}", orderId, email);
if (orderId != null)
return await _orderService.GetByIdAsync(orderId);
if (email != null)
return await _orderService.GetLatestByEmailAsync(email);
throw new ArgumentException("Either orderId or email must be provided");
}
[AgentTool, Description("Escalate conversation to human support")]
public EscalationResult EscalateToHuman(
[Description("Reason for escalation")] string reason,
[Description("Priority: low, medium, high")] string priority = "medium")
{
_logger.LogWarning("Escalation requested: {Reason} (Priority: {Priority})", reason, priority);
return new EscalationResult
{
TicketId = $"ESC-{DateTime.UtcNow:yyyyMMddHHmmss}",
EstimatedWaitTime = priority == "high" ? "5 minutes" : "15 minutes"
};
}
}
Chat Endpoint
app.MapPost("/api/chat", async (
ChatRequest request,
IAgentRuntime runtime,
CancellationToken ct) =>
{
// Get or create session for conversation continuity
var session = await runtime.GetOrCreateSessionAsync(
request.SessionId ?? Guid.NewGuid().ToString(),
ct);
// Invoke the agent
var response = await runtime.InvokeAgentAsync(
request.AgentName ?? "CustomerSupport",
request.Message,
session,
ct);
// ChatResponse is a positional record (see below), so use its constructor
return Results.Ok(new ChatResponse(
session.Id,
response.Content,
response.AgentName,
response.Usage?.TotalTokens));
});
// Streaming endpoint for real-time responses
app.MapPost("/api/chat/stream", async (
ChatRequest request,
IAgentRuntime runtime,
HttpContext httpContext,
CancellationToken ct) =>
{
var session = await runtime.GetOrCreateSessionAsync(
request.SessionId ?? Guid.NewGuid().ToString(),
ct);
httpContext.Response.ContentType = "text/event-stream";
await foreach (var chunk in runtime.InvokeAgentStreamingAsync(
request.AgentName ?? "CustomerSupport",
request.Message,
session,
ct))
{
// Serialize each chunk so embedded newlines can't break SSE framing
await httpContext.Response.WriteAsync($"data: {JsonSerializer.Serialize(chunk.Content)}\n\n", ct);
await httpContext.Response.Body.FlushAsync(ct);
}
});
public record ChatRequest(string Message, string? SessionId = null, string? AgentName = null);
public record ChatResponse(string SessionId, string Message, string AgentName, int? TokensUsed);
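With the request and response shapes above, a round-trip looks like this (illustrative values; the session ID and token count will differ per call):

```json
// POST /api/chat
{
  "message": "Where is my order ORD-12345?",
  "sessionId": null,
  "agentName": "CustomerSupport"
}

// 200 OK
{
  "sessionId": "8b4f2a1c-3d5e-4f6a-9b7c-1d2e3f4a5b6c",
  "message": "Happy to help! Order ORD-12345 shipped yesterday and should arrive within 2 business days.",
  "agentName": "CustomerSupport",
  "tokensUsed": 412
}
```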
Observability with OpenTelemetry
Production systems need visibility. When an agent gives a wrong answer or takes too long, you need to understand why.
Configuring OpenTelemetry
// Program.cs
builder.Services.AddOpenTelemetry()
.ConfigureResource(resource => resource
.AddService("AgentService", serviceVersion: "1.0.0"))
.WithTracing(tracing => tracing
.AddSource("Microsoft.Agents.AI")
.AddSource("AgentService.Telemetry")
.AddAspNetCoreInstrumentation()
.AddHttpClientInstrumentation()
.AddAzureMonitorTraceExporter(options =>
{
options.ConnectionString = builder.Configuration["ApplicationInsights:ConnectionString"];
}))
.WithMetrics(metrics => metrics
.AddMeter("Microsoft.Agents.AI")
.AddMeter("AgentService.Metrics")
.AddAspNetCoreInstrumentation()
.AddAzureMonitorMetricExporter(options =>
{
options.ConnectionString = builder.Configuration["ApplicationInsights:ConnectionString"];
}));
Custom Agent Telemetry Middleware
public class AgentTelemetryMiddleware : IAgentMiddleware
{
private static readonly ActivitySource ActivitySource = new("AgentService.Telemetry");
private static readonly Meter Meter = new("AgentService.Metrics");
private static readonly Counter<long> InvocationCounter =
Meter.CreateCounter<long>("agent.invocations", description: "Total agent invocations");
private static readonly Histogram<double> LatencyHistogram =
Meter.CreateHistogram<double>("agent.latency", "ms", "Agent response latency");
private static readonly Counter<long> TokenCounter =
Meter.CreateCounter<long>("agent.tokens", description: "Total tokens used");
private static readonly Counter<long> ErrorCounter =
Meter.CreateCounter<long>("agent.errors", description: "Agent errors");
private readonly ILogger<AgentTelemetryMiddleware> _logger;
public AgentTelemetryMiddleware(ILogger<AgentTelemetryMiddleware> logger)
{
_logger = logger;
}
public async Task<AgentResponse> InvokeAsync(AgentRequest request, AgentDelegate next)
{
using var activity = ActivitySource.StartActivity("agent.invoke", ActivityKind.Internal);
var agentName = request.Agent.Name;
var sessionId = request.Session?.Id ?? "none";
activity?.SetTag("agent.name", agentName);
activity?.SetTag("session.id", sessionId);
activity?.SetTag("message.length", request.Message.Length);
InvocationCounter.Add(1, new KeyValuePair<string, object?>("agent", agentName));
var sw = Stopwatch.StartNew();
try
{
var response = await next(request);
sw.Stop();
// Record success metrics
activity?.SetTag("response.status", "success");
activity?.SetTag("tokens.prompt", response.Usage?.InputTokens);
activity?.SetTag("tokens.completion", response.Usage?.OutputTokens);
activity?.SetTag("tokens.total", response.Usage?.TotalTokens);
activity?.SetTag("duration.ms", sw.ElapsedMilliseconds);
LatencyHistogram.Record(sw.ElapsedMilliseconds,
new KeyValuePair<string, object?>("agent", agentName),
new KeyValuePair<string, object?>("status", "success"));
if (response.Usage?.TotalTokens > 0)
{
TokenCounter.Add(response.Usage.TotalTokens,
new KeyValuePair<string, object?>("agent", agentName));
}
_logger.LogInformation(
"Agent {Agent} responded in {Duration}ms using {Tokens} tokens",
agentName, sw.ElapsedMilliseconds, response.Usage?.TotalTokens);
return response;
}
catch (Exception ex)
{
sw.Stop();
activity?.SetTag("response.status", "error");
activity?.SetTag("error.type", ex.GetType().Name);
activity?.SetTag("error.message", ex.Message);
ErrorCounter.Add(1,
new KeyValuePair<string, object?>("agent", agentName),
new KeyValuePair<string, object?>("error.type", ex.GetType().Name));
LatencyHistogram.Record(sw.ElapsedMilliseconds,
new KeyValuePair<string, object?>("agent", agentName),
new KeyValuePair<string, object?>("status", "error"));
_logger.LogError(ex,
"Agent {Agent} failed after {Duration}ms: {Error}",
agentName, sw.ElapsedMilliseconds, ex.Message);
throw;
}
}
}
// Register the middleware
builder.Services.AddAgentRuntime(options =>
{
options.AddMiddleware<AgentTelemetryMiddleware>();
});
What to Monitor
Create an Azure Monitor dashboard tracking:
| Metric | Alert Threshold | Why |
|---|---|---|
| `agent.latency` p95 | > 10 seconds | User experience degradation |
| `agent.errors` rate | > 5% | Quality issues |
| `agent.tokens` per session | > 50,000 | Cost control |
| `agent.invocations` rate | Context-dependent | Capacity planning |
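In Application Insights, the exported metrics land in the customMetrics table. A KQL sketch for the latency alert (metric and dimension names assume the exporter configuration above; exported histograms arrive pre-aggregated, so this approximates with the per-interval mean rather than a true p95):

```kusto
customMetrics
| where name == "agent.latency"
| extend agent = tostring(customDimensions["agent"])
// valueSum / valueCount is the mean latency per aggregation interval
| summarize avgLatencyMs = sum(valueSum) / sum(valueCount) by agent, bin(timestamp, 5m)
| where avgLatencyMs > 10000
```

For a true p95 you would alert on the raw dependency/span durations instead, or enable percentile aggregation in your exporter if it supports it.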
Structured Logging for Debugging
// Log the full conversation for debugging
_logger.LogDebug("Agent conversation: {@Conversation}", new
{
SessionId = session.Id,
AgentName = agent.Name,
MessageHistory = session.Messages.Select(m => new
{
m.Role,
ContentPreview = m.Content?.Substring(0, Math.Min(100, m.Content.Length)),
m.Timestamp
}),
ToolCalls = response.ToolCalls?.Select(t => new { t.Name, t.Arguments })
});
Session Persistence
In-memory sessions vanish when the server restarts. For production, persist them.
Azure Table Storage Session Store
public class AzureTableSessionStore : ISessionStore
{
private readonly TableClient _tableClient;
private readonly ILogger<AzureTableSessionStore> _logger;
public AzureTableSessionStore(
TableServiceClient tableServiceClient,
ILogger<AzureTableSessionStore> logger)
{
_tableClient = tableServiceClient.GetTableClient("agentsessions");
_tableClient.CreateIfNotExists();
_logger = logger;
}
public async Task<AgentSession?> GetAsync(string sessionId, CancellationToken ct = default)
{
try
{
var response = await _tableClient.GetEntityAsync<SessionEntity>(
partitionKey: "sessions",
rowKey: sessionId,
cancellationToken: ct);
return Deserialize(response.Value);
}
catch (RequestFailedException ex) when (ex.Status == 404)
{
return null;
}
}
public async Task SaveAsync(AgentSession session, CancellationToken ct = default)
{
var entity = new SessionEntity
{
PartitionKey = "sessions",
RowKey = session.Id,
Data = JsonSerializer.Serialize(session),
LastModified = DateTimeOffset.UtcNow,
MessageCount = session.Messages.Count,
TotalTokens = session.TotalTokensUsed
};
await _tableClient.UpsertEntityAsync(entity, cancellationToken: ct);
_logger.LogDebug("Saved session {SessionId} with {MessageCount} messages",
session.Id, session.Messages.Count);
}
public async Task DeleteAsync(string sessionId, CancellationToken ct = default)
{
await _tableClient.DeleteEntityAsync("sessions", sessionId, cancellationToken: ct);
}
private AgentSession Deserialize(SessionEntity entity)
{
return JsonSerializer.Deserialize<AgentSession>(entity.Data)!;
}
}
public class SessionEntity : ITableEntity
{
public string PartitionKey { get; set; } = default!;
public string RowKey { get; set; } = default!;
public DateTimeOffset? Timestamp { get; set; }
public ETag ETag { get; set; }
public string Data { get; set; } = default!;
public DateTimeOffset LastModified { get; set; }
public int MessageCount { get; set; }
public int TotalTokens { get; set; }
}
// Registration
builder.Services.AddSingleton<TableServiceClient>(sp =>
{
var config = sp.GetRequiredService<IConfiguration>();
return new TableServiceClient(config.GetConnectionString("Storage"));
});
builder.Services.AddSingleton<ISessionStore, AzureTableSessionStore>();
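One caveat with this design: Azure Table Storage caps a single string property at 64 KB (and a whole entity at 1 MB), so a long conversation will eventually overflow the Data column. A small guard makes that failure explicit instead of surfacing as a RequestFailedException at save time; the names here are introduced for illustration, not framework API:

```csharp
using System;
using System.Text;

// Table Storage stores strings as UTF-16, so the 64 KB property cap
// translates to roughly 32K characters of serialized session JSON.
const int MaxPropertyBytes = 64 * 1024;

bool FitsInTableProperty(string serializedSession) =>
    Encoding.Unicode.GetByteCount(serializedSession) <= MaxPropertyBytes;

Console.WriteLine(FitsInTableProperty(new string('a', 32_000)));  // True
```

When the check fails, spill the serialized session to Blob Storage and keep only a pointer in the table entity.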
Session Cleanup Job
Sessions accumulate. Clean up old ones:
public class SessionCleanupJob : BackgroundService
{
private readonly ISessionStore _sessionStore;
private readonly ILogger<SessionCleanupJob> _logger;
private readonly TimeSpan _maxAge = TimeSpan.FromDays(7);
private readonly TimeSpan _interval = TimeSpan.FromHours(1);
public SessionCleanupJob(ISessionStore sessionStore, ILogger<SessionCleanupJob> logger)
{
_sessionStore = sessionStore;
_logger = logger;
}
protected override async Task ExecuteAsync(CancellationToken stoppingToken)
{
while (!stoppingToken.IsCancellationRequested)
{
try
{
var cutoff = DateTimeOffset.UtcNow - _maxAge;
// Note: DeleteOlderThanAsync goes beyond the Get/Save/Delete members shown
// earlier, so add it to ISessionStore and implement it on the store.
var deleted = await _sessionStore.DeleteOlderThanAsync(cutoff, stoppingToken);
_logger.LogInformation("Cleaned up {Count} expired sessions", deleted);
}
catch (Exception ex)
{
_logger.LogError(ex, "Session cleanup failed");
}
await Task.Delay(_interval, stoppingToken);
}
}
}
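One way to implement DeleteOlderThanAsync over the Table Storage store is to query with an OData filter on the LastModified property that SaveAsync writes, then delete each match. The filter construction, at least, is easy to test in isolation (the helper name is mine, not part of any SDK):

```csharp
using System;

// Builds the OData filter for TableClient.QueryAsync<SessionEntity>(filter);
// LastModified is the custom property written by SaveAsync above.
string ExpiredSessionsFilter(DateTimeOffset cutoff) =>
    $"PartitionKey eq 'sessions' and LastModified lt datetime'{cutoff.UtcDateTime:yyyy-MM-ddTHH:mm:ssZ}'";

Console.WriteLine(ExpiredSessionsFilter(new DateTimeOffset(2026, 1, 8, 0, 0, 0, TimeSpan.Zero)));
// PartitionKey eq 'sessions' and LastModified lt datetime'2026-01-08T00:00:00Z'
```

The store would page through the query results and call DeleteEntityAsync per row, returning the count.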
Scaling Strategies
Horizontal Scaling
Agent Framework applications scale horizontally like any ASP.NET Core app, with one consideration: session affinity.
If using in-memory sessions, enable sticky sessions:
// Azure App Service configuration
{
"siteConfig": {
"clientAffinityEnabled": true
}
}
The better approach is to use distributed session storage (shown above) and scale freely:
// No affinity needed with external session store
{
"siteConfig": {
"clientAffinityEnabled": false
}
}
Load Patterns
AI agents have bursty, unpredictable workloads. Configure autoscaling accordingly:
resource appServicePlan 'Microsoft.Web/serverfarms@2023-01-01' = {
name: planName
location: location
sku: {
name: 'P1v3'
tier: 'PremiumV3'
capacity: 2 // Minimum instances
}
}
resource autoScaleSettings 'Microsoft.Insights/autoscalesettings@2022-10-01' = {
name: '${appName}-autoscale'
location: location
properties: {
targetResourceUri: appServicePlan.id
enabled: true
profiles: [
{
name: 'Auto scale based on CPU'
capacity: {
minimum: '2'
maximum: '10'
default: '2'
}
rules: [
{
metricTrigger: {
metricName: 'CpuPercentage'
metricResourceUri: appServicePlan.id
timeGrain: 'PT1M'
statistic: 'Average'
timeWindow: 'PT5M'
timeAggregation: 'Average'
operator: 'GreaterThan'
threshold: 70
}
scaleAction: {
direction: 'Increase'
type: 'ChangeCount'
value: '2'
cooldown: 'PT5M'
}
}
{
metricTrigger: {
metricName: 'CpuPercentage'
metricResourceUri: appServicePlan.id
timeGrain: 'PT1M'
statistic: 'Average'
timeWindow: 'PT10M'
timeAggregation: 'Average'
operator: 'LessThan'
threshold: 30
}
scaleAction: {
direction: 'Decrease'
type: 'ChangeCount'
value: '1'
cooldown: 'PT10M'
}
}
]
}
]
}
}
Rate Limiting
Protect your Azure OpenAI quota and your users:
builder.Services.AddRateLimiter(options =>
{
// Per-user rate limit
options.AddPolicy("user-limit", context =>
{
var userId = context.User?.Identity?.Name ?? context.Connection.RemoteIpAddress?.ToString() ?? "anonymous";
return RateLimitPartition.GetTokenBucketLimiter(userId, _ => new TokenBucketRateLimiterOptions
{
TokenLimit = 20,
TokensPerPeriod = 10,
ReplenishmentPeriod = TimeSpan.FromMinutes(1),
QueueLimit = 5
});
});
// Global rate limit to protect Azure OpenAI quota.
// Named policies don't stack on an endpoint, so the global cap
// belongs on options.GlobalLimiter rather than a second policy.
options.GlobalLimiter = PartitionedRateLimiter.Create<HttpContext, string>(_ =>
RateLimitPartition.GetFixedWindowLimiter("global", _ => new FixedWindowRateLimiterOptions
{
Window = TimeSpan.FromMinutes(1),
PermitLimit = 500 // Adjust based on your quota
}));
});
app.UseRateLimiter();
app.MapPost("/api/chat", ...)
.RequireRateLimiting("user-limit");
Security Best Practices
Managed Identity for Azure Resources
Never store secrets in code or config:
// Use DefaultAzureCredential, which automatically uses Managed Identity in Azure
builder.Services.AddSingleton<IChatClient>(sp =>
{
var config = sp.GetRequiredService<IConfiguration>();
return new ChatCompletionsClient(
new Uri(config["AzureOpenAI:Endpoint"]!),
new DefaultAzureCredential() // Uses Managed Identity in Azure
).AsChatClient(config["AzureOpenAI:DeploymentName"]!);
});
Assign the Cognitive Services OpenAI User role to your App Service's managed identity:
resource roleAssignment 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
name: guid(openAIAccount.id, appService.id, cognitiveServicesOpenAIUser)
scope: openAIAccount
properties: {
roleDefinitionId: cognitiveServicesOpenAIUser
principalId: appService.identity.principalId
principalType: 'ServicePrincipal'
}
}
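The role assignment above references a cognitiveServicesOpenAIUser variable without defining it. It is the resource ID of the built-in role definition, constructed from the role's well-known GUID (shown here from memory; double-check it against the Azure built-in roles reference):

```bicep
var cognitiveServicesOpenAIUser = subscriptionResourceId(
  'Microsoft.Authorization/roleDefinitions',
  '5e0bd9bd-7b93-4f28-af87-19fc36ad61bd' // built-in "Cognitive Services OpenAI User"
)
```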
Input Validation
Never trust user input, even (especially) for AI:
app.MapPost("/api/chat", async (
ChatRequest request,
IAgentRuntime runtime,
CancellationToken ct) =>
{
// Validate input length
if (string.IsNullOrWhiteSpace(request.Message))
return Results.BadRequest("Message is required");
if (request.Message.Length > 10000)
return Results.BadRequest("Message too long");
// Sanitize session ID
if (request.SessionId != null && !Guid.TryParse(request.SessionId, out _))
return Results.BadRequest("Invalid session ID format");
// Validate agent name against allowlist
var allowedAgents = new[] { "CustomerSupport", "OrderLookup" };
if (request.AgentName != null && !allowedAgents.Contains(request.AgentName))
return Results.BadRequest("Unknown agent");
// Proceed with validated request
var session = await runtime.GetOrCreateSessionAsync(
request.SessionId ?? Guid.NewGuid().ToString(),
ct);
var response = await runtime.InvokeAgentAsync(
request.AgentName ?? "CustomerSupport",
request.Message,
session,
ct);
return Results.Ok(new ChatResponse(...));
});
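Validation rules like these are worth extracting into a pure function so they can be unit-tested without spinning up the host. A sketch mirroring the checks above (ValidateChat is a name introduced here, not framework API):

```csharp
using System;
using System.Linq;

// Mirrors the endpoint's checks: returns an error message, or null when valid.
static string? ValidateChat(string? message, string? sessionId, string? agentName)
{
    string[] allowedAgents = { "CustomerSupport", "OrderLookup" };

    if (string.IsNullOrWhiteSpace(message)) return "Message is required";
    if (message.Length > 10_000) return "Message too long";
    if (sessionId != null && !Guid.TryParse(sessionId, out _)) return "Invalid session ID format";
    if (agentName != null && !allowedAgents.Contains(agentName)) return "Unknown agent";
    return null;
}

Console.WriteLine(ValidateChat("Where is my order?", null, "CustomerSupport") ?? "valid");  // valid
```

The endpoint then reduces to one guard clause, and the rules stay covered by fast tests.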
Network Isolation
For sensitive workloads, use VNet integration:
resource vnet 'Microsoft.Network/virtualNetworks@2023-04-01' = {
name: vnetName
location: location
properties: {
addressSpace: {
addressPrefixes: ['10.0.0.0/16']
}
subnets: [
{
name: 'app-subnet'
properties: {
addressPrefix: '10.0.1.0/24'
delegations: [
{
name: 'appServiceDelegation'
properties: {
serviceName: 'Microsoft.Web/serverFarms'
}
}
]
}
}
{
name: 'openai-subnet'
properties: {
addressPrefix: '10.0.2.0/24'
privateEndpointNetworkPolicies: 'Disabled'
}
}
]
}
}
resource privateEndpoint 'Microsoft.Network/privateEndpoints@2023-04-01' = {
name: '${openAIAccountName}-pe'
location: location
properties: {
subnet: {
id: vnet.properties.subnets[1].id
}
privateLinkServiceConnections: [
{
name: 'openai-connection'
properties: {
privateLinkServiceId: openAIAccount.id
groupIds: ['account']
}
}
]
}
}
Cost Management
AI can get expensive fast. Build in guardrails.
Token Budget Per Session
public class BudgetEnforcementMiddleware : IAgentMiddleware
{
private readonly ISessionStore _sessionStore;
private readonly int _maxTokensPerSession;
public BudgetEnforcementMiddleware(
ISessionStore sessionStore,
IConfiguration config)
{
_sessionStore = sessionStore;
_maxTokensPerSession = config.GetValue<int>("Budget:MaxTokensPerSession", 50000);
}
public async Task<AgentResponse> InvokeAsync(AgentRequest request, AgentDelegate next)
{
var session = request.Session;
if (session != null && session.TotalTokensUsed >= _maxTokensPerSession)
{
return new AgentResponse
{
Content = "I apologize, but this conversation has reached its limit. " +
"Please start a new conversation for further assistance.",
Status = AgentResponseStatus.BudgetExceeded
};
}
var response = await next(request);
// Update session token count
if (session != null && response.Usage != null)
{
session.TotalTokensUsed += response.Usage.TotalTokens;
await _sessionStore.SaveAsync(session);
}
return response;
}
}
Cost Tracking and Alerting
public class CostTrackingMiddleware : IAgentMiddleware
{
private static readonly Meter Meter = new("AgentService.Metrics");
private static readonly Counter<decimal> CostCounter =
Meter.CreateCounter<decimal>("agent.cost.usd", "USD", "Estimated cost in USD");
// Pricing per 1K tokens (adjust for your model)
private const decimal InputCostPer1K = 0.01m;
private const decimal OutputCostPer1K = 0.03m;
public async Task<AgentResponse> InvokeAsync(AgentRequest request, AgentDelegate next)
{
var response = await next(request);
if (response.Usage != null)
{
var inputCost = (response.Usage.InputTokens / 1000m) * InputCostPer1K;
var outputCost = (response.Usage.OutputTokens / 1000m) * OutputCostPer1K;
var totalCost = inputCost + outputCost;
CostCounter.Add(totalCost,
new KeyValuePair<string, object?>("agent", request.Agent.Name),
new KeyValuePair<string, object?>("model", request.Agent.ModelId));
}
return response;
}
}
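The per-request arithmetic is easy to sanity-check in isolation (same illustrative rates as in the middleware; real prices vary by model and region):

```csharp
using System;

// Illustrative prices per 1K tokens, matching the middleware above.
const decimal InputCostPer1K = 0.01m;
const decimal OutputCostPer1K = 0.03m;

decimal EstimateCost(long inputTokens, long outputTokens) =>
    (inputTokens / 1000m) * InputCostPer1K + (outputTokens / 1000m) * OutputCostPer1K;

// 1,200 prompt tokens and 300 completion tokens:
Console.WriteLine(EstimateCost(1200, 300));  // 0.021
```

Using decimal (not double) keeps the cents exact when these per-request figures are summed into daily totals.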
Set up Azure Monitor alerts when costs exceed thresholds.
Deployment with Azure AI Foundry
Azure AI Foundry is Microsoft's unified platform for AI development and deployment. Here's a representative deployment setup (trimmed for length; the appSubnet and keyVault symbols referenced below come from the networking and Key Vault resources you'd declare alongside it):
Bicep Template
@description('Environment name')
param environment string = 'prod'
@description('Location for resources')
param location string = resourceGroup().location
var appName = 'agent-service-${environment}'
var openAIAccountName = 'openai-${environment}'
// Azure OpenAI
resource openAIAccount 'Microsoft.CognitiveServices/accounts@2023-10-01-preview' = {
name: openAIAccountName
location: location
kind: 'OpenAI'
sku: {
name: 'S0'
}
properties: {
customSubDomainName: openAIAccountName
publicNetworkAccess: 'Disabled' // Use private endpoint
}
}
resource gpt4Deployment 'Microsoft.CognitiveServices/accounts/deployments@2023-10-01-preview' = {
parent: openAIAccount
name: 'gpt-4o'
sku: {
name: 'Standard'
capacity: 40 // TPM in thousands
}
properties: {
model: {
format: 'OpenAI'
name: 'gpt-4o'
version: '2024-11-20'
}
}
}
// App Service
resource appServicePlan 'Microsoft.Web/serverfarms@2023-01-01' = {
name: '${appName}-plan'
location: location
sku: {
name: 'P1v3'
tier: 'PremiumV3'
capacity: 2
}
properties: {
reserved: true // Linux
}
}
resource appService 'Microsoft.Web/sites@2023-01-01' = {
name: appName
location: location
identity: {
type: 'SystemAssigned'
}
properties: {
serverFarmId: appServicePlan.id
siteConfig: {
linuxFxVersion: 'DOTNETCORE|8.0'
alwaysOn: true
healthCheckPath: '/health'
appSettings: [
{
name: 'AzureOpenAI__Endpoint'
value: openAIAccount.properties.endpoint
}
{
name: 'AzureOpenAI__DeploymentName'
value: gpt4Deployment.name
}
{
name: 'ApplicationInsights__ConnectionString'
value: appInsights.properties.ConnectionString
}
{
name: 'Storage__ConnectionString'
value: '@Microsoft.KeyVault(VaultName=${keyVault.name};SecretName=storage-connection)'
}
]
}
virtualNetworkSubnetId: appSubnet.id
}
}
// Application Insights
resource appInsights 'Microsoft.Insights/components@2020-02-02' = {
name: '${appName}-insights'
location: location
kind: 'web'
properties: {
Application_Type: 'web'
}
}
// Storage for sessions
resource storageAccount 'Microsoft.Storage/storageAccounts@2023-01-01' = {
name: 'agentstorage${uniqueString(resourceGroup().id)}'
location: location
sku: {
name: 'Standard_LRS'
}
kind: 'StorageV2'
}
// Outputs
output appUrl string = 'https://${appService.properties.defaultHostName}'
output openAIEndpoint string = openAIAccount.properties.endpoint
GitHub Actions Deployment
name: Deploy Agent Service

on:
  push:
    branches: [main]
  workflow_dispatch:

env:
  AZURE_WEBAPP_NAME: agent-service-prod
  DOTNET_VERSION: '8.0.x'

jobs:
  build-and-deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Setup .NET
        uses: actions/setup-dotnet@v4
        with:
          dotnet-version: ${{ env.DOTNET_VERSION }}

      - name: Restore dependencies
        run: dotnet restore

      - name: Build
        run: dotnet build --configuration Release --no-restore

      - name: Test
        run: dotnet test --configuration Release --no-build

      - name: Publish
        run: dotnet publish -c Release -o ./publish

      - name: Login to Azure
        uses: azure/login@v1
        with:
          creds: ${{ secrets.AZURE_CREDENTIALS }}

      - name: Deploy to Azure Web App
        uses: azure/webapps-deploy@v2
        with:
          app-name: ${{ env.AZURE_WEBAPP_NAME }}
          package: ./publish
Production Checklist
Before going live:
- [ ] Authentication: Managed Identity configured for all Azure resources
- [ ] Network: VNet integration and private endpoints for sensitive workloads
- [ ] Observability: OpenTelemetry tracing and metrics exporting to Azure Monitor
- [ ] Sessions: Persistent session storage configured
- [ ] Cost Controls: Token budgets and cost tracking enabled
- [ ] Rate Limiting: Per-user and global rate limits configured
- [ ] Health Checks: `/health` endpoint implemented and monitored
- [ ] Graceful Shutdown: Proper handling of in-flight requests
- [ ] Autoscaling: Rules configured based on CPU and request patterns
- [ ] Alerts: Set up for latency, errors, and cost thresholds
- [ ] Logging: Structured logging with appropriate levels
- [ ] Secrets: All secrets in Key Vault, no hardcoded values
- [ ] CI/CD: Automated deployment pipeline with tests
Summary
Production deployment of Agent Framework applications requires attention to:
- Architecture: Proper ASP.NET Core integration with dependency injection
- Observability: OpenTelemetry for tracing, metrics, and debugging
- Durability: Persistent session storage with cleanup strategies
- Scaling: Horizontal scaling with distributed session stores
- Security: Managed Identity, input validation, network isolation
- Cost Management: Token budgets, tracking, and alerting
The patterns in this article work for any scale—from a single-instance proof of concept to a globally distributed multi-agent system.
Series Conclusion
Over these four parts, we've covered the complete journey:
- Part 1: The unification of Semantic Kernel and AutoGen into Agent Framework
- Part 2: Workflows for explicit multi-agent orchestration
- Part 3: MCP for universal tool interoperability
- Part 4: Production deployment with Azure AI Foundry
The .NET AI agent ecosystem in 2026 is mature, unified, and production-ready. The same patterns that make .NET excellent for enterprise development—type safety, dependency injection, observability—now apply to AI agents.
Build something great.
Questions? Drop them in the comments or find me on Twitter/X.