Brian Spann
Deploying Agent Framework to Production: Azure AI Foundry, Observability, and Scaling

You've built agents. You've orchestrated workflows. You've integrated MCP tools. Now comes the part that separates demos from production systems: deployment.

In Part 3, we explored MCP and universal tool interoperability. This final part covers everything you need to run Agent Framework in production—Azure AI Foundry deployment, observability, session persistence, scaling, security, and cost management.

The Production Gap

Here's what usually happens: a developer builds an impressive agent demo, shows it to stakeholders, gets approval, and then... reality hits.

  • "How do we debug when something goes wrong?"
  • "Why did our Azure bill triple?"
  • "The agent forgot the conversation after the server restarted."
  • "It works, but it's slow under load."

Production AI systems require the same rigor as any production software—plus new considerations around token costs, model latency, and non-deterministic behavior.

Let's close that gap.

Agent Framework in ASP.NET Core

First, let's structure our agent application properly for production hosting.

Project Setup

dotnet new webapi -n AgentService
cd AgentService
dotnet add package Microsoft.Agents.AI --prerelease
dotnet add package Microsoft.Extensions.AI.AzureAIInference
dotnet add package Azure.Identity
dotnet add package OpenTelemetry.Extensions.Hosting
dotnet add package Azure.Monitor.OpenTelemetry.Exporter

Service Registration

// Program.cs
using Microsoft.Agents.AI;
using Microsoft.Extensions.AI;
using Azure.AI.Inference;
using Azure.Identity;

var builder = WebApplication.CreateBuilder(args);

// Register the chat client (Azure OpenAI)
builder.Services.AddSingleton<IChatClient>(sp =>
{
    var config = sp.GetRequiredService<IConfiguration>();
    return new ChatCompletionsClient(
        new Uri(config["AzureOpenAI:Endpoint"]!),
        new DefaultAzureCredential()
    ).AsChatClient(config["AzureOpenAI:DeploymentName"]!);
});

// Register agents
builder.Services.AddSingleton<CustomerSupportAgent>();
builder.Services.AddSingleton<OrderLookupAgent>();
builder.Services.AddSingleton<EscalationAgent>();

// Register the agent runtime
builder.Services.AddAgentRuntime(options =>
{
    options.DefaultAgent = "CustomerSupport";
    options.SessionTimeout = TimeSpan.FromMinutes(30);
});

var app = builder.Build();

Defining Production Agents

public class CustomerSupportAgent : ChatClientAgent
{
    private readonly IOrderService _orderService;
    private readonly ILogger<CustomerSupportAgent> _logger;

    public CustomerSupportAgent(
        IChatClient chatClient,
        IOrderService orderService,
        ILogger<CustomerSupportAgent> logger)
        : base(chatClient, new ChatClientAgentOptions
        {
            Name = "CustomerSupport",
            Instructions = """
                You are a customer support agent for an e-commerce company.

                Guidelines:
                - Always greet customers warmly
                - Use the order lookup tool for order-related queries
                - Escalate to human support if the customer is frustrated
                - Never share internal system details
                - Keep responses concise but helpful
                """
        })
    {
        _orderService = orderService;
        _logger = logger;

        AddTools(this);
    }

    [AgentTool, Description("Look up an order by order ID or customer email")]
    public async Task<OrderInfo> LookupOrder(
        [Description("Order ID (e.g., ORD-12345)")] string? orderId = null,
        [Description("Customer email address")] string? email = null)
    {
        _logger.LogInformation("Order lookup requested: {OrderId}, {Email}", orderId, email);

        if (orderId != null)
            return await _orderService.GetByIdAsync(orderId);

        if (email != null)
            return await _orderService.GetLatestByEmailAsync(email);

        throw new ArgumentException("Either orderId or email must be provided");
    }

    [AgentTool, Description("Escalate conversation to human support")]
    public EscalationResult EscalateToHuman(
        [Description("Reason for escalation")] string reason,
        [Description("Priority: low, medium, high")] string priority = "medium")
    {
        _logger.LogWarning("Escalation requested: {Reason} (Priority: {Priority})", reason, priority);

        return new EscalationResult
        {
            TicketId = $"ESC-{DateTime.UtcNow:yyyyMMddHHmmss}",
            EstimatedWaitTime = priority == "high" ? "5 minutes" : "15 minutes"
        };
    }
}

Chat Endpoint

app.MapPost("/api/chat", async (
    ChatRequest request,
    IAgentRuntime runtime,
    CancellationToken ct) =>
{
    // Get or create session for conversation continuity
    var session = await runtime.GetOrCreateSessionAsync(
        request.SessionId ?? Guid.NewGuid().ToString(),
        ct);

    // Invoke the agent
    var response = await runtime.InvokeAgentAsync(
        request.AgentName ?? "CustomerSupport",
        request.Message,
        session,
        ct);

    // ChatResponse is a positional record, so construct it positionally
    return Results.Ok(new ChatResponse(
        session.Id,
        response.Content,
        response.AgentName,
        response.Usage?.TotalTokens));
});

// Streaming endpoint for real-time responses
app.MapPost("/api/chat/stream", async (
    ChatRequest request,
    IAgentRuntime runtime,
    HttpContext httpContext,
    CancellationToken ct) =>
{
    var session = await runtime.GetOrCreateSessionAsync(
        request.SessionId ?? Guid.NewGuid().ToString(),
        ct);

    httpContext.Response.ContentType = "text/event-stream";

    await foreach (var chunk in runtime.InvokeAgentStreamingAsync(
        request.AgentName ?? "CustomerSupport",
        request.Message,
        session,
        ct))
    {
        await httpContext.Response.WriteAsync($"data: {chunk.Content}\n\n", ct);
        await httpContext.Response.Body.FlushAsync(ct);
    }
});

public record ChatRequest(string Message, string? SessionId = null, string? AgentName = null);
public record ChatResponse(string SessionId, string Message, string AgentName, int? TokensUsed);
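Once the service is running, the endpoint can be exercised with curl. This is illustrative only — the host and port are placeholders, and the JSON shape follows the ChatRequest/ChatResponse records above (minimal API binding is case-insensitive for JSON property names):

```shell
# Start a new conversation (no sessionId -> the service generates one)
curl -s -X POST http://localhost:5000/api/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "Where is order ORD-12345?"}'

# Continue the same conversation by echoing back the returned sessionId
curl -s -X POST http://localhost:5000/api/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "And when will it arrive?", "sessionId": "<id from previous response>"}'
```

Echoing the `sessionId` is what gives the agent conversation continuity across requests.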

Observability with OpenTelemetry

Production systems need visibility. When an agent gives a wrong answer or takes too long, you need to understand why.

Configuring OpenTelemetry

// Program.cs
builder.Services.AddOpenTelemetry()
    .ConfigureResource(resource => resource
        .AddService("AgentService", serviceVersion: "1.0.0"))
    .WithTracing(tracing => tracing
        .AddSource("Microsoft.Agents.AI")
        .AddSource("AgentService.Telemetry")
        .AddAspNetCoreInstrumentation()
        .AddHttpClientInstrumentation()
        .AddAzureMonitorTraceExporter(options =>
        {
            options.ConnectionString = builder.Configuration["ApplicationInsights:ConnectionString"];
        }))
    .WithMetrics(metrics => metrics
        .AddMeter("Microsoft.Agents.AI")
        .AddMeter("AgentService.Metrics")
        .AddAspNetCoreInstrumentation()
        .AddAzureMonitorMetricExporter(options =>
        {
            options.ConnectionString = builder.Configuration["ApplicationInsights:ConnectionString"];
        }));

Custom Agent Telemetry Middleware

public class AgentTelemetryMiddleware : IAgentMiddleware
{
    private static readonly ActivitySource ActivitySource = new("AgentService.Telemetry");
    private static readonly Meter Meter = new("AgentService.Metrics");

    private static readonly Counter<long> InvocationCounter = 
        Meter.CreateCounter<long>("agent.invocations", description: "Total agent invocations");
    private static readonly Histogram<double> LatencyHistogram = 
        Meter.CreateHistogram<double>("agent.latency", "ms", "Agent response latency");
    private static readonly Counter<long> TokenCounter = 
        Meter.CreateCounter<long>("agent.tokens", description: "Total tokens used");
    private static readonly Counter<long> ErrorCounter = 
        Meter.CreateCounter<long>("agent.errors", description: "Agent errors");

    private readonly ILogger<AgentTelemetryMiddleware> _logger;

    public AgentTelemetryMiddleware(ILogger<AgentTelemetryMiddleware> logger)
    {
        _logger = logger;
    }

    public async Task<AgentResponse> InvokeAsync(AgentRequest request, AgentDelegate next)
    {
        using var activity = ActivitySource.StartActivity("agent.invoke", ActivityKind.Internal);

        var agentName = request.Agent.Name;
        var sessionId = request.Session?.Id ?? "none";

        activity?.SetTag("agent.name", agentName);
        activity?.SetTag("session.id", sessionId);
        activity?.SetTag("message.length", request.Message.Length);

        InvocationCounter.Add(1, new KeyValuePair<string, object?>("agent", agentName));

        var sw = Stopwatch.StartNew();

        try
        {
            var response = await next(request);

            sw.Stop();

            // Record success metrics
            activity?.SetTag("response.status", "success");
            activity?.SetTag("tokens.prompt", response.Usage?.InputTokens);
            activity?.SetTag("tokens.completion", response.Usage?.OutputTokens);
            activity?.SetTag("tokens.total", response.Usage?.TotalTokens);
            activity?.SetTag("duration.ms", sw.ElapsedMilliseconds);

            LatencyHistogram.Record(sw.ElapsedMilliseconds, 
                new KeyValuePair<string, object?>("agent", agentName),
                new KeyValuePair<string, object?>("status", "success"));

            if (response.Usage?.TotalTokens > 0)
            {
                TokenCounter.Add(response.Usage.TotalTokens,
                    new KeyValuePair<string, object?>("agent", agentName));
            }

            _logger.LogInformation(
                "Agent {Agent} responded in {Duration}ms using {Tokens} tokens",
                agentName, sw.ElapsedMilliseconds, response.Usage?.TotalTokens);

            return response;
        }
        catch (Exception ex)
        {
            sw.Stop();

            activity?.SetTag("response.status", "error");
            activity?.SetTag("error.type", ex.GetType().Name);
            activity?.SetTag("error.message", ex.Message);

            ErrorCounter.Add(1,
                new KeyValuePair<string, object?>("agent", agentName),
                new KeyValuePair<string, object?>("error.type", ex.GetType().Name));

            LatencyHistogram.Record(sw.ElapsedMilliseconds,
                new KeyValuePair<string, object?>("agent", agentName),
                new KeyValuePair<string, object?>("status", "error"));

            _logger.LogError(ex, 
                "Agent {Agent} failed after {Duration}ms: {Error}",
                agentName, sw.ElapsedMilliseconds, ex.Message);

            throw;
        }
    }
}

// Register the middleware
builder.Services.AddAgentRuntime(options =>
{
    options.AddMiddleware<AgentTelemetryMiddleware>();
});

What to Monitor

Create an Azure Monitor dashboard tracking:

| Metric | Alert Threshold | Why |
| --- | --- | --- |
| agent.latency | p95 > 10 seconds | User experience degradation |
| agent.errors | rate > 5% | Quality issues |
| agent.tokens | per session > 50,000 | Cost control |
| agent.invocations | Context-dependent rate | Capacity planning |
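With the middleware above exporting to Application Insights, the latency alert can be prototyped as a Kusto query before wiring it into an alert rule. This is a sketch — `customMetrics` is where Azure Monitor lands custom OpenTelemetry metrics, but exact field names can vary by exporter version:

```kql
customMetrics
| where name == "agent.latency"
| extend agent = tostring(customDimensions["agent"])
| summarize p95 = percentile(value, 95) by agent, bin(timestamp, 5m)
| where p95 > 10000  // milliseconds
```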

Structured Logging for Debugging

// Log the full conversation for debugging
_logger.LogDebug("Agent conversation: {@Conversation}", new
{
    SessionId = session.Id,
    AgentName = agent.Name,
    MessageHistory = session.Messages.Select(m => new
    {
        m.Role,
        ContentPreview = m.Content is null ? null : m.Content[..Math.Min(100, m.Content.Length)],
        m.Timestamp
    }),
    ToolCalls = response.ToolCalls?.Select(t => new { t.Name, t.Arguments })
});

Session Persistence

In-memory sessions vanish when the server restarts. For production, persist them.

Azure Table Storage Session Store

public class AzureTableSessionStore : ISessionStore
{
    private readonly TableClient _tableClient;
    private readonly ILogger<AzureTableSessionStore> _logger;

    public AzureTableSessionStore(
        TableServiceClient tableServiceClient,
        ILogger<AzureTableSessionStore> logger)
    {
        _tableClient = tableServiceClient.GetTableClient("agentsessions");
        _tableClient.CreateIfNotExists();
        _logger = logger;
    }

    public async Task<AgentSession?> GetAsync(string sessionId, CancellationToken ct = default)
    {
        try
        {
            var response = await _tableClient.GetEntityAsync<SessionEntity>(
                partitionKey: "sessions",
                rowKey: sessionId,
                cancellationToken: ct);

            return Deserialize(response.Value);
        }
        catch (RequestFailedException ex) when (ex.Status == 404)
        {
            return null;
        }
    }

    public async Task SaveAsync(AgentSession session, CancellationToken ct = default)
    {
        var entity = new SessionEntity
        {
            PartitionKey = "sessions",
            RowKey = session.Id,
            Data = JsonSerializer.Serialize(session),
            LastModified = DateTimeOffset.UtcNow,
            MessageCount = session.Messages.Count,
            TotalTokens = session.TotalTokensUsed
        };

        await _tableClient.UpsertEntityAsync(entity, cancellationToken: ct);

        _logger.LogDebug("Saved session {SessionId} with {MessageCount} messages",
            session.Id, session.Messages.Count);
    }

    public async Task DeleteAsync(string sessionId, CancellationToken ct = default)
    {
        await _tableClient.DeleteEntityAsync("sessions", sessionId, cancellationToken: ct);
    }

    private AgentSession Deserialize(SessionEntity entity)
    {
        return JsonSerializer.Deserialize<AgentSession>(entity.Data)!;
    }
}

public class SessionEntity : ITableEntity
{
    public string PartitionKey { get; set; } = default!;
    public string RowKey { get; set; } = default!;
    public DateTimeOffset? Timestamp { get; set; }
    public ETag ETag { get; set; }

    public string Data { get; set; } = default!;
    public DateTimeOffset LastModified { get; set; }
    public int MessageCount { get; set; }
    public int TotalTokens { get; set; }
}

// Registration
builder.Services.AddSingleton<TableServiceClient>(sp =>
{
    var config = sp.GetRequiredService<IConfiguration>();
    return new TableServiceClient(config.GetConnectionString("Storage"));
});

builder.Services.AddSingleton<ISessionStore, AzureTableSessionStore>();

Session Cleanup Job

Sessions accumulate. Clean up old ones:

public class SessionCleanupJob : BackgroundService
{
    private readonly ISessionStore _sessionStore;
    private readonly ILogger<SessionCleanupJob> _logger;
    private readonly TimeSpan _maxAge = TimeSpan.FromDays(7);
    private readonly TimeSpan _interval = TimeSpan.FromHours(1);

    public SessionCleanupJob(ISessionStore sessionStore, ILogger<SessionCleanupJob> logger)
    {
        _sessionStore = sessionStore;
        _logger = logger;
    }

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        while (!stoppingToken.IsCancellationRequested)
        {
            try
            {
                var cutoff = DateTimeOffset.UtcNow - _maxAge;
                var deleted = await _sessionStore.DeleteOlderThanAsync(cutoff, stoppingToken);

                _logger.LogInformation("Cleaned up {Count} expired sessions", deleted);
            }
            catch (Exception ex)
            {
                _logger.LogError(ex, "Session cleanup failed");
            }

            await Task.Delay(_interval, stoppingToken);
        }
    }
}
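The job calls DeleteOlderThanAsync, which the session store shown earlier doesn't define. A sketch of how AzureTableSessionStore could implement it, filtering server-side on the LastModified property that SaveAsync writes:

```csharp
public async Task<int> DeleteOlderThanAsync(DateTimeOffset cutoff, CancellationToken ct = default)
{
    var deleted = 0;

    // Azure.Data.Tables translates this expression into an OData filter,
    // so only stale entities come back over the wire
    var stale = _tableClient.QueryAsync<SessionEntity>(
        e => e.PartitionKey == "sessions" && e.LastModified < cutoff,
        cancellationToken: ct);

    await foreach (var entity in stale)
    {
        await _tableClient.DeleteEntityAsync(entity.PartitionKey, entity.RowKey, cancellationToken: ct);
        deleted++;
    }

    return deleted;
}
```

For very large tables, batching deletes per partition would cut round trips, but the simple loop above is fine at moderate session volumes.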

Scaling Strategies

Horizontal Scaling

Agent Framework applications scale horizontally like any ASP.NET Core app, with one consideration: session affinity.

If using in-memory sessions, enable sticky sessions:

// Azure App Service configuration
{
    "siteConfig": {
        "clientAffinityEnabled": true
    }
}

Better approach—use distributed session storage (shown above) and scale freely:

// No affinity needed with external session store
{
    "siteConfig": {
        "clientAffinityEnabled": false
    }
}

Load Patterns

AI agents have bursty, unpredictable workloads. Configure autoscaling accordingly:

resource appServicePlan 'Microsoft.Web/serverfarms@2023-01-01' = {
  name: planName
  location: location
  sku: {
    name: 'P1v3'
    tier: 'PremiumV3'
    capacity: 2  // Minimum instances
  }
}

resource autoScaleSettings 'Microsoft.Insights/autoscalesettings@2022-10-01' = {
  name: '${appName}-autoscale'
  location: location
  properties: {
    targetResourceUri: appServicePlan.id
    enabled: true
    profiles: [
      {
        name: 'Auto scale based on CPU and requests'
        capacity: {
          minimum: '2'
          maximum: '10'
          default: '2'
        }
        rules: [
          {
            metricTrigger: {
              metricName: 'CpuPercentage'
              metricResourceUri: appServicePlan.id
              timeGrain: 'PT1M'
              statistic: 'Average'
              timeWindow: 'PT5M'
              timeAggregation: 'Average'
              operator: 'GreaterThan'
              threshold: 70
            }
            scaleAction: {
              direction: 'Increase'
              type: 'ChangeCount'
              value: '2'
              cooldown: 'PT5M'
            }
          }
          {
            metricTrigger: {
              metricName: 'CpuPercentage'
              metricResourceUri: appServicePlan.id
              timeGrain: 'PT1M'
              statistic: 'Average'
              timeWindow: 'PT10M'
              timeAggregation: 'Average'
              operator: 'LessThan'
              threshold: 30
            }
            scaleAction: {
              direction: 'Decrease'
              type: 'ChangeCount'
              value: '1'
              cooldown: 'PT10M'
            }
          }
        ]
      }
    ]
  }
}

Rate Limiting

Protect your Azure OpenAI quota and your users:

builder.Services.AddRateLimiter(options =>
{
    // Per-user rate limit
    options.AddPolicy("user-limit", context =>
    {
        var userId = context.User?.Identity?.Name ?? context.Connection.RemoteIpAddress?.ToString() ?? "anonymous";

        return RateLimitPartition.GetTokenBucketLimiter(userId, _ => new TokenBucketRateLimiterOptions
        {
            TokenLimit = 20,
            TokensPerPeriod = 10,
            ReplenishmentPeriod = TimeSpan.FromMinutes(1),
            QueueLimit = 5
        });
    });

    // Global rate limit to protect Azure OpenAI quota
    options.AddPolicy("global-limit", _ =>
        RateLimitPartition.GetFixedWindowLimiter("global", _ => new FixedWindowRateLimiterOptions
        {
            Window = TimeSpan.FromMinutes(1),
            PermitLimit = 500  // Adjust based on your quota
        }));
});

app.UseRateLimiter();

app.MapPost("/api/chat", ...)
    .RequireRateLimiting("user-limit")
    .RequireRateLimiting("global-limit");

Security Best Practices

Managed Identity for Azure Resources

Never store secrets in code or config:

// Use DefaultAzureCredential, which automatically uses Managed Identity in Azure
builder.Services.AddSingleton<IChatClient>(sp =>
{
    var config = sp.GetRequiredService<IConfiguration>();
    return new ChatCompletionsClient(
        new Uri(config["AzureOpenAI:Endpoint"]!),
        new DefaultAzureCredential()  // Uses Managed Identity in Azure
    ).AsChatClient(config["AzureOpenAI:DeploymentName"]!);
});

Assign the Cognitive Services OpenAI User role to your App Service's managed identity:

var cognitiveServicesOpenAIUser = subscriptionResourceId(
  'Microsoft.Authorization/roleDefinitions',
  '5e0bd9bd-7b93-4f28-af87-19fc36ad61bd')  // Cognitive Services OpenAI User

resource roleAssignment 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
  name: guid(openAIAccount.id, appService.id, cognitiveServicesOpenAIUser)
  scope: openAIAccount
  properties: {
    roleDefinitionId: cognitiveServicesOpenAIUser
    principalId: appService.identity.principalId
    principalType: 'ServicePrincipal'
  }
}

Input Validation

Never trust user input, even (especially) for AI:

app.MapPost("/api/chat", async (
    ChatRequest request,
    IAgentRuntime runtime,
    CancellationToken ct) =>
{
    // Validate input length
    if (string.IsNullOrWhiteSpace(request.Message))
        return Results.BadRequest("Message is required");

    if (request.Message.Length > 10000)
        return Results.BadRequest("Message too long");

    // Sanitize session ID
    if (request.SessionId != null && !Guid.TryParse(request.SessionId, out _))
        return Results.BadRequest("Invalid session ID format");

    // Validate agent name against allowlist
    var allowedAgents = new[] { "CustomerSupport", "OrderLookup" };
    if (request.AgentName != null && !allowedAgents.Contains(request.AgentName))
        return Results.BadRequest("Unknown agent");

    // Proceed with validated request
    var session = await runtime.GetOrCreateSessionAsync(
        request.SessionId ?? Guid.NewGuid().ToString(),
        ct);

    var response = await runtime.InvokeAgentAsync(
        request.AgentName ?? "CustomerSupport",
        request.Message,
        session,
        ct);

    return Results.Ok(new ChatResponse(...));
});

Network Isolation

For sensitive workloads, use VNet integration:

resource vnet 'Microsoft.Network/virtualNetworks@2023-04-01' = {
  name: vnetName
  location: location
  properties: {
    addressSpace: {
      addressPrefixes: ['10.0.0.0/16']
    }
    subnets: [
      {
        name: 'app-subnet'
        properties: {
          addressPrefix: '10.0.1.0/24'
          delegations: [
            {
              name: 'appServiceDelegation'
              properties: {
                serviceName: 'Microsoft.Web/serverFarms'
              }
            }
          ]
        }
      }
      {
        name: 'openai-subnet'
        properties: {
          addressPrefix: '10.0.2.0/24'
          privateEndpointNetworkPolicies: 'Disabled'
        }
      }
    ]
  }
}

resource privateEndpoint 'Microsoft.Network/privateEndpoints@2023-04-01' = {
  name: '${openAIAccountName}-pe'
  location: location
  properties: {
    subnet: {
      id: vnet.properties.subnets[1].id
    }
    privateLinkServiceConnections: [
      {
        name: 'openai-connection'
        properties: {
          privateLinkServiceId: openAIAccount.id
          groupIds: ['account']
        }
      }
    ]
  }
}

Cost Management

AI can get expensive fast. Build in guardrails.

Token Budget Per Session

public class BudgetEnforcementMiddleware : IAgentMiddleware
{
    private readonly ISessionStore _sessionStore;
    private readonly int _maxTokensPerSession;

    public BudgetEnforcementMiddleware(
        ISessionStore sessionStore,
        IConfiguration config)
    {
        _sessionStore = sessionStore;
        _maxTokensPerSession = config.GetValue<int>("Budget:MaxTokensPerSession", 50000);
    }

    public async Task<AgentResponse> InvokeAsync(AgentRequest request, AgentDelegate next)
    {
        var session = request.Session;

        if (session != null && session.TotalTokensUsed >= _maxTokensPerSession)
        {
            return new AgentResponse
            {
                Content = "I apologize, but this conversation has reached its limit. " +
                          "Please start a new conversation for further assistance.",
                Status = AgentResponseStatus.BudgetExceeded
            };
        }

        var response = await next(request);

        // Update session token count
        if (session != null && response.Usage != null)
        {
            session.TotalTokensUsed += response.Usage.TotalTokens;
            await _sessionStore.SaveAsync(session);
        }

        return response;
    }
}

Cost Tracking and Alerting

public class CostTrackingMiddleware : IAgentMiddleware
{
    private static readonly Meter Meter = new("AgentService.Metrics");

    private static readonly Counter<decimal> CostCounter =
        Meter.CreateCounter<decimal>("agent.cost.usd", "USD", "Estimated cost in USD");

    // Pricing per 1K tokens (adjust for your model)
    private const decimal InputCostPer1K = 0.01m;
    private const decimal OutputCostPer1K = 0.03m;

    public async Task<AgentResponse> InvokeAsync(AgentRequest request, AgentDelegate next)
    {
        var response = await next(request);

        if (response.Usage != null)
        {
            var inputCost = (response.Usage.InputTokens / 1000m) * InputCostPer1K;
            var outputCost = (response.Usage.OutputTokens / 1000m) * OutputCostPer1K;
            var totalCost = inputCost + outputCost;

            CostCounter.Add(totalCost,
                new KeyValuePair<string, object?>("agent", request.Agent.Name),
                new KeyValuePair<string, object?>("model", request.Agent.ModelId));
        }

        return response;
    }
}

Set up Azure Monitor alerts when costs exceed thresholds.
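As a sketch, an alert on the custom agent.cost.usd counter could be created with the Azure CLI. The resource names here are placeholders, and the exact metric name depends on how the exporter publishes custom metrics:

```shell
az monitor metrics alert create \
  --name "agent-cost-spike" \
  --resource-group rg-agent-prod \
  --scopes $(az monitor app-insights component show \
      --app agent-service-prod-insights \
      --resource-group rg-agent-prod \
      --query id -o tsv) \
  --condition "total agent.cost.usd > 50" \
  --window-size 1h \
  --evaluation-frequency 15m \
  --description "Estimated agent spend exceeded 50 USD in the last hour"
```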

Deployment with Azure AI Foundry

Azure AI Foundry is Microsoft's unified platform for AI development and deployment. Here's a complete deployment setup:

Bicep Template

@description('Environment name')
param environment string = 'prod'

@description('Location for resources')
param location string = resourceGroup().location

var appName = 'agent-service-${environment}'
var openAIAccountName = 'openai-${environment}'

// Azure OpenAI
resource openAIAccount 'Microsoft.CognitiveServices/accounts@2023-10-01-preview' = {
  name: openAIAccountName
  location: location
  kind: 'OpenAI'
  sku: {
    name: 'S0'
  }
  properties: {
    customSubDomainName: openAIAccountName
    publicNetworkAccess: 'Disabled'  // Use private endpoint
  }
}

resource gpt4Deployment 'Microsoft.CognitiveServices/accounts/deployments@2023-10-01-preview' = {
  parent: openAIAccount
  name: 'gpt-4o'
  sku: {
    name: 'Standard'
    capacity: 40  // TPM in thousands
  }
  properties: {
    model: {
      format: 'OpenAI'
      name: 'gpt-4o'
      version: '2024-11-20'
    }
  }
}

// App Service
resource appServicePlan 'Microsoft.Web/serverfarms@2023-01-01' = {
  name: '${appName}-plan'
  location: location
  sku: {
    name: 'P1v3'
    tier: 'PremiumV3'
    capacity: 2
  }
  properties: {
    reserved: true  // Linux
  }
}

resource appService 'Microsoft.Web/sites@2023-01-01' = {
  name: appName
  location: location
  identity: {
    type: 'SystemAssigned'
  }
  properties: {
    serverFarmId: appServicePlan.id
    siteConfig: {
      linuxFxVersion: 'DOTNETCORE|8.0'
      alwaysOn: true
      healthCheckPath: '/health'
      appSettings: [
        {
          name: 'AzureOpenAI__Endpoint'
          value: openAIAccount.properties.endpoint
        }
        {
          name: 'AzureOpenAI__DeploymentName'
          value: gpt4Deployment.name
        }
        {
          name: 'ApplicationInsights__ConnectionString'
          value: appInsights.properties.ConnectionString
        }
        {
          name: 'Storage__ConnectionString'
          value: '@Microsoft.KeyVault(VaultName=${keyVault.name};SecretName=storage-connection)'
        }
      ]
    }
    virtualNetworkSubnetId: appSubnet.id  // app-subnet from the VNet in the network isolation section
  }
}

// Application Insights
resource appInsights 'Microsoft.Insights/components@2020-02-02' = {
  name: '${appName}-insights'
  location: location
  kind: 'web'
  properties: {
    Application_Type: 'web'
  }
}

// Storage for sessions
resource storageAccount 'Microsoft.Storage/storageAccounts@2023-01-01' = {
  name: 'agentstorage${uniqueString(resourceGroup().id)}'
  location: location
  sku: {
    name: 'Standard_LRS'
  }
  kind: 'StorageV2'
}

// Outputs
output appUrl string = 'https://${appService.properties.defaultHostName}'
output openAIEndpoint string = openAIAccount.properties.endpoint
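Assuming the template above is saved as main.bicep, deploying it is a single CLI call (resource group name and location are placeholders):

```shell
az group create --name rg-agent-prod --location eastus2
az deployment group create \
  --resource-group rg-agent-prod \
  --template-file main.bicep \
  --parameters environment=prod
```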

GitHub Actions Deployment

name: Deploy Agent Service

on:
  push:
    branches: [main]
  workflow_dispatch:

env:
  AZURE_WEBAPP_NAME: agent-service-prod
  DOTNET_VERSION: '8.0.x'

jobs:
  build-and-deploy:
    runs-on: ubuntu-latest

    steps:
    - uses: actions/checkout@v4

    - name: Setup .NET
      uses: actions/setup-dotnet@v4
      with:
        dotnet-version: ${{ env.DOTNET_VERSION }}

    - name: Restore dependencies
      run: dotnet restore

    - name: Build
      run: dotnet build --configuration Release --no-restore

    - name: Test
      run: dotnet test --configuration Release --no-build

    - name: Publish
      run: dotnet publish -c Release -o ./publish

    - name: Login to Azure
      uses: azure/login@v1
      with:
        creds: ${{ secrets.AZURE_CREDENTIALS }}

    - name: Deploy to Azure Web App
      uses: azure/webapps-deploy@v2
      with:
        app-name: ${{ env.AZURE_WEBAPP_NAME }}
        package: ./publish

Production Checklist

Before going live:

  • [ ] Authentication: Managed Identity configured for all Azure resources
  • [ ] Network: VNet integration and private endpoints for sensitive workloads
  • [ ] Observability: OpenTelemetry tracing and metrics exporting to Azure Monitor
  • [ ] Sessions: Persistent session storage configured
  • [ ] Cost Controls: Token budgets and cost tracking enabled
  • [ ] Rate Limiting: Per-user and global rate limits configured
  • [ ] Health Checks: /health endpoint implemented and monitored
  • [ ] Graceful Shutdown: Proper handling of in-flight requests
  • [ ] Autoscaling: Rules configured based on CPU and request patterns
  • [ ] Alerts: Set up for latency, errors, and cost thresholds
  • [ ] Logging: Structured logging with appropriate levels
  • [ ] Secrets: All secrets in Key Vault, no hardcoded values
  • [ ] CI/CD: Automated deployment pipeline with tests
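The health-check item is largely built into ASP.NET Core. A minimal version that the healthCheckPath in the Bicep template can probe:

```csharp
// Program.cs
builder.Services.AddHealthChecks()
    .AddCheck("self", () => HealthCheckResult.Healthy());

// ...

app.MapHealthChecks("/health");
```

App Service will pull instances out of rotation when this endpoint stops returning 200, which pairs well with the autoscale rules above.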

Summary

Production deployment of Agent Framework applications requires attention to:

  1. Architecture: Proper ASP.NET Core integration with dependency injection
  2. Observability: OpenTelemetry for tracing, metrics, and debugging
  3. Durability: Persistent session storage with cleanup strategies
  4. Scaling: Horizontal scaling with distributed session stores
  5. Security: Managed Identity, input validation, network isolation
  6. Cost Management: Token budgets, tracking, and alerting

The patterns in this article work for any scale—from a single-instance proof of concept to a globally distributed multi-agent system.

Series Conclusion

Over these four parts, we've covered the complete journey:

  1. Part 1: The unification of Semantic Kernel and AutoGen into Agent Framework
  2. Part 2: Workflows for explicit multi-agent orchestration
  3. Part 3: MCP for universal tool interoperability
  4. Part 4: Production deployment with Azure AI Foundry

The .NET AI agent ecosystem in 2026 is mature, unified, and production-ready. The same patterns that make .NET excellent for enterprise development—type safety, dependency injection, observability—now apply to AI agents.

Build something great.


Questions? Drop them in the comments or find me on Twitter/X.
