Suhas Mallesh
Azure AI Foundry Diagnostic Logging with Terraform: Every AI Call Tracked for Compliance πŸ“‹

Azure doesn't send AI diagnostic logs anywhere by default. One Terraform resource changes that - azurerm_monitor_diagnostic_setting routes audit logs, request/response data, and metrics to Log Analytics and Storage.

You've deployed your Azure AI Foundry endpoint (Post 1) and added content safety policies (Post 2). Your models are serving responses in production. Then your compliance team asks:

"Can you prove who called which model, when, and how long each request took?"

Azure AI services emit three categories of diagnostic logs - Audit, RequestResponse, and Trace. But they don't go anywhere by default. Without a diagnostic setting, every API call vanishes into the void. One Terraform resource fixes this: azurerm_monitor_diagnostic_setting routes those logs to Log Analytics for real-time queries and Storage for long-term compliance retention. 🎯

🧱 What Gets Logged

Azure Cognitive Services (the resource backing AI Foundry) emits three log categories:

| Category | What It Captures | Compliance Value |
|---|---|---|
| Audit | Key access events (ListKeys operations) | Security audit trail |
| RequestResponse | Every API call: model, operation, duration, caller IP, status code | Usage tracking, performance monitoring |
| Trace | Internal service trace data | Debugging |

The RequestResponse category is the workhorse. Each entry includes the operation name (completions, chat, embeddings), duration in milliseconds, HTTP status code, and partial caller IP. As of late 2024, Entra ID object IDs are also included when using AAD authentication instead of API keys.

Important limitation: RequestResponse logs capture metadata about calls (which model, how long, success/failure) but do not include the actual prompt or response content. For full prompt/response capture, you need Azure API Management (APIM) in front of your endpoint or application-level logging.

πŸ—οΈ Step 1: Log Analytics Workspace

Log Analytics is where you'll run real-time KQL queries against your AI logs:

# logging/log_analytics.tf

resource "azurerm_log_analytics_workspace" "ai_logs" {
  name                = "${var.environment}-ai-foundry-logs"
  location            = var.location
  resource_group_name = var.resource_group_name
  sku                 = "PerGB2018"
  retention_in_days   = var.log_analytics_retention_days

  tags = {
    Environment = var.environment
    Purpose     = "ai-diagnostic-logging"
  }
}

πŸ“¦ Step 2: Storage Account for Long-Term Retention

For compliance retention beyond what Log Analytics allows (retention_in_days caps out at 730 days), archive to a Storage Account with lifecycle management:

# logging/storage.tf

resource "azurerm_storage_account" "ai_logs" {
  name                     = "${var.environment}ailogstore"
  resource_group_name      = var.resource_group_name
  location                 = var.location
  account_tier             = "Standard"
  account_replication_type = var.environment == "prod" ? "GRS" : "LRS"
  min_tls_version          = "TLS1_2"

  tags = {
    Environment = var.environment
    Purpose     = "ai-diagnostic-logging"
  }
}

resource "azurerm_storage_management_policy" "ai_logs" {
  storage_account_id = azurerm_storage_account.ai_logs.id

  rule {
    name    = "archive-old-logs"
    enabled = true

    filters {
      blob_types = ["blockBlob"]
    }

    actions {
      base_blob {
        tier_to_cool_after_days_since_modification_greater_than    = var.cool_tier_days
        tier_to_archive_after_days_since_modification_greater_than = var.archive_tier_days
        delete_after_days_since_modification_greater_than          = var.storage_retention_days
      }
    }
  }
}

βš™οΈ Step 3: The Diagnostic Setting

This is the core resource. It attaches to your Cognitive Services account and routes all three log categories plus metrics to both destinations:

# logging/diagnostic_setting.tf

resource "azurerm_monitor_diagnostic_setting" "ai_foundry" {
  name                       = "${var.environment}-ai-foundry-diagnostics"
  target_resource_id         = var.cognitive_account_id
  log_analytics_workspace_id = azurerm_log_analytics_workspace.ai_logs.id
  storage_account_id         = azurerm_storage_account.ai_logs.id

  enabled_log {
    category = "Audit"
  }

  enabled_log {
    category = "RequestResponse"
  }

  enabled_log {
    category = "Trace"
  }

  metric {
    category = "AllMetrics"
  }
}

That's it. Every API call to your AI Foundry endpoint now flows to both Log Analytics and Storage. No IAM roles to configure, no bucket policies - Azure handles the plumbing internally.

Key detail: The target_resource_id is your azurerm_cognitive_account resource ID from Post 1. If you have multiple cognitive accounts (e.g., separate ones per environment), each needs its own diagnostic setting.
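For the multi-account case, a for_each over a map of account IDs stamps out one diagnostic setting per account. A minimal sketch (the cognitive_account_ids variable is hypothetical, not part of the module shown above):

```hcl
# Hypothetical input: logical name => cognitive account resource ID
variable "cognitive_account_ids" {
  type = map(string)
}

resource "azurerm_monitor_diagnostic_setting" "ai_foundry" {
  for_each = var.cognitive_account_ids

  name                       = "${var.environment}-${each.key}-diagnostics"
  target_resource_id         = each.value
  log_analytics_workspace_id = azurerm_log_analytics_workspace.ai_logs.id
  storage_account_id         = azurerm_storage_account.ai_logs.id

  enabled_log {
    category = "Audit"
  }

  enabled_log {
    category = "RequestResponse"
  }

  enabled_log {
    category = "Trace"
  }

  metric {
    category = "AllMetrics"
  }
}
```

All accounts share the same workspace and storage account, so queries span every endpoint while each setting stays independently managed in state.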

πŸ”§ Step 4: Variables

# logging/variables.tf

variable "environment" { type = string }
variable "location" { type = string }
variable "resource_group_name" { type = string }
variable "cognitive_account_id" { type = string }

variable "log_analytics_retention_days" {
  type    = number
  default = 30
}

variable "cool_tier_days" {
  type    = number
  default = 30
}

variable "archive_tier_days" {
  type    = number
  default = 90
}

variable "storage_retention_days" {
  type    = number
  default = 365
}

Per-environment configs:

# environments/dev.tfvars
log_analytics_retention_days = 30
cool_tier_days               = 14
archive_tier_days            = 30
storage_retention_days       = 90

# environments/prod.tfvars
log_analytics_retention_days = 90
cool_tier_days               = 90
archive_tier_days            = 365
storage_retention_days       = 2555  # 7 years for regulated industries
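Assuming these files live in a logging/ module as the paths suggest, the root configuration might wire it up like this (the module block and the referenced resource group / cognitive account names are illustrative, not from the series):

```hcl
# main.tf (root) -- hypothetical wiring of the logging module
module "logging" {
  source = "./logging"

  environment          = var.environment
  location             = azurerm_resource_group.main.location
  resource_group_name  = azurerm_resource_group.main.name
  cognitive_account_id = azurerm_cognitive_account.ai_foundry.id

  # Retention knobs come from the per-environment tfvars file
  log_analytics_retention_days = var.log_analytics_retention_days
  cool_tier_days               = var.cool_tier_days
  archive_tier_days            = var.archive_tier_days
  storage_retention_days       = var.storage_retention_days
}
```

Then select the environment at plan time: terraform plan -var-file=environments/prod.tfvars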

πŸ” Step 5: Query Your Logs

Once diagnostic settings are active, query your AI logs using KQL in Log Analytics:

// All AI API calls in the last 24 hours
AzureDiagnostics
| where ResourceProvider == "MICROSOFT.COGNITIVESERVICES"
| where TimeGenerated > ago(24h)
| project TimeGenerated, OperationName, DurationMs, ResultType
| order by TimeGenerated desc

// Average response time by operation
AzureDiagnostics
| where ResourceProvider == "MICROSOFT.COGNITIVESERVICES"
| summarize avg(DurationMs), count() by OperationName
| order by count_ desc

// Error rate over time (5-minute buckets)
AzureDiagnostics
| where ResourceProvider == "MICROSOFT.COGNITIVESERVICES"
| summarize
    total = count(),
    errors = countif(ResultType != "Success")
    by bin(TimeGenerated, 5m)
| extend error_rate = round(errors * 100.0 / total, 2)
| order by TimeGenerated desc

// Request volume by hour (capacity planning)
AzureDiagnostics
| where ResourceProvider == "MICROSOFT.COGNITIVESERVICES"
| summarize count() by bin(TimeGenerated, 1h), OperationName
| render timechart
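If you want these queries versioned alongside the infrastructure rather than living in engineers' browser tabs, Terraform can pin them to the workspace as saved searches. A sketch using azurerm_log_analytics_saved_search (the category and display name are arbitrary choices):

```hcl
# logging/saved_searches.tf -- persist the error-rate query in the workspace
resource "azurerm_log_analytics_saved_search" "error_rate" {
  name                       = "${var.environment}-ai-error-rate"
  log_analytics_workspace_id = azurerm_log_analytics_workspace.ai_logs.id
  category                   = "AI Diagnostics"
  display_name               = "AI error rate (5-minute buckets)"

  query = <<-KQL
    AzureDiagnostics
    | where ResourceProvider == "MICROSOFT.COGNITIVESERVICES"
    | summarize total = count(), errors = countif(ResultType != "Success") by bin(TimeGenerated, 5m)
    | extend error_rate = round(errors * 100.0 / total, 2)
    | order by TimeGenerated desc
  KQL
}
```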

🚨 Step 6: Alerting

Set up alerts for error spikes and latency degradation:

# logging/alerts.tf

resource "azurerm_monitor_metric_alert" "ai_latency" {
  name                = "${var.environment}-ai-high-latency"
  resource_group_name = var.resource_group_name
  scopes              = [var.cognitive_account_id]
  description         = "AI Foundry response latency exceeds threshold"
  severity            = 2
  frequency           = "PT5M"
  window_size         = "PT15M"

  criteria {
    metric_namespace = "Microsoft.CognitiveServices/accounts"
    metric_name      = "Latency"
    aggregation      = "Average"
    operator         = "GreaterThan"
    threshold        = 5000
  }

  action {
    action_group_id = var.action_group_id
  }
}

resource "azurerm_monitor_metric_alert" "ai_errors" {
  name                = "${var.environment}-ai-high-error-rate"
  resource_group_name = var.resource_group_name
  scopes              = [var.cognitive_account_id]
  description         = "AI Foundry error rate spike"
  severity            = 1
  frequency           = "PT1M"
  window_size         = "PT5M"

  criteria {
    metric_namespace = "Microsoft.CognitiveServices/accounts"
    metric_name      = "ClientErrors"
    aggregation      = "Total"
    operator         = "GreaterThan"
    threshold        = 50
  }

  action {
    action_group_id = var.action_group_id
  }
}
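Both alerts reference var.action_group_id, which Step 4's variables.tf doesn't declare. A sketch of the missing variable plus an action group that could feed it (the email address and resource names are placeholders):

```hcl
# logging/variables.tf -- declaration the alert resources depend on
variable "action_group_id" { type = string }

# Elsewhere (e.g. the root module): an action group to notify on-call
resource "azurerm_monitor_action_group" "oncall" {
  name                = "${var.environment}-ai-oncall"
  resource_group_name = var.resource_group_name
  short_name          = "aioncall" # 12-character limit

  email_receiver {
    name          = "oncall-email"
    email_address = "oncall@example.com" # placeholder
  }
}
```

Pass azurerm_monitor_action_group.oncall.id into the module as action_group_id and both alerts route to the same receivers.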

πŸ“ Production Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Azure AI Foundry API Call       β”‚
β”‚  (Chat / Completions / Embed)    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                β”‚
    Diagnostic Setting
    (azurerm_monitor_diagnostic_setting)
                β”‚
    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
    β”‚           β”‚           β”‚
    β–Ό           β–Ό           β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Log    β”‚ β”‚Storage β”‚ β”‚ Metric   β”‚
β”‚Analyt. β”‚ β”‚Account β”‚ β”‚ Alerts   β”‚
β”‚        β”‚ β”‚        β”‚ β”‚          β”‚
β”‚ KQL    β”‚ β”‚ Long   β”‚ β”‚ Latency  β”‚
β”‚ query  β”‚ β”‚ term   β”‚ β”‚ & error  β”‚
β”‚ & dash β”‚ β”‚ archiveβ”‚ β”‚ alerts   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Dual-destination pattern: Log Analytics for real-time queries and dashboards (shorter retention, KQL access). Storage Account for compliance retention (lifecycle to Cool/Archive tiers, years of data).

πŸ’‘ Tri-Cloud Comparison: Logging Architecture

| Aspect | AWS (Bedrock) | GCP (Vertex AI) | Azure (AI Foundry) |
|---|---|---|---|
| Core Terraform resource | aws_bedrock_model_invocation_logging_configuration | google_project_iam_audit_config + log sinks | azurerm_monitor_diagnostic_setting |
| Prompt/response bodies | Included in logs (inline) | Separate BigQuery logging (API config) | Not in diagnostic logs (needs APIM) |
| Real-time query engine | CloudWatch Insights | BigQuery SQL | KQL (Log Analytics) |
| Long-term storage | S3 + Glacier lifecycle | GCS + Nearline/Coldline lifecycle | Storage Account + Cool/Archive lifecycle |
| Scope | Per-region, per-account | Per-project, per-service | Per-resource |
| Log categories | Single config (text/image/embedding toggles) | Three audit log types (Admin/Data Read/Data Write) | Three categories (Audit/RequestResponse/Trace) |

The biggest Azure difference: diagnostic settings are a generic Azure Monitor pattern that works identically across all Azure services, not an AI-specific feature. The same azurerm_monitor_diagnostic_setting you'd use for a SQL database or Key Vault works for AI Foundry. This makes it familiar if you're already on Azure, but it also means the AI-specific logging depth (like full prompt/response capture) requires additional architecture.
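To make the generic pattern concrete: the same resource type attaches to a Key Vault with only the category names changing (AuditEvent is Key Vault's diagnostic log category; the azurerm_key_vault.main reference is illustrative):

```hcl
# Same pattern, different target: Key Vault instead of Cognitive Services
resource "azurerm_monitor_diagnostic_setting" "key_vault" {
  name                       = "${var.environment}-kv-diagnostics"
  target_resource_id         = azurerm_key_vault.main.id # illustrative reference
  log_analytics_workspace_id = azurerm_log_analytics_workspace.ai_logs.id

  enabled_log {
    category = "AuditEvent"
  }

  metric {
    category = "AllMetrics"
  }
}
```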

⏭️ What's Next

This is Post 3 of the Azure AI Infrastructure with Terraform series.


Every AI Foundry call now has a paper trail. Operation names, durations, status codes, and caller identity - all flowing to Log Analytics and Storage, all managed by Terraform, all queryable with KQL. πŸ“‹

Found this helpful? Follow for the full Azure AI Infrastructure with Terraform series! πŸ’¬
