Suhas Mallesh
Azure AI Foundry Diagnostic Logging with Terraform: Every AI Call Tracked for Compliance πŸ“‹

Azure doesn't send AI diagnostic logs anywhere by default. One Terraform resource changes that - azurerm_monitor_diagnostic_setting routes audit logs, request/response data, and metrics to Log Analytics and Storage.

You've deployed your Azure AI Foundry endpoint (Post 1) and added content safety policies (Post 2). Your models are serving responses in production. Then your compliance team asks:

"Can you prove who called which model, when, and how long each request took?"

Azure AI services emit three categories of diagnostic logs - Audit, RequestResponse, and Trace. But they don't go anywhere by default. Without a diagnostic setting, every API call vanishes into the void. One Terraform resource fixes this: azurerm_monitor_diagnostic_setting routes those logs to Log Analytics for real-time queries and Storage for long-term compliance retention. 🎯

🧱 What Gets Logged

Azure Cognitive Services (the resource backing AI Foundry) emits three log categories:

| Category | What It Captures | Compliance Value |
|---|---|---|
| Audit | Key access events (ListKeys operations) | Security audit trail |
| RequestResponse | Every API call: model, operation, duration, caller IP, status code | Usage tracking, performance monitoring |
| Trace | Internal service trace data | Debugging |

The RequestResponse category is the workhorse. Each entry includes the operation name (completions, chat, embeddings), duration in milliseconds, HTTP status code, and partial caller IP. As of late 2024, Entra ID object IDs are also included when using AAD authentication instead of API keys.

Important limitation: RequestResponse logs capture metadata about calls (which model, how long, success/failure) but do not include the actual prompt or response content. For full prompt/response capture, you need Azure API Management (APIM) in front of your endpoint or application-level logging.

πŸ—οΈ Step 1: Log Analytics Workspace

Log Analytics is where you'll run real-time KQL queries against your AI logs:

# logging/log_analytics.tf

resource "azurerm_log_analytics_workspace" "ai_logs" {
  name                = "${var.environment}-ai-foundry-logs"
  location            = var.location
  resource_group_name = var.resource_group_name
  sku                 = "PerGB2018"
  retention_in_days   = var.log_analytics_retention_days

  tags = {
    Environment = var.environment
    Purpose     = "ai-diagnostic-logging"
  }
}

πŸ“¦ Step 2: Storage Account for Long-Term Retention

For compliance retention beyond what Log Analytics allows (retention_in_days caps out at 730 days), archive to a Storage Account with lifecycle management:

# logging/storage.tf

resource "azurerm_storage_account" "ai_logs" {
  name                     = "${var.environment}ailogstore"
  resource_group_name      = var.resource_group_name
  location                 = var.location
  account_tier             = "Standard"
  account_replication_type = var.environment == "prod" ? "GRS" : "LRS"
  min_tls_version          = "TLS1_2"

  tags = {
    Environment = var.environment
    Purpose     = "ai-diagnostic-logging"
  }
}

resource "azurerm_storage_management_policy" "ai_logs" {
  storage_account_id = azurerm_storage_account.ai_logs.id

  rule {
    name    = "archive-old-logs"
    enabled = true

    filters {
      blob_types = ["blockBlob"]
    }

    actions {
      base_blob {
        tier_to_cool_after_days_since_modification_greater_than    = var.cool_tier_days
        tier_to_archive_after_days_since_modification_greater_than = var.archive_tier_days
        delete_after_days_since_modification_greater_than          = var.storage_retention_days
      }
    }
  }
}

βš™οΈ Step 3: The Diagnostic Setting

This is the core resource. It attaches to your Cognitive Services account and routes all three log categories plus metrics to both destinations:

# logging/diagnostic_setting.tf

resource "azurerm_monitor_diagnostic_setting" "ai_foundry" {
  name                       = "${var.environment}-ai-foundry-diagnostics"
  target_resource_id         = var.cognitive_account_id
  log_analytics_workspace_id = azurerm_log_analytics_workspace.ai_logs.id
  storage_account_id         = azurerm_storage_account.ai_logs.id

  enabled_log {
    category = "Audit"
  }

  enabled_log {
    category = "RequestResponse"
  }

  enabled_log {
    category = "Trace"
  }

  metric {
    category = "AllMetrics"
  }
}

That's it. Every API call to your AI Foundry endpoint now flows to both Log Analytics and Storage. No IAM roles to configure, no bucket policies - Azure handles the plumbing internally.

Key detail: The target_resource_id is your azurerm_cognitive_account resource ID from Post 1. If you have multiple cognitive accounts (e.g., separate ones per environment), each needs its own diagnostic setting.
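For the multi-account case, a for_each over a map of account IDs stamps out one diagnostic setting per account. A minimal sketch (the cognitive_account_ids variable is hypothetical, not part of the module shown above):

```hcl
# Hypothetical input: logical name => cognitive account resource ID
variable "cognitive_account_ids" {
  type = map(string)
}

resource "azurerm_monitor_diagnostic_setting" "ai_foundry" {
  for_each = var.cognitive_account_ids

  name                       = "${var.environment}-${each.key}-diagnostics"
  target_resource_id         = each.value
  log_analytics_workspace_id = azurerm_log_analytics_workspace.ai_logs.id
  storage_account_id         = azurerm_storage_account.ai_logs.id

  enabled_log {
    category = "Audit"
  }

  enabled_log {
    category = "RequestResponse"
  }

  enabled_log {
    category = "Trace"
  }

  metric {
    category = "AllMetrics"
  }
}
```

All accounts share the same workspace and storage account, so queries span every endpoint while each setting stays independently managed in state.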

πŸ”§ Step 4: Variables

# logging/variables.tf

variable "environment" { type = string }
variable "location" { type = string }
variable "resource_group_name" { type = string }
variable "cognitive_account_id" { type = string }

variable "log_analytics_retention_days" {
  type    = number
  default = 30
}

variable "cool_tier_days" {
  type    = number
  default = 30
}

variable "archive_tier_days" {
  type    = number
  default = 90
}

variable "storage_retention_days" {
  type    = number
  default = 365
}

Per-environment configs:

# environments/dev.tfvars
log_analytics_retention_days = 30
cool_tier_days               = 14
archive_tier_days            = 30
storage_retention_days       = 90

# environments/prod.tfvars
log_analytics_retention_days = 90
cool_tier_days               = 90
archive_tier_days            = 365
storage_retention_days       = 2555  # 7 years for regulated industries
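Assuming these files live in a logging/ module as the paths suggest, the root configuration might wire it up like this (the module block and the referenced resource group / cognitive account names are illustrative, not from the series):

```hcl
# main.tf (root) -- hypothetical wiring of the logging module
module "logging" {
  source = "./logging"

  environment          = var.environment
  location             = azurerm_resource_group.main.location
  resource_group_name  = azurerm_resource_group.main.name
  cognitive_account_id = azurerm_cognitive_account.ai_foundry.id

  # Retention knobs come from the per-environment tfvars file
  log_analytics_retention_days = var.log_analytics_retention_days
  cool_tier_days               = var.cool_tier_days
  archive_tier_days            = var.archive_tier_days
  storage_retention_days       = var.storage_retention_days
}
```

Then select the environment at plan time: terraform plan -var-file=environments/prod.tfvars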

πŸ” Step 5: Query Your Logs

Once diagnostic settings are active, query your AI logs using KQL in Log Analytics:

// All AI API calls in the last 24 hours
AzureDiagnostics
| where ResourceProvider == "MICROSOFT.COGNITIVESERVICES"
| where TimeGenerated > ago(24h)
| project TimeGenerated, OperationName, DurationMs, ResultType
| order by TimeGenerated desc

// Average response time by operation
AzureDiagnostics
| where ResourceProvider == "MICROSOFT.COGNITIVESERVICES"
| summarize avg(DurationMs), count() by OperationName
| order by count_ desc

// Error rate over time (5-minute buckets)
AzureDiagnostics
| where ResourceProvider == "MICROSOFT.COGNITIVESERVICES"
| summarize
    total = count(),
    errors = countif(ResultType != "Success")
    by bin(TimeGenerated, 5m)
| extend error_rate = round(errors * 100.0 / total, 2)
| order by TimeGenerated desc

// Request volume by hour (capacity planning)
AzureDiagnostics
| where ResourceProvider == "MICROSOFT.COGNITIVESERVICES"
| summarize count() by bin(TimeGenerated, 1h), OperationName
| render timechart
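If you want these queries versioned alongside the infrastructure rather than living in engineers' browser tabs, Terraform can pin them to the workspace as saved searches. A sketch using azurerm_log_analytics_saved_search (the category and display name are arbitrary choices):

```hcl
# logging/saved_searches.tf -- persist the error-rate query in the workspace
resource "azurerm_log_analytics_saved_search" "error_rate" {
  name                       = "${var.environment}-ai-error-rate"
  log_analytics_workspace_id = azurerm_log_analytics_workspace.ai_logs.id
  category                   = "AI Diagnostics"
  display_name               = "AI error rate (5-minute buckets)"

  query = <<-KQL
    AzureDiagnostics
    | where ResourceProvider == "MICROSOFT.COGNITIVESERVICES"
    | summarize total = count(), errors = countif(ResultType != "Success") by bin(TimeGenerated, 5m)
    | extend error_rate = round(errors * 100.0 / total, 2)
    | order by TimeGenerated desc
  KQL
}
```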

🚨 Step 6: Alerting

Set up alerts for error spikes and latency degradation:

# logging/alerts.tf

resource "azurerm_monitor_metric_alert" "ai_latency" {
  name                = "${var.environment}-ai-high-latency"
  resource_group_name = var.resource_group_name
  scopes              = [var.cognitive_account_id]
  description         = "AI Foundry response latency exceeds threshold"
  severity            = 2
  frequency           = "PT5M"
  window_size         = "PT15M"

  criteria {
    metric_namespace = "Microsoft.CognitiveServices/accounts"
    metric_name      = "Latency"
    aggregation      = "Average"
    operator         = "GreaterThan"
    threshold        = 5000
  }

  action {
    action_group_id = var.action_group_id
  }
}

resource "azurerm_monitor_metric_alert" "ai_errors" {
  name                = "${var.environment}-ai-high-error-rate"
  resource_group_name = var.resource_group_name
  scopes              = [var.cognitive_account_id]
  description         = "AI Foundry error rate spike"
  severity            = 1
  frequency           = "PT1M"
  window_size         = "PT5M"

  criteria {
    metric_namespace = "Microsoft.CognitiveServices/accounts"
    metric_name      = "ClientErrors"
    aggregation      = "Total"
    operator         = "GreaterThan"
    threshold        = 50
  }

  action {
    action_group_id = var.action_group_id
  }
}
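Both alerts reference var.action_group_id, which Step 4's variables.tf doesn't declare. A sketch of the missing variable plus an action group that could feed it (the email address and resource names are placeholders):

```hcl
# logging/variables.tf -- declaration the alert resources depend on
variable "action_group_id" { type = string }

# Elsewhere (e.g. the root module): an action group to notify on-call
resource "azurerm_monitor_action_group" "oncall" {
  name                = "${var.environment}-ai-oncall"
  resource_group_name = var.resource_group_name
  short_name          = "aioncall" # 12-character limit

  email_receiver {
    name          = "oncall-email"
    email_address = "oncall@example.com" # placeholder
  }
}
```

Pass azurerm_monitor_action_group.oncall.id into the module as action_group_id and both alerts route to the same receivers.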

πŸ“ Production Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Azure AI Foundry API Call       β”‚
β”‚  (Chat / Completions / Embed)    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                β”‚
    Diagnostic Setting
    (azurerm_monitor_diagnostic_setting)
                β”‚
    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
    β”‚           β”‚           β”‚
    β–Ό           β–Ό           β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Log    β”‚ β”‚Storage β”‚ β”‚ Metric   β”‚
β”‚Analyt. β”‚ β”‚Account β”‚ β”‚ Alerts   β”‚
β”‚        β”‚ β”‚        β”‚ β”‚          β”‚
β”‚ KQL    β”‚ β”‚ Long   β”‚ β”‚ Latency  β”‚
β”‚ query  β”‚ β”‚ term   β”‚ β”‚ & error  β”‚
β”‚ & dash β”‚ β”‚ archiveβ”‚ β”‚ alerts   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Dual-destination pattern: Log Analytics for real-time queries and dashboards (shorter retention, KQL access). Storage Account for compliance retention (lifecycle to Cool/Archive tiers, years of data).

πŸ’‘ Tri-Cloud Comparison: Logging Architecture

| Aspect | AWS (Bedrock) | GCP (Vertex AI) | Azure (AI Foundry) |
|---|---|---|---|
| Core Terraform resource | aws_bedrock_model_invocation_logging_configuration | google_project_iam_audit_config + log sinks | azurerm_monitor_diagnostic_setting |
| Prompt/response bodies | Included in logs (inline) | Separate BigQuery logging (API config) | Not in diagnostic logs (needs APIM) |
| Real-time query engine | CloudWatch Insights | BigQuery SQL | KQL (Log Analytics) |
| Long-term storage | S3 + Glacier lifecycle | GCS + Nearline/Coldline lifecycle | Storage Account + Cool/Archive lifecycle |
| Scope | Per-region, per-account | Per-project, per-service | Per-resource |
| Log categories | Single config (text/image/embedding toggles) | Three audit log types (Admin/Data Read/Data Write) | Three categories (Audit/RequestResponse/Trace) |

The biggest Azure difference: diagnostic settings are a generic Azure Monitor pattern that works identically across all Azure services, not an AI-specific feature. The same azurerm_monitor_diagnostic_setting you'd use for a SQL database or Key Vault works for AI Foundry. This makes it familiar if you're already on Azure, but it also means the AI-specific logging depth (like full prompt/response capture) requires additional architecture.
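To make the generic pattern concrete: the same resource type attaches to a Key Vault with only the category names changing (AuditEvent is Key Vault's diagnostic log category; the azurerm_key_vault.main reference is illustrative):

```hcl
# Same pattern, different target: Key Vault instead of Cognitive Services
resource "azurerm_monitor_diagnostic_setting" "key_vault" {
  name                       = "${var.environment}-kv-diagnostics"
  target_resource_id         = azurerm_key_vault.main.id # illustrative reference
  log_analytics_workspace_id = azurerm_log_analytics_workspace.ai_logs.id

  enabled_log {
    category = "AuditEvent"
  }

  metric {
    category = "AllMetrics"
  }
}
```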

⏭️ What's Next

This is Post 3 of the Azure AI Infrastructure with Terraform series.


Every AI Foundry call now has a paper trail. Operation names, durations, status codes, and caller identity - all flowing to Log Analytics and Storage, all managed by Terraform, all queryable with KQL. πŸ“‹

Found this helpful? Follow for the full Azure AI Infrastructure with Terraform series! πŸ’¬
