Suhas Mallesh

Azure AI Foundry Content Safety with Terraform: RAI Policies + Content Filters as Code πŸ›‘οΈ

Azure gives you RAI policies for content filtering and a standalone Content Safety service for PII and custom screening. Here's how to deploy both with Terraform so every safety rule is version-controlled.

You deployed your first Azure AI Foundry endpoint (Post 1). GPT responds, tokens flow. But what stops it from generating hate speech, leaking a customer's credit card number, or falling for a jailbreak?

Azure gives you two safety layers:

  1. RAI (Responsible AI) Policies - content filter policies attached directly to model deployments, filtering hate, violence, sexual content, self-harm, jailbreaks, and protected material
  2. Azure AI Content Safety - a standalone cognitive service for text/image moderation, custom blocklists, and PII detection via external API calls

The key architectural difference from AWS and GCP: on Azure, content filters are attached to the deployment itself via rai_policy_name. The policy and the deployment are tightly coupled. This means safety is enforced at the infrastructure level, not the application level. 🎯

🧱 Azure Safety Architecture

| Layer | Service | What It Does | Managed By |
|---|---|---|---|
| Content filtering | RAI Policy (azurerm_cognitive_account_rai_policy) | Blocks hate, violence, sexual, self-harm content | Terraform |
| Jailbreak detection | RAI Policy (content_filter: Jailbreak) | Blocks prompt injection attempts | Terraform |
| Indirect attacks | RAI Policy (content_filter: Indirect Attack) | Blocks indirect prompt injection via data | Terraform |
| Protected material | RAI Policy (content_filter) | Blocks copyrighted text/code | Terraform |
| Standalone moderation | Content Safety account (kind = "ContentSafety") | Text/image moderation API, custom blocklists | Terraform |
| PII detection | Content Safety + Azure DLP | Detects/redacts sensitive data | Application code |

πŸ—οΈ Step 1: RAI Policy with Content Filters

The azurerm_cognitive_account_rai_policy resource defines content filter rules. Each harm-category filter needs both a Prompt (input) and a Completion (output) entry; the jailbreak, indirect-attack, and protected-material filters apply to only one direction:

# content-safety/rai_policy.tf

resource "azurerm_cognitive_account_rai_policy" "ai_safety" {
  name                 = "${var.environment}-content-filter"
  cognitive_account_id = var.cognitive_account_id
  base_policy_name     = "Microsoft.Default"

  # ━━━ Hate ━━━
  content_filter {
    name               = "Hate"
    filter_enabled     = true
    block_enabled      = true
    severity_threshold = var.content_filter_thresholds["hate"]
    source             = "Prompt"
  }
  content_filter {
    name               = "Hate"
    filter_enabled     = true
    block_enabled      = true
    severity_threshold = var.content_filter_thresholds["hate"]
    source             = "Completion"
  }

  # ━━━ Sexual ━━━
  content_filter {
    name               = "Sexual"
    filter_enabled     = true
    block_enabled      = true
    severity_threshold = var.content_filter_thresholds["sexual"]
    source             = "Prompt"
  }
  content_filter {
    name               = "Sexual"
    filter_enabled     = true
    block_enabled      = true
    severity_threshold = var.content_filter_thresholds["sexual"]
    source             = "Completion"
  }

  # ━━━ Violence ━━━
  content_filter {
    name               = "Violence"
    filter_enabled     = true
    block_enabled      = true
    severity_threshold = var.content_filter_thresholds["violence"]
    source             = "Prompt"
  }
  content_filter {
    name               = "Violence"
    filter_enabled     = true
    block_enabled      = true
    severity_threshold = var.content_filter_thresholds["violence"]
    source             = "Completion"
  }

  # ━━━ Self-Harm ━━━
  content_filter {
    name               = "SelfHarm"
    filter_enabled     = true
    block_enabled      = true
    severity_threshold = var.content_filter_thresholds["selfharm"]
    source             = "Prompt"
  }
  content_filter {
    name               = "SelfHarm"
    filter_enabled     = true
    block_enabled      = true
    severity_threshold = var.content_filter_thresholds["selfharm"]
    source             = "Completion"
  }

  # ━━━ Jailbreak Detection ━━━
  content_filter {
    name               = "Jailbreak"
    filter_enabled     = true
    block_enabled      = true
    severity_threshold = "High"  # Required by provider, not used by Azure
    source             = "Prompt"
  }

  # ━━━ Indirect Prompt Attack ━━━
  content_filter {
    name               = "Indirect Attack"
    filter_enabled     = var.enable_indirect_attack_filter
    block_enabled      = var.enable_indirect_attack_filter
    severity_threshold = "High"
    source             = "Prompt"
  }

  # ━━━ Protected Material ━━━
  content_filter {
    name               = "Protected Material Text"
    filter_enabled     = var.enable_protected_material
    block_enabled      = var.enable_protected_material
    severity_threshold = "High"
    source             = "Completion"
  }
  content_filter {
    name               = "Protected Material Code"
    filter_enabled     = var.enable_protected_material
    block_enabled      = var.enable_protected_material
    severity_threshold = "High"
    source             = "Completion"
  }
}
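The four harm categories above are eight nearly identical blocks. If the repetition bothers you, a dynamic block can generate one filter per category/source pair. This is a sketch under the same variables; verify the resulting plan matches your hand-written policy before adopting it:

```hcl
# Sketch: generate the four harm-category filters from
# var.content_filter_thresholds instead of repeating blocks.
locals {
  harm_filters = {
    Hate     = var.content_filter_thresholds["hate"]
    Sexual   = var.content_filter_thresholds["sexual"]
    Violence = var.content_filter_thresholds["violence"]
    SelfHarm = var.content_filter_thresholds["selfharm"]
  }
}

resource "azurerm_cognitive_account_rai_policy" "ai_safety" {
  name                 = "${var.environment}-content-filter"
  cognitive_account_id = var.cognitive_account_id
  base_policy_name     = "Microsoft.Default"

  dynamic "content_filter" {
    # One block per (category, source) pair: Hate/Prompt, Hate/Completion, ...
    for_each = {
      for pair in setproduct(keys(local.harm_filters), ["Prompt", "Completion"]) :
      "${pair[0]}-${pair[1]}" => { name = pair[0], source = pair[1] }
    }
    content {
      name               = content_filter.value.name
      filter_enabled     = true
      block_enabled      = true
      severity_threshold = local.harm_filters[content_filter.value.name]
      source             = content_filter.value.source
    }
  }

  # Jailbreak / Indirect Attack / Protected Material blocks as before...
}
```

Note that dynamic blocks are emitted in lexical key order, which may differ from the hand-written order; run terraform plan to confirm the diff is cosmetic.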

πŸ”§ Step 2: Variable-Driven Configuration

# content-safety/variables.tf

variable "environment" {
  type = string
}

variable "cognitive_account_id" {
  type        = string
  description = "ID of the AI Foundry cognitive account from Post 1"
}

variable "content_filter_thresholds" {
  type        = map(string)
  description = "Severity threshold per category: Low, Medium, High"
  default = {
    hate     = "Medium"
    sexual   = "Medium"
    violence = "Medium"
    selfharm = "Low"
  }
}

variable "enable_indirect_attack_filter" {
  type    = bool
  default = true
}

variable "enable_protected_material" {
  type    = bool
  default = true
}
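Since severity_threshold only accepts Low, Medium, or High, a validation block on the same variable catches typos at plan time rather than apply time. A sketch:

```hcl
variable "content_filter_thresholds" {
  type        = map(string)
  description = "Severity threshold per category: Low, Medium, High"
  default = {
    hate     = "Medium"
    sexual   = "Medium"
    violence = "Medium"
    selfharm = "Low"
  }

  validation {
    condition = alltrue([
      for t in values(var.content_filter_thresholds) :
      contains(["Low", "Medium", "High"], t)
    ])
    error_message = "Each threshold must be Low, Medium, or High."
  }
}
```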

Per-environment configs:

# environments/dev.tfvars
environment = "dev"
content_filter_thresholds = {
  hate     = "High"      # Lenient - only block extreme content
  sexual   = "Medium"
  violence = "High"
  selfharm = "Medium"
}
enable_indirect_attack_filter = false  # Skip in dev
enable_protected_material     = false

# environments/prod.tfvars
environment = "prod"
content_filter_thresholds = {
  hate     = "Low"       # Strict - block anything borderline
  sexual   = "Low"
  violence = "Low"
  selfharm = "Low"
}
enable_indirect_attack_filter = true
enable_protected_material     = true

Threshold behavior: Low blocks the most content (anything at low severity or above). High only blocks clearly extreme content. This is the opposite of what you might expect - think of it as "block when severity reaches this level."
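To make the rule concrete, here's a tiny illustrative helper - not an Azure API, just the "block when severity reaches this level" behavior described above, using Azure's four severity labels:

```python
# Illustrative only: models the threshold rule described above.
# Azure's severity labels, lowest to highest.
SEVERITIES = ["Safe", "Low", "Medium", "High"]

def is_blocked(content_severity: str, threshold: str) -> bool:
    """True if content at `content_severity` trips a filter set to `threshold`."""
    return SEVERITIES.index(content_severity) >= SEVERITIES.index(threshold)

# A "Low" threshold blocks the most content:
assert is_blocked("Low", "Low") and is_blocked("Medium", "Low")
# A "High" threshold only blocks clearly extreme content:
assert not is_blocked("Medium", "High")
assert is_blocked("High", "High")
# "Safe" content always passes:
assert not is_blocked("Safe", "Low")
```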

πŸ”Œ Step 3: Attach RAI Policy to Your Deployment

The RAI policy links to your model deployment via rai_policy_name. Update your deployment from Post 1:

resource "azurerm_cognitive_deployment" "primary" {
  name                 = "${var.environment}-${var.primary_model.name}"
  cognitive_account_id = azurerm_cognitive_account.ai_foundry.id
  rai_policy_name      = azurerm_cognitive_account_rai_policy.ai_safety.name

  model {
    format  = "OpenAI"
    name    = var.primary_model.name
    version = var.primary_model.version
  }

  sku {
    name     = "Standard"
    capacity = var.primary_model.tpm
  }
}

That's it. Every request to this deployment now runs through your content filter. No application code changes. No SDK wrappers. The filtering happens at Azure's infrastructure layer before your app ever sees the response. πŸ”’
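Your app does still need to handle the rejection: when a filter trips, Azure OpenAI returns an HTTP 400 whose error code is content_filter, with per-category results in the error body. A sketch of parsing that body - the payload shape follows Azure's documented format, but error_body here is a hypothetical example, so treat the field names as an assumption to verify:

```python
def blocked_categories(error_body: dict) -> list[str]:
    """Extract which filter categories fired from an Azure OpenAI
    content_filter error body (shape assumed from Azure docs)."""
    err = error_body.get("error", {})
    if err.get("code") != "content_filter":
        return []
    results = err.get("innererror", {}).get("content_filter_result", {})
    return [cat for cat, r in results.items() if r.get("filtered")]

# Hypothetical error payload for a prompt blocked by the Hate filter:
error_body = {
    "error": {
        "code": "content_filter",
        "message": "The response was filtered...",
        "innererror": {
            "code": "ResponsibleAIPolicyViolation",
            "content_filter_result": {
                "hate":     {"filtered": True,  "severity": "medium"},
                "violence": {"filtered": False, "severity": "safe"},
            },
        },
    }
}

print(blocked_categories(error_body))  # → ['hate']
```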

πŸ›‘οΈ Step 4: Standalone Content Safety (Optional Layer)

For additional screening beyond RAI policies - custom blocklists, standalone text moderation, or screening content from non-Azure-OpenAI sources - deploy the Content Safety service:

resource "azurerm_cognitive_account" "content_safety" {
  name                  = "${var.environment}-content-safety"
  location              = var.location
  resource_group_name   = var.resource_group_name
  kind                  = "ContentSafety"
  sku_name              = "S0"
  custom_subdomain_name = "${var.environment}-content-safety-${var.unique_suffix}"

  identity {
    type = "SystemAssigned"
  }

  tags = {
    Environment = var.environment
    Purpose     = "ai-content-moderation"
  }
}
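The Function App authenticates to this service with DefaultAzureCredential, so its managed identity needs RBAC on the account - the built-in "Cognitive Services User" role is enough to call the analyze APIs. A sketch, where the output name and var.function_app_principal_id are assumptions for your wiring:

```hcl
# Expose the endpoint for the Function App's settings.
output "content_safety_endpoint" {
  value = azurerm_cognitive_account.content_safety.endpoint
}

# Grant the Function App's system-assigned identity access to Content Safety.
resource "azurerm_role_assignment" "func_content_safety" {
  scope                = azurerm_cognitive_account.content_safety.id
  role_definition_name = "Cognitive Services User"
  principal_id         = var.function_app_principal_id # hypothetical variable
}
```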

Use it in your Function App for additional screening:

import os

from azure.ai.contentsafety import ContentSafetyClient
from azure.ai.contentsafety.models import AnalyzeTextOptions
from azure.identity import DefaultAzureCredential

client = ContentSafetyClient(
    endpoint=os.environ["CONTENT_SAFETY_ENDPOINT"],
    credential=DefaultAzureCredential(),
)

def screen_text(text):
    """Screen content through Azure AI Content Safety."""
    result = client.analyze_text(AnalyzeTextOptions(text=text))

    # Default output uses four severity levels: 0, 2, 4, 6.
    # Block at 4 (medium) and above.
    for category in result.categories_analysis:
        if category.severity and category.severity >= 4:
            return False, f"Blocked: {category.category}"
    return True, None

This gives you the same standalone moderation pattern as AWS Bedrock's ApplyGuardrail API or GCP's Model Armor - useful for screening content from any source.

πŸ“Š Tri-Cloud Safety Comparison

| Capability | AWS Bedrock | GCP Vertex AI | Azure AI Foundry |
|---|---|---|---|
| Content filtering | Guardrail resource | Model Armor + Gemini SafetySettings | RAI Policy (attached to deployment) |
| Filter attachment | Per-request parameter | Per-request + standalone API | Per-deployment (infrastructure-level) |
| PII/DLP | Built into guardrail | Model Armor + SDP | Content Safety API + Azure DLP |
| Jailbreak detection | Content filter category | Model Armor filter | RAI Policy filter |
| Custom blocklists | Word policy in guardrail | System instructions | RAI Policy + Content Safety |
| Protected material | Not available | Not available | RAI Policy filter (text + code) |
| Terraform resource | aws_bedrock_guardrail | google_model_armor_template | azurerm_cognitive_account_rai_policy |

Azure's unique advantage: Protected material detection. Azure can detect and block copyrighted text and code in model outputs - a feature neither AWS nor GCP offers natively. For enterprises concerned about IP liability, this is significant. 🎯

🎯 What You Just Built

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  User Input                      β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                β”‚
                β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  RAI Policy (Prompt Filters)     β”‚
β”‚  βœ“ Hate / Sexual / Violence      β”‚
β”‚  βœ“ Self-harm detection           β”‚
β”‚  βœ“ Jailbreak detection           β”‚
β”‚  βœ“ Indirect attack detection     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                β”‚ Passed?
                β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Azure OpenAI Model              β”‚
β”‚  (GPT-4.1, o4-mini, GPT-5, etc.) β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                β”‚
                β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  RAI Policy (Completion Filters) β”‚
β”‚  βœ“ Content safety filters        β”‚
β”‚  βœ“ Protected material (text+code)β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                β”‚ Passed?
                β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  User Response                   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

RAI policy attached at the deployment level. Every request filtered automatically. All managed by Terraform with environment-specific .tfvars. πŸš€

⏭️ What's Next

This is Post 2 of the AI Infra on Azure with Terraform series.


Content safety attached at the infrastructure level - not bolted on in application code. When compliance asks how content is filtered, point them to a Terraform plan, not a portal screenshot. πŸ”’

Found this helpful? Follow for the full AI Infra on Azure with Terraform series! πŸ’¬
