Suhas Mallesh
Vertex AI Safety with Terraform: Model Armor + Gemini Content Filters as Code πŸ›‘οΈ

GCP gives you two layers of AI safety - Gemini's built-in content filters and Model Armor for PII, prompt injection, and malicious URL detection. Here's how to deploy both with Terraform.

You deployed your first Vertex AI endpoint (Post 1). Gemini responds, tokens flow. But what stops it from leaking a customer's SSN, falling for a prompt injection, or generating harmful content?

GCP gives you two safety layers that work together:

  1. Gemini Safety Settings - per-request content filters (hate speech, harassment, dangerous content, sexually explicit) configured in your application code via environment variables
  2. Model Armor - a standalone security service for prompt injection detection, PII/DLP filtering, malicious URL scanning, and RAI content safety, all managed with Terraform

This is fundamentally different from AWS Bedrock, where guardrails are a single unified resource. On GCP, content filtering lives at the model level, while PII/injection protection is a separate service. Understanding this split is key to architecting safety correctly. 🎯

🧱 GCP Safety Architecture

| Layer | Service | What It Does | Managed By |
| --- | --- | --- | --- |
| Content filtering | Gemini Safety Settings | Blocks hate, harassment, sexual, dangerous content | Application code (env vars) |
| Prompt injection | Model Armor | Detects jailbreaks and injection attempts | Terraform |
| PII/DLP | Model Armor + Sensitive Data Protection | Detects/redacts PII (SSN, email, credentials) | Terraform |
| Malicious URLs | Model Armor | Blocks phishing/malware URLs in prompts | Terraform |
| Non-configurable | Built-in | Always blocks CSAM, illegal content | Google (can't disable) |

The key insight: Gemini has non-configurable safety filters that always run (CSAM, etc.) plus configurable ones you can tune. Model Armor runs independently and works with any model, not just Gemini.

πŸ—οΈ Layer 1: Gemini Safety Settings via Environment Variables

Following the model-agnostic pattern from Post 1, safety thresholds are environment variables. Your Cloud Function reads them at runtime:

# vertex-ai/variables.tf - Add to your Post 1 config

variable "safety_settings" {
  type = map(string)
  description = "Harm category thresholds: OFF, BLOCK_LOW_AND_ABOVE, BLOCK_MEDIUM_AND_ABOVE, BLOCK_ONLY_HIGH"
  default = {
    HARM_CATEGORY_HATE_SPEECH       = "BLOCK_MEDIUM_AND_ABOVE"
    HARM_CATEGORY_HARASSMENT        = "BLOCK_MEDIUM_AND_ABOVE"
    HARM_CATEGORY_SEXUALLY_EXPLICIT = "BLOCK_MEDIUM_AND_ABOVE"
    HARM_CATEGORY_DANGEROUS_CONTENT = "BLOCK_MEDIUM_AND_ABOVE"
  }
}

Per-environment configs:

# environments/dev.tfvars
safety_settings = {
  HARM_CATEGORY_HATE_SPEECH       = "BLOCK_ONLY_HIGH"
  HARM_CATEGORY_HARASSMENT        = "BLOCK_ONLY_HIGH"
  HARM_CATEGORY_SEXUALLY_EXPLICIT = "BLOCK_MEDIUM_AND_ABOVE"
  HARM_CATEGORY_DANGEROUS_CONTENT = "BLOCK_ONLY_HIGH"
}

# environments/prod.tfvars
safety_settings = {
  HARM_CATEGORY_HATE_SPEECH       = "BLOCK_LOW_AND_ABOVE"
  HARM_CATEGORY_HARASSMENT        = "BLOCK_LOW_AND_ABOVE"
  HARM_CATEGORY_SEXUALLY_EXPLICIT = "BLOCK_LOW_AND_ABOVE"
  HARM_CATEGORY_DANGEROUS_CONTENT = "BLOCK_LOW_AND_ABOVE"
}

Pass them as a JSON environment variable in your Cloud Function:

# In your Cloud Function resource from Post 1
environment_variables = {
  GCP_PROJECT     = var.project_id
  GCP_REGION      = var.region
  MODEL_ID        = var.primary_model.id
  SAFETY_SETTINGS = jsonencode(var.safety_settings)
}

The application code applies them per request:

import json
import os
from vertexai.generative_models import GenerativeModel, SafetySetting, HarmCategory, HarmBlockThreshold

THRESHOLD_MAP = {
    "OFF": HarmBlockThreshold.OFF,
    "BLOCK_LOW_AND_ABOVE": HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,
    "BLOCK_MEDIUM_AND_ABOVE": HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
    "BLOCK_ONLY_HIGH": HarmBlockThreshold.BLOCK_ONLY_HIGH,
}

CATEGORY_MAP = {
    "HARM_CATEGORY_HATE_SPEECH": HarmCategory.HARM_CATEGORY_HATE_SPEECH,
    "HARM_CATEGORY_HARASSMENT": HarmCategory.HARM_CATEGORY_HARASSMENT,
    "HARM_CATEGORY_SEXUALLY_EXPLICIT": HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT,
    "HARM_CATEGORY_DANGEROUS_CONTENT": HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
}

def get_safety_settings():
    raw = json.loads(os.environ.get("SAFETY_SETTINGS", "{}"))
    return [
        SafetySetting(
            category=CATEGORY_MAP[cat],
            threshold=THRESHOLD_MAP[thresh],
        )
        for cat, thresh in raw.items()
        if cat in CATEGORY_MAP and thresh in THRESHOLD_MAP
    ]

# In your handler:
model = GenerativeModel(os.environ.get("MODEL_ID"))
response = model.generate_content(
    prompt,
    safety_settings=get_safety_settings(),
    generation_config={"max_output_tokens": max_tokens},
)

Changing safety thresholds is now a .tfvars update. No code changes. 🎯
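One detail worth handling explicitly: when a filter fires, Gemini doesn't raise an exception - it reports the block on the response object. Here's a minimal sketch of surfacing that to the caller (the helper name and messages are illustrative, not part of the SDK):

from vertexai.generative_models import FinishReason

def extract_text_or_block_reason(response):
    """Return (text, None) on success, or (None, reason) if a safety filter fired."""
    # An empty candidate list usually means the prompt itself was blocked
    if not response.candidates:
        return None, "Prompt blocked by safety filters"
    candidate = response.candidates[0]
    # Generation stopped because the output tripped a configured filter
    if candidate.finish_reason == FinishReason.SAFETY:
        return None, "Response blocked by safety filters"
    return candidate.text, None

Returning a reason instead of raising keeps the Cloud Function's response shape consistent whether the request succeeded or was filtered.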

πŸ›‘οΈ Layer 2: Model Armor with Terraform

Model Armor is GCP's dedicated AI security service. It uses google_model_armor_template resources to define screening policies:

# model-armor/main.tf

resource "google_project_service" "model_armor" {
  project = var.project_id
  service = "modelarmor.googleapis.com"

  disable_on_destroy = false
}

resource "google_model_armor_template" "ai_safety" {
  location    = var.region
  template_id = "${var.environment}-ai-safety"

  filter_config {
    # RAI content safety filters
    rai_settings {
      rai_filters {
        filter_type      = "HATE_SPEECH"
        confidence_level = var.rai_confidence_levels["hate_speech"]
      }
      rai_filters {
        filter_type      = "HARASSMENT"
        confidence_level = var.rai_confidence_levels["harassment"]
      }
      rai_filters {
        filter_type      = "SEXUALLY_EXPLICIT"
        confidence_level = var.rai_confidence_levels["sexually_explicit"]
      }
      rai_filters {
        filter_type      = "DANGEROUS"
        confidence_level = var.rai_confidence_levels["dangerous"]
      }
    }

    # Prompt injection and jailbreak detection
    pi_and_jailbreak_filter_settings {
      filter_enforcement = "ENABLED"
      confidence_level   = var.pi_jailbreak_confidence
    }

    # Sensitive data protection (PII/DLP)
    sdp_settings {
      basic_config {
        filter_enforcement = var.sdp_enforcement
      }
    }

    # Malicious URL detection
    malicious_uri_filter_settings {
      filter_enforcement = var.malicious_uri_enforcement
    }
  }

  depends_on = [google_project_service.model_armor]
}

Variables for environment-specific tuning:

# model-armor/variables.tf

variable "rai_confidence_levels" {
  type        = map(string)
  description = "Confidence level per RAI category: LOW_AND_ABOVE, MEDIUM_AND_ABOVE, HIGH"
  default = {
    hate_speech        = "MEDIUM_AND_ABOVE"
    harassment         = "MEDIUM_AND_ABOVE"
    sexually_explicit  = "LOW_AND_ABOVE"
    dangerous          = "MEDIUM_AND_ABOVE"
  }
}

variable "pi_jailbreak_confidence" {
  type        = string
  description = "Prompt injection detection confidence: LOW_AND_ABOVE, MEDIUM_AND_ABOVE, HIGH"
  default     = "LOW_AND_ABOVE"
}

variable "sdp_enforcement" {
  type        = string
  description = "Sensitive Data Protection: ENABLED or DISABLED"
  default     = "ENABLED"
}

variable "malicious_uri_enforcement" {
  type        = string
  description = "Malicious URL detection: ENABLED or DISABLED"
  default     = "ENABLED"
}

Environment configs:

# environments/dev.tfvars
rai_confidence_levels = {
  hate_speech       = "HIGH"           # Lenient for testing
  harassment        = "HIGH"
  sexually_explicit = "MEDIUM_AND_ABOVE"
  dangerous         = "HIGH"
}
pi_jailbreak_confidence    = "MEDIUM_AND_ABOVE"
sdp_enforcement            = "DISABLED"   # No PII scanning in dev
malicious_uri_enforcement  = "DISABLED"

# environments/prod.tfvars
rai_confidence_levels = {
  hate_speech       = "LOW_AND_ABOVE"  # Strict
  harassment        = "LOW_AND_ABOVE"
  sexually_explicit = "LOW_AND_ABOVE"
  dangerous         = "LOW_AND_ABOVE"
}
pi_jailbreak_confidence    = "LOW_AND_ABOVE"
sdp_enforcement            = "ENABLED"
malicious_uri_enforcement  = "ENABLED"

πŸ”Œ Integrating Model Armor with Your Endpoint

Model Armor screens content via explicit API calls before or after model invocation. Add it to your Cloud Function; here the project, region, and template ID are read from environment variables you pass in from Terraform, just like the safety settings above:

import os

from google.cloud import modelarmor_v1

# GCP_PROJECT / GCP_REGION match the env vars from Post 1; the template ID
# variable name is illustrative - export it from your Terraform however you prefer.
PROJECT = os.environ["GCP_PROJECT"]
REGION = os.environ["GCP_REGION"]
TEMPLATE_ID = os.environ["MODEL_ARMOR_TEMPLATE_ID"]

# Model Armor is a regional service, so point the client at the regional endpoint
armor_client = modelarmor_v1.ModelArmorClient(
    client_options={"api_endpoint": f"modelarmor.{REGION}.rep.googleapis.com"}
)
TEMPLATE = f"projects/{PROJECT}/locations/{REGION}/templates/{TEMPLATE_ID}"

def screen_prompt(text):
    """Screen user input before sending to Gemini."""
    request = modelarmor_v1.SanitizeUserPromptRequest(
        name=TEMPLATE,
        user_prompt_data=modelarmor_v1.DataItem(text=text),
    )
    result = armor_client.sanitize_user_prompt(request=request)

    if result.sanitization_result.filter_match_state == \
       modelarmor_v1.FilterMatchState.MATCH_FOUND:
        return False, "Your message was flagged for safety."
    return True, None

def screen_response(text):
    """Screen model output before returning to user."""
    request = modelarmor_v1.SanitizeModelResponseRequest(
        name=TEMPLATE,
        model_response_data=modelarmor_v1.DataItem(text=text),
    )
    result = armor_client.sanitize_model_response(request=request)

    if result.sanitization_result.filter_match_state == \
       modelarmor_v1.FilterMatchState.MATCH_FOUND:
        return False, "Response filtered for safety."
    return True, None

This gives you the same input/output screening pattern as AWS Bedrock Guardrails, but via an explicit API call rather than a parameter on model invocation.
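In the handler, the two layers compose in sequence. A sketch under the assumptions above - the function name and return shape are illustrative, and screen_prompt, screen_response, get_safety_settings, and extract_text_or_block_reason are the helpers defined earlier:

import os

from vertexai.generative_models import GenerativeModel

def handle_prompt(prompt: str, max_tokens: int = 1024) -> dict:
    """End-to-end flow: Model Armor in, Gemini with safety settings, Model Armor out."""
    # Layer 2 (input): screen the raw user prompt with Model Armor
    ok, reason = screen_prompt(prompt)
    if not ok:
        return {"blocked": True, "message": reason}

    # Layer 1: Gemini applies the env-var-driven safety settings per request
    model = GenerativeModel(os.environ.get("MODEL_ID"))
    response = model.generate_content(
        prompt,
        safety_settings=get_safety_settings(),
        generation_config={"max_output_tokens": max_tokens},
    )
    text, block_reason = extract_text_or_block_reason(response)
    if block_reason:
        return {"blocked": True, "message": block_reason}

    # Layer 2 (output): screen the model's answer before returning it
    ok, reason = screen_response(text)
    if not ok:
        return {"blocked": True, "message": reason}

    return {"blocked": False, "message": text}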

🏒 Floor Settings: Organization-Level Enforcement

For enterprises, Model Armor floor settings enforce minimum safety standards across all templates in a project or organization:

resource "google_model_armor_floorsetting" "org_baseline" {
  parent   = "projects/${var.project_id}"
  location = "global"

  filter_config {
    rai_settings {
      rai_filters {
        filter_type      = "DANGEROUS"
        confidence_level = "MEDIUM_AND_ABOVE"
      }
    }
    pi_and_jailbreak_filter_settings {
      filter_enforcement = "ENABLED"
      confidence_level   = "HIGH"
    }
    sdp_settings {
      basic_config {
        filter_enforcement = "ENABLED"
      }
    }
  }

  enable_floor_setting_enforcement = true
}

This means even if a team creates a permissive template, the floor setting enforces minimum protections. Templates can be stricter than the floor, never weaker.

πŸ“Š AWS vs GCP Safety: Side-by-Side

| Capability | AWS Bedrock Guardrails | GCP Model Armor + Safety Settings |
| --- | --- | --- |
| Content filtering | Single aws_bedrock_guardrail resource | Gemini SafetySettings (code) + Model Armor RAI (Terraform) |
| PII/DLP | Built in to guardrail (BLOCK/ANONYMIZE) | Model Armor + Sensitive Data Protection |
| Prompt injection | Content filter category | Dedicated Model Armor filter |
| Denied topics | Built-in topic policy | System instructions (no Terraform resource) |
| Word filters | Built-in word policy | System instructions or custom logic |
| Malicious URLs | Not available | Model Armor filter |
| Org enforcement | Organization-level guardrails | Floor settings (project/org/folder) |
| Model scope | Bedrock models only | Any model, any cloud |

Key difference: Model Armor is model-agnostic by design. You can use it to screen prompts headed for Gemini, self-hosted Llama, or even OpenAI. That makes it a strong choice for multi-model architectures. πŸ”’

🎯 What You Just Built

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  User Input                      β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                β”‚
                β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Model Armor (Input Screening)   β”‚
β”‚  βœ“ RAI content safety            β”‚
β”‚  βœ“ Prompt injection detection    β”‚
β”‚  βœ“ PII/DLP scanning              β”‚
β”‚  βœ“ Malicious URL detection       β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                β”‚ Passed?
                β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Gemini (with Safety Settings)   β”‚
β”‚  βœ“ Hate speech filter            β”‚
β”‚  βœ“ Harassment filter             β”‚
β”‚  βœ“ Sexually explicit filter      β”‚
β”‚  βœ“ Dangerous content filter      β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                β”‚
                β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Model Armor (Output Screening)  β”‚
β”‚  βœ“ PII/DLP in response           β”‚
β”‚  βœ“ RAI content check             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                β”‚
                β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  User Response                   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Two safety layers, both Terraform-managed, environment-specific via .tfvars. πŸš€

⏭️ What's Next

This is Post 2 of the AI Infra on GCP with Terraform series.

  • Post 1: Deploy Vertex AI: First Gemini Endpoint
  • Post 2: Vertex AI Safety (you are here) πŸ›‘οΈ
  • Post 3: Vertex AI Audit Logging with Cloud Logging
  • Post 4: RAG with Vertex AI Search + Datastore

Two safety layers, both as code. When your security team asks how AI content is filtered, point them to a Git repo - not a console screenshot. πŸ”’

Found this helpful? Follow for the full AI Infra on GCP with Terraform series! πŸ’¬
