GCP gives you two layers of AI safety - Gemini's built-in content filters and Model Armor for PII, prompt injection, and malicious URL detection. Here's how to deploy both with Terraform.
You deployed your first Vertex AI endpoint (Post 1). Gemini responds, tokens flow. But what stops it from leaking a customer's SSN, falling for a prompt injection, or generating harmful content?
GCP gives you two safety layers that work together:
- Gemini Safety Settings - per-request content filters (hate speech, harassment, dangerous content, sexually explicit) configured in your application code via environment variables
- Model Armor - a standalone security service for prompt injection detection, PII/DLP filtering, malicious URL scanning, and RAI content safety, all managed with Terraform
This is fundamentally different from AWS Bedrock, where guardrails are a single unified resource. On GCP, content filtering lives at the model level, while PII/injection protection is a separate service. Understanding this split is key to architecting safety correctly. 🎯
🧱 GCP Safety Architecture
| Layer | Service | What It Does | Managed By |
|---|---|---|---|
| Content filtering | Gemini Safety Settings | Blocks hate, harassment, sexual, dangerous content | Application code (env vars) |
| Prompt injection | Model Armor | Detects jailbreaks and injection attempts | Terraform |
| PII/DLP | Model Armor + Sensitive Data Protection | Detects/redacts PII (SSN, email, credentials) | Terraform |
| Malicious URLs | Model Armor | Blocks phishing/malware URLs in prompts | Terraform |
| Non-configurable | Built-in | Always blocks CSAM, illegal content | Google (can't disable) |
The key insight: Gemini has non-configurable safety filters that always run (CSAM, etc.) plus configurable ones you can tune. Model Armor runs independently and works with any model, not just Gemini.
🏗️ Layer 1: Gemini Safety Settings via Environment Variables
Following the model-agnostic pattern from Post 1, safety thresholds are environment variables. Your Cloud Function reads them at runtime:
# vertex-ai/variables.tf - Add to your Post 1 config
variable "safety_settings" {
  type        = map(string)
  description = "Harm category thresholds: OFF, BLOCK_LOW_AND_ABOVE, BLOCK_MEDIUM_AND_ABOVE, BLOCK_ONLY_HIGH"

  default = {
    HARM_CATEGORY_HATE_SPEECH       = "BLOCK_MEDIUM_AND_ABOVE"
    HARM_CATEGORY_HARASSMENT        = "BLOCK_MEDIUM_AND_ABOVE"
    HARM_CATEGORY_SEXUALLY_EXPLICIT = "BLOCK_MEDIUM_AND_ABOVE"
    HARM_CATEGORY_DANGEROUS_CONTENT = "BLOCK_MEDIUM_AND_ABOVE"
  }
}
Per-environment configs:
# environments/dev.tfvars
safety_settings = {
  HARM_CATEGORY_HATE_SPEECH       = "BLOCK_ONLY_HIGH"
  HARM_CATEGORY_HARASSMENT        = "BLOCK_ONLY_HIGH"
  HARM_CATEGORY_SEXUALLY_EXPLICIT = "BLOCK_MEDIUM_AND_ABOVE"
  HARM_CATEGORY_DANGEROUS_CONTENT = "BLOCK_ONLY_HIGH"
}

# environments/prod.tfvars
safety_settings = {
  HARM_CATEGORY_HATE_SPEECH       = "BLOCK_LOW_AND_ABOVE"
  HARM_CATEGORY_HARASSMENT        = "BLOCK_LOW_AND_ABOVE"
  HARM_CATEGORY_SEXUALLY_EXPLICIT = "BLOCK_LOW_AND_ABOVE"
  HARM_CATEGORY_DANGEROUS_CONTENT = "BLOCK_LOW_AND_ABOVE"
}
Pass them as a JSON environment variable in your Cloud Function:
# In your Cloud Function resource from Post 1
environment_variables = {
  GCP_PROJECT     = var.project_id
  GCP_REGION      = var.region
  MODEL_ID        = var.primary_model.id
  SAFETY_SETTINGS = jsonencode(var.safety_settings)
}
The application code applies them per request:
import json
import os

from vertexai.generative_models import (
    GenerativeModel,
    HarmBlockThreshold,
    HarmCategory,
    SafetySetting,
)

THRESHOLD_MAP = {
    "OFF": HarmBlockThreshold.OFF,
    "BLOCK_LOW_AND_ABOVE": HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,
    "BLOCK_MEDIUM_AND_ABOVE": HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
    "BLOCK_ONLY_HIGH": HarmBlockThreshold.BLOCK_ONLY_HIGH,
}

CATEGORY_MAP = {
    "HARM_CATEGORY_HATE_SPEECH": HarmCategory.HARM_CATEGORY_HATE_SPEECH,
    "HARM_CATEGORY_HARASSMENT": HarmCategory.HARM_CATEGORY_HARASSMENT,
    "HARM_CATEGORY_SEXUALLY_EXPLICIT": HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT,
    "HARM_CATEGORY_DANGEROUS_CONTENT": HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
}

def get_safety_settings():
    """Build SafetySetting objects from the SAFETY_SETTINGS env var."""
    raw = json.loads(os.environ.get("SAFETY_SETTINGS", "{}"))
    return [
        SafetySetting(category=CATEGORY_MAP[cat], threshold=THRESHOLD_MAP[thresh])
        for cat, thresh in raw.items()
        if cat in CATEGORY_MAP and thresh in THRESHOLD_MAP
    ]

# In your handler:
model = GenerativeModel(os.environ.get("MODEL_ID"))
response = model.generate_content(
    prompt,
    safety_settings=get_safety_settings(),
    generation_config={"max_output_tokens": max_tokens},
)
Changing safety thresholds is now a .tfvars update. No code changes. 🎯
🛡️ Layer 2: Model Armor with Terraform
Model Armor is GCP's dedicated AI security service. It uses google_model_armor_template resources to define screening policies:
# model-armor/main.tf
resource "google_project_service" "model_armor" {
  project            = var.project_id
  service            = "modelarmor.googleapis.com"
  disable_on_destroy = false
}

resource "google_model_armor_template" "ai_safety" {
  location    = var.region
  template_id = "${var.environment}-ai-safety"

  filter_config {
    # RAI content safety filters
    rai_settings {
      rai_filters {
        filter_type      = "HATE_SPEECH"
        confidence_level = var.rai_confidence_levels["hate_speech"]
      }
      rai_filters {
        filter_type      = "HARASSMENT"
        confidence_level = var.rai_confidence_levels["harassment"]
      }
      rai_filters {
        filter_type      = "SEXUALLY_EXPLICIT"
        confidence_level = var.rai_confidence_levels["sexually_explicit"]
      }
      rai_filters {
        filter_type      = "DANGEROUS"
        confidence_level = var.rai_confidence_levels["dangerous"]
      }
    }

    # Prompt injection and jailbreak detection
    pi_and_jailbreak_filter_settings {
      filter_enforcement = "ENABLED"
      confidence_level   = var.pi_jailbreak_confidence
    }

    # Sensitive data protection (PII/DLP)
    sdp_settings {
      basic_config {
        filter_enforcement = var.sdp_enforcement
      }
    }

    # Malicious URL detection
    malicious_uri_filter_settings {
      filter_enforcement = var.malicious_uri_enforcement
    }
  }

  depends_on = [google_project_service.model_armor]
}
Variables for environment-specific tuning:
# model-armor/variables.tf
variable "rai_confidence_levels" {
  type        = map(string)
  description = "Confidence level per RAI category: LOW_AND_ABOVE, MEDIUM_AND_ABOVE, HIGH"

  default = {
    hate_speech       = "MEDIUM_AND_ABOVE"
    harassment        = "MEDIUM_AND_ABOVE"
    sexually_explicit = "LOW_AND_ABOVE"
    dangerous         = "MEDIUM_AND_ABOVE"
  }
}

variable "pi_jailbreak_confidence" {
  type        = string
  description = "Prompt injection detection confidence: LOW_AND_ABOVE, MEDIUM_AND_ABOVE, HIGH"
  default     = "LOW_AND_ABOVE"
}

variable "sdp_enforcement" {
  type        = string
  description = "Sensitive Data Protection: ENABLED or DISABLED"
  default     = "ENABLED"
}

variable "malicious_uri_enforcement" {
  type        = string
  description = "Malicious URL detection: ENABLED or DISABLED"
  default     = "ENABLED"
}
Environment configs:
# environments/dev.tfvars
rai_confidence_levels = {
  hate_speech       = "HIGH" # Lenient for testing
  harassment        = "HIGH"
  sexually_explicit = "MEDIUM_AND_ABOVE"
  dangerous         = "HIGH"
}
pi_jailbreak_confidence   = "MEDIUM_AND_ABOVE"
sdp_enforcement           = "DISABLED" # No PII scanning in dev
malicious_uri_enforcement = "DISABLED"

# environments/prod.tfvars
rai_confidence_levels = {
  hate_speech       = "LOW_AND_ABOVE" # Strict
  harassment        = "LOW_AND_ABOVE"
  sexually_explicit = "LOW_AND_ABOVE"
  dangerous         = "LOW_AND_ABOVE"
}
pi_jailbreak_confidence   = "LOW_AND_ABOVE"
sdp_enforcement           = "ENABLED"
malicious_uri_enforcement = "ENABLED"
🔌 Integrating Model Armor with Your Endpoint
Model Armor screens content via API before or after model invocation. Add it to your Cloud Function:
from google.cloud import modelarmor_v1

armor_client = modelarmor_v1.ModelArmorClient()
TEMPLATE = f"projects/{PROJECT}/locations/{REGION}/templates/{TEMPLATE_ID}"

def screen_prompt(text):
    """Screen user input before sending to Gemini."""
    request = modelarmor_v1.SanitizeUserPromptRequest(
        name=TEMPLATE,
        user_prompt_data=modelarmor_v1.DataItem(text=text),
    )
    result = armor_client.sanitize_user_prompt(request=request)
    if result.sanitization_result.filter_match_state == \
            modelarmor_v1.FilterMatchState.MATCH_FOUND:
        return False, "Your message was flagged for safety."
    return True, None

def screen_response(text):
    """Screen model output before returning to user."""
    request = modelarmor_v1.SanitizeModelResponseRequest(
        name=TEMPLATE,
        model_response_data=modelarmor_v1.DataItem(text=text),
    )
    result = armor_client.sanitize_model_response(request=request)
    if result.sanitization_result.filter_match_state == \
            modelarmor_v1.FilterMatchState.MATCH_FOUND:
        return False, "Response filtered for safety."
    return True, None
This gives you the same input/output screening pattern as AWS Bedrock Guardrails, but via an explicit API call rather than a parameter on model invocation.
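Wired together, the full request path is: screen the input, call the model, screen the output. A sketch of that control flow with the screening and model calls injected as plain callables, so it can be read (and tested) without GCP credentials - guarded_generate and the stub lambdas are illustrative names, not part of any SDK:

```python
def guarded_generate(prompt, screen_prompt, call_model, screen_response):
    """Input screening -> model call -> output screening.

    screen_prompt/screen_response follow the (ok, message) convention
    used above; call_model takes a prompt string and returns text.
    """
    ok, msg = screen_prompt(prompt)
    if not ok:
        return {"blocked": True, "stage": "input", "message": msg}

    text = call_model(prompt)

    ok, msg = screen_response(text)
    if not ok:
        return {"blocked": True, "stage": "output", "message": msg}

    return {"blocked": False, "text": text}

# Stubbed example: block prompts mentioning an SSN.
result = guarded_generate(
    "My SSN is 123-45-6789",
    screen_prompt=lambda t: (False, "flagged") if "SSN" in t else (True, None),
    call_model=lambda t: "echo: " + t,
    screen_response=lambda t: (True, None),
)
# result["blocked"] is True, result["stage"] is "input"
```

Blocking on input also saves the cost of a model call you'd have to discard anyway.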
🏢 Floor Settings: Organization-Level Enforcement
For enterprises, Model Armor floor settings enforce minimum safety standards across all templates in a project or organization:
resource "google_model_armor_floorsetting" "org_baseline" {
  parent   = "projects/${var.project_id}"
  location = "global"

  filter_config {
    rai_settings {
      rai_filters {
        filter_type      = "DANGEROUS"
        confidence_level = "MEDIUM_AND_ABOVE"
      }
    }
    pi_and_jailbreak_filter_settings {
      filter_enforcement = "ENABLED"
      confidence_level   = "HIGH"
    }
    sdp_settings {
      basic_config {
        filter_enforcement = "ENABLED"
      }
    }
  }

  enable_floor_setting_enforcement = true
}
This means even if a team creates a permissive template, the floor setting enforces minimum protections. Templates can be stricter than the floor, never weaker.
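The "stricter of the two wins" rule amounts to taking the stricter confidence level of template and floor, where LOW_AND_ABOVE blocks the most (low, medium, and high confidence matches) and HIGH the least. A pure-Python illustration of that merge semantics - my own model of the documented behavior, not an API call:

```python
# Ordered from most to least strict: LOW_AND_ABOVE blocks low-, medium-,
# and high-confidence matches; HIGH blocks only high-confidence matches.
STRICTNESS = ["LOW_AND_ABOVE", "MEDIUM_AND_ABOVE", "HIGH"]

def effective_level(template_level: str, floor_level: str) -> str:
    """A template may be stricter than the floor, never weaker."""
    return min(template_level, floor_level, key=STRICTNESS.index)

# A permissive template is pulled up to the floor...
assert effective_level("HIGH", "MEDIUM_AND_ABOVE") == "MEDIUM_AND_ABOVE"
# ...but a stricter template keeps its own setting.
assert effective_level("LOW_AND_ABOVE", "MEDIUM_AND_ABOVE") == "LOW_AND_ABOVE"
```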
📊 AWS vs GCP Safety: Side-by-Side
| Capability | AWS Bedrock Guardrails | GCP Model Armor + Safety Settings |
|---|---|---|
| Content filtering | Single aws_bedrock_guardrail resource | Gemini SafetySettings (code) + Model Armor RAI (Terraform) |
| PII/DLP | Built-in to guardrail (BLOCK/ANONYMIZE) | Model Armor + Sensitive Data Protection |
| Prompt injection | Content filter category | Dedicated Model Armor filter |
| Denied topics | Built-in topic policy | System instructions (no Terraform resource) |
| Word filters | Built-in word policy | System instructions or custom logic |
| Malicious URLs | Not available | Model Armor filter |
| Org enforcement | Organization-level guardrails | Floor settings (project/org/folder) |
| Model scope | Bedrock models only | Any model, any cloud |
Key difference: Model Armor is model-agnostic by design. You can use it to screen prompts headed for Gemini, self-hosted Llama, or even OpenAI. That makes it a strong choice for multi-model architectures. 🚀
🎯 What You Just Built
┌──────────────────────────────────┐
│            User Input            │
└────────────────┬─────────────────┘
                 │
                 ▼
┌──────────────────────────────────┐
│  Model Armor (Input Screening)   │
│  ✓ RAI content safety            │
│  ✓ Prompt injection detection    │
│  ✓ PII/DLP scanning              │
│  ✓ Malicious URL detection       │
└────────────────┬─────────────────┘
                 │ Passed?
                 ▼
┌──────────────────────────────────┐
│  Gemini (with Safety Settings)   │
│  ✓ Hate speech filter            │
│  ✓ Harassment filter             │
│  ✓ Sexually explicit filter      │
│  ✓ Dangerous content filter      │
└────────────────┬─────────────────┘
                 │
                 ▼
┌──────────────────────────────────┐
│  Model Armor (Output Screening)  │
│  ✓ PII/DLP in response           │
│  ✓ RAI content check             │
└────────────────┬─────────────────┘
                 │
                 ▼
┌──────────────────────────────────┐
│          User Response           │
└──────────────────────────────────┘
Two safety layers, both Terraform-managed, environment-specific via .tfvars. 🚀
✏️ What's Next
This is Post 2 of the AI Infra on GCP with Terraform series.
- Post 1: Deploy Vertex AI: First Gemini Endpoint
- Post 2: Vertex AI Safety (you are here) 🛡️
- Post 3: Vertex AI Audit Logging with Cloud Logging
- Post 4: RAG with Vertex AI Search + Datastore
Two safety layers, both as code. When your security team asks how AI content is filtered, point them to a Git repo - not a console screenshot. 🚀
Found this helpful? Follow for the full AI Infra on GCP with Terraform series! 💬