GCP gives you two layers of AI safety - Gemini's built-in content filters and Model Armor for PII, prompt injection, and malicious URL detection. Here's how to deploy both with Terraform.
You deployed your first Vertex AI endpoint (Post 1). Gemini responds, tokens flow. But what stops it from leaking a customer's SSN, falling for a prompt injection, or generating harmful content?
GCP gives you two safety layers that work together:
- Gemini Safety Settings - per-request content filters (hate speech, harassment, dangerous content, sexually explicit) configured in your application code via environment variables
- Model Armor - a standalone security service for prompt injection detection, PII/DLP filtering, malicious URL scanning, and RAI content safety, all managed with Terraform
This is fundamentally different from AWS Bedrock, where guardrails are a single unified resource. On GCP, content filtering lives at the model level, while PII/injection protection is a separate service. Understanding this split is key to architecting safety correctly. 🎯
🧱 GCP Safety Architecture
| Layer | Service | What It Does | Managed By |
|---|---|---|---|
| Content filtering | Gemini Safety Settings | Blocks hate, harassment, sexual, dangerous content | Application code (env vars) |
| Prompt injection | Model Armor | Detects jailbreaks and injection attempts | Terraform |
| PII/DLP | Model Armor + Sensitive Data Protection | Detects/redacts PII (SSN, email, credentials) | Terraform |
| Malicious URLs | Model Armor | Blocks phishing/malware URLs in prompts | Terraform |
| Non-configurable | Built-in | Always blocks CSAM, illegal content | Google (can't disable) |
The key insight: Gemini has non-configurable safety filters that always run (CSAM, etc.) plus configurable ones you can tune. Model Armor runs independently and works with any model, not just Gemini.
🏗️ Layer 1: Gemini Safety Settings via Environment Variables
Following the model-agnostic pattern from Post 1, safety thresholds are environment variables. Your Cloud Function reads them at runtime:
# vertex-ai/variables.tf - Add to your Post 1 config
variable "safety_settings" {
  type        = map(string)
  description = "Harm category thresholds: OFF, BLOCK_LOW_AND_ABOVE, BLOCK_MEDIUM_AND_ABOVE, BLOCK_ONLY_HIGH"

  default = {
    HARM_CATEGORY_HATE_SPEECH       = "BLOCK_MEDIUM_AND_ABOVE"
    HARM_CATEGORY_HARASSMENT        = "BLOCK_MEDIUM_AND_ABOVE"
    HARM_CATEGORY_SEXUALLY_EXPLICIT = "BLOCK_MEDIUM_AND_ABOVE"
    HARM_CATEGORY_DANGEROUS_CONTENT = "BLOCK_MEDIUM_AND_ABOVE"
  }
}
Per-environment configs:
# environments/dev.tfvars
safety_settings = {
  HARM_CATEGORY_HATE_SPEECH       = "BLOCK_ONLY_HIGH"
  HARM_CATEGORY_HARASSMENT        = "BLOCK_ONLY_HIGH"
  HARM_CATEGORY_SEXUALLY_EXPLICIT = "BLOCK_MEDIUM_AND_ABOVE"
  HARM_CATEGORY_DANGEROUS_CONTENT = "BLOCK_ONLY_HIGH"
}

# environments/prod.tfvars
safety_settings = {
  HARM_CATEGORY_HATE_SPEECH       = "BLOCK_LOW_AND_ABOVE"
  HARM_CATEGORY_HARASSMENT        = "BLOCK_LOW_AND_ABOVE"
  HARM_CATEGORY_SEXUALLY_EXPLICIT = "BLOCK_LOW_AND_ABOVE"
  HARM_CATEGORY_DANGEROUS_CONTENT = "BLOCK_LOW_AND_ABOVE"
}
Pass them as a JSON environment variable in your Cloud Function:
# In your Cloud Function resource from Post 1
environment_variables = {
  GCP_PROJECT     = var.project_id
  GCP_REGION      = var.region
  MODEL_ID        = var.primary_model.id
  SAFETY_SETTINGS = jsonencode(var.safety_settings)
}
The application code applies them per request:
import json
import os

from vertexai.generative_models import (
    GenerativeModel,
    HarmBlockThreshold,
    HarmCategory,
    SafetySetting,
)

THRESHOLD_MAP = {
    "OFF": HarmBlockThreshold.OFF,
    "BLOCK_LOW_AND_ABOVE": HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,
    "BLOCK_MEDIUM_AND_ABOVE": HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
    "BLOCK_ONLY_HIGH": HarmBlockThreshold.BLOCK_ONLY_HIGH,
}

CATEGORY_MAP = {
    "HARM_CATEGORY_HATE_SPEECH": HarmCategory.HARM_CATEGORY_HATE_SPEECH,
    "HARM_CATEGORY_HARASSMENT": HarmCategory.HARM_CATEGORY_HARASSMENT,
    "HARM_CATEGORY_SEXUALLY_EXPLICIT": HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT,
    "HARM_CATEGORY_DANGEROUS_CONTENT": HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
}

def get_safety_settings():
    """Build SafetySetting objects from the SAFETY_SETTINGS env var."""
    raw = json.loads(os.environ.get("SAFETY_SETTINGS", "{}"))
    return [
        SafetySetting(category=CATEGORY_MAP[cat], threshold=THRESHOLD_MAP[thresh])
        for cat, thresh in raw.items()
        if cat in CATEGORY_MAP and thresh in THRESHOLD_MAP
    ]

# In your handler:
model = GenerativeModel(os.environ.get("MODEL_ID"))
response = model.generate_content(
    prompt,
    safety_settings=get_safety_settings(),
    generation_config={"max_output_tokens": max_tokens},
)
Changing safety thresholds is now a .tfvars update. No code changes. 🎯
🛡️ Layer 2: Model Armor with Terraform
Model Armor is GCP's dedicated AI security service. It uses google_model_armor_template resources to define screening policies:
# model-armor/main.tf
resource "google_project_service" "model_armor" {
  project            = var.project_id
  service            = "modelarmor.googleapis.com"
  disable_on_destroy = false
}

resource "google_model_armor_template" "ai_safety" {
  location    = var.region
  template_id = "${var.environment}-ai-safety"

  filter_config {
    # RAI content safety filters
    rai_settings {
      rai_filters {
        filter_type      = "HATE_SPEECH"
        confidence_level = var.rai_confidence_levels["hate_speech"]
      }
      rai_filters {
        filter_type      = "HARASSMENT"
        confidence_level = var.rai_confidence_levels["harassment"]
      }
      rai_filters {
        filter_type      = "SEXUALLY_EXPLICIT"
        confidence_level = var.rai_confidence_levels["sexually_explicit"]
      }
      rai_filters {
        filter_type      = "DANGEROUS"
        confidence_level = var.rai_confidence_levels["dangerous"]
      }
    }

    # Prompt injection and jailbreak detection
    pi_and_jailbreak_filter_settings {
      filter_enforcement = "ENABLED"
      confidence_level   = var.pi_jailbreak_confidence
    }

    # Sensitive data protection (PII/DLP)
    sdp_settings {
      basic_config {
        filter_enforcement = var.sdp_enforcement
      }
    }

    # Malicious URL detection
    malicious_uri_filter_settings {
      filter_enforcement = var.malicious_uri_enforcement
    }
  }

  depends_on = [google_project_service.model_armor]
}
Variables for environment-specific tuning:
# model-armor/variables.tf
variable "rai_confidence_levels" {
  type        = map(string)
  description = "Confidence level per RAI category: LOW_AND_ABOVE, MEDIUM_AND_ABOVE, HIGH"

  default = {
    hate_speech       = "MEDIUM_AND_ABOVE"
    harassment        = "MEDIUM_AND_ABOVE"
    sexually_explicit = "LOW_AND_ABOVE"
    dangerous         = "MEDIUM_AND_ABOVE"
  }
}

variable "pi_jailbreak_confidence" {
  type        = string
  description = "Prompt injection detection confidence: LOW_AND_ABOVE, MEDIUM_AND_ABOVE, HIGH"
  default     = "LOW_AND_ABOVE"
}

variable "sdp_enforcement" {
  type        = string
  description = "Sensitive Data Protection: ENABLED or DISABLED"
  default     = "ENABLED"
}

variable "malicious_uri_enforcement" {
  type        = string
  description = "Malicious URL detection: ENABLED or DISABLED"
  default     = "ENABLED"
}
Environment configs:
# environments/dev.tfvars
rai_confidence_levels = {
  hate_speech       = "HIGH" # Lenient for testing
  harassment        = "HIGH"
  sexually_explicit = "MEDIUM_AND_ABOVE"
  dangerous         = "HIGH"
}
pi_jailbreak_confidence   = "MEDIUM_AND_ABOVE"
sdp_enforcement           = "DISABLED" # No PII scanning in dev
malicious_uri_enforcement = "DISABLED"

# environments/prod.tfvars
rai_confidence_levels = {
  hate_speech       = "LOW_AND_ABOVE" # Strict
  harassment        = "LOW_AND_ABOVE"
  sexually_explicit = "LOW_AND_ABOVE"
  dangerous         = "LOW_AND_ABOVE"
}
pi_jailbreak_confidence   = "LOW_AND_ABOVE"
sdp_enforcement           = "ENABLED"
malicious_uri_enforcement = "ENABLED"
🔌 Integrating Model Armor with Your Endpoint
Model Armor screens content via API before or after model invocation. Add it to your Cloud Function:
from google.cloud import modelarmor_v1

armor_client = modelarmor_v1.ModelArmorClient()
TEMPLATE = f"projects/{PROJECT}/locations/{REGION}/templates/{TEMPLATE_ID}"

def screen_prompt(text):
    """Screen user input before sending to Gemini."""
    request = modelarmor_v1.SanitizeUserPromptRequest(
        name=TEMPLATE,
        user_prompt_data=modelarmor_v1.DataItem(text=text),
    )
    result = armor_client.sanitize_user_prompt(request=request)
    if result.sanitization_result.filter_match_state == \
            modelarmor_v1.FilterMatchState.MATCH_FOUND:
        return False, "Your message was flagged for safety."
    return True, None

def screen_response(text):
    """Screen model output before returning to user."""
    request = modelarmor_v1.SanitizeModelResponseRequest(
        name=TEMPLATE,
        model_response_data=modelarmor_v1.DataItem(text=text),
    )
    result = armor_client.sanitize_model_response(request=request)
    if result.sanitization_result.filter_match_state == \
            modelarmor_v1.FilterMatchState.MATCH_FOUND:
        return False, "Response filtered for safety."
    return True, None
This gives you the same input/output screening pattern as AWS Bedrock Guardrails, but via an explicit API call rather than a parameter on model invocation.
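Wired together, the full request path is: screen the input, call the model, screen the output. A sketch of that control flow with the screening and model calls injected as plain callables, so it can be read (and tested) without GCP credentials - guarded_generate and the stub lambdas are illustrative names, not part of any SDK:

```python
def guarded_generate(prompt, screen_prompt, call_model, screen_response):
    """Input screening -> model call -> output screening.

    screen_prompt/screen_response follow the (ok, message) convention
    used above; call_model takes a prompt string and returns text.
    """
    ok, msg = screen_prompt(prompt)
    if not ok:
        return {"blocked": True, "stage": "input", "message": msg}

    text = call_model(prompt)

    ok, msg = screen_response(text)
    if not ok:
        return {"blocked": True, "stage": "output", "message": msg}

    return {"blocked": False, "text": text}

# Stubbed example: block prompts mentioning an SSN.
result = guarded_generate(
    "My SSN is 123-45-6789",
    screen_prompt=lambda t: (False, "flagged") if "SSN" in t else (True, None),
    call_model=lambda t: "echo: " + t,
    screen_response=lambda t: (True, None),
)
# result["blocked"] is True, result["stage"] is "input"
```

Blocking on input also saves the cost of a model call you'd have to discard anyway.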
🏢 Floor Settings: Organization-Level Enforcement
For enterprises, Model Armor floor settings enforce minimum safety standards across all templates in a project or organization:
resource "google_model_armor_floorsetting" "org_baseline" {
  parent   = "projects/${var.project_id}"
  location = "global"

  filter_config {
    rai_settings {
      rai_filters {
        filter_type      = "DANGEROUS"
        confidence_level = "MEDIUM_AND_ABOVE"
      }
    }
    pi_and_jailbreak_filter_settings {
      filter_enforcement = "ENABLED"
      confidence_level   = "HIGH"
    }
    sdp_settings {
      basic_config {
        filter_enforcement = "ENABLED"
      }
    }
  }

  enable_floor_setting_enforcement = true
}
This means even if a team creates a permissive template, the floor setting enforces minimum protections. Templates can be stricter than the floor, never weaker.
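The "stricter of the two wins" rule amounts to taking the stricter confidence level of template and floor, where LOW_AND_ABOVE blocks the most (low, medium, and high confidence matches) and HIGH the least. A pure-Python illustration of that merge semantics - my own model of the documented behavior, not an API call:

```python
# Ordered from most to least strict: LOW_AND_ABOVE blocks low-, medium-,
# and high-confidence matches; HIGH blocks only high-confidence matches.
STRICTNESS = ["LOW_AND_ABOVE", "MEDIUM_AND_ABOVE", "HIGH"]

def effective_level(template_level: str, floor_level: str) -> str:
    """A template may be stricter than the floor, never weaker."""
    return min(template_level, floor_level, key=STRICTNESS.index)

# A permissive template is pulled up to the floor...
assert effective_level("HIGH", "MEDIUM_AND_ABOVE") == "MEDIUM_AND_ABOVE"
# ...but a stricter template keeps its own setting.
assert effective_level("LOW_AND_ABOVE", "MEDIUM_AND_ABOVE") == "LOW_AND_ABOVE"
```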
📊 AWS vs GCP Safety: Side-by-Side
| Capability | AWS Bedrock Guardrails | GCP Model Armor + Safety Settings |
|---|---|---|
| Content filtering | Single aws_bedrock_guardrail resource | Gemini SafetySettings (code) + Model Armor RAI (Terraform) |
| PII/DLP | Built-in to guardrail (BLOCK/ANONYMIZE) | Model Armor + Sensitive Data Protection |
| Prompt injection | Content filter category | Dedicated Model Armor filter |
| Denied topics | Built-in topic policy | System instructions (no Terraform resource) |
| Word filters | Built-in word policy | System instructions or custom logic |
| Malicious URLs | Not available | Model Armor filter |
| Org enforcement | Organization-level guardrails | Floor settings (project/org/folder) |
| Model scope | Bedrock models only | Any model, any cloud |
Key difference: Model Armor is model-agnostic by design. You can use it to screen prompts headed for Gemini, self-hosted Llama, or even OpenAI. That makes it a strong choice for multi-model architectures. 🚀
🎯 What You Just Built
┌──────────────────────────────────┐
│            User Input            │
└────────────────┬─────────────────┘
                 │
                 ▼
┌──────────────────────────────────┐
│  Model Armor (Input Screening)   │
│  ✓ RAI content safety            │
│  ✓ Prompt injection detection    │
│  ✓ PII/DLP scanning              │
│  ✓ Malicious URL detection       │
└────────────────┬─────────────────┘
                 │ Passed?
                 ▼
┌──────────────────────────────────┐
│  Gemini (with Safety Settings)   │
│  ✓ Hate speech filter            │
│  ✓ Harassment filter             │
│  ✓ Sexually explicit filter      │
│  ✓ Dangerous content filter      │
└────────────────┬─────────────────┘
                 │
                 ▼
┌──────────────────────────────────┐
│  Model Armor (Output Screening)  │
│  ✓ PII/DLP in response           │
│  ✓ RAI content check             │
└────────────────┬─────────────────┘
                 │
                 ▼
┌──────────────────────────────────┐
│          User Response           │
└──────────────────────────────────┘
Two safety layers, both Terraform-managed, environment-specific via .tfvars. 🚀
✏️ What's Next
This is Post 2 of the AI Infra on GCP with Terraform series.
- Post 1: Deploy Vertex AI: First Gemini Endpoint
- Post 2: Vertex AI Safety (you are here) 🛡️
- Post 3: Vertex AI Audit Logging with Cloud Logging
- Post 4: RAG with Vertex AI Search + Datastore
Two safety layers, both as code. When your security team asks how AI content is filtered, point them to a Git repo - not a console screenshot. 🚀
Found this helpful? Follow for the full AI Infra on GCP with Terraform series! 💬