Suhas Mallesh
Vertex AI Audit Logging with Terraform: Track Every AI Call from Prompt to Response πŸ“‹

GCP doesn't log Vertex AI data access by default. Two Terraform resources change that - Cloud Audit Logs for metadata, log sinks for long-term retention. Here's the full setup.

You've deployed your Vertex AI endpoint (Post 1) and added safety filters (Post 2). Your app is generating responses in production. Then your compliance team asks:

"Can you prove who called which model, when, and what was sent?"

GCP gives you two logging layers for this. Cloud Audit Logs capture metadata about every Vertex AI API call - who, when, which model, whether it succeeded. Request-response logging captures the actual prompt and response bodies into BigQuery. Both are disabled by default. Terraform makes sure they're enabled before your first production call. 🎯

🧱 Two Logging Layers, Two Problems They Solve

| Layer | What It Captures | Where It Goes | Terraform Resource |
| --- | --- | --- | --- |
| Cloud Audit Logs | Caller identity, model ID, method, timestamp, authorization | Cloud Logging | `google_project_iam_audit_config` |
| Request-response logging | Full prompt body, full response body, token counts | BigQuery | Endpoint config (API/SDK) |

Cloud Audit Logs answer "who called what model and when." Request-response logging answers "what did they send and what came back." Most compliance scenarios need both.

This article focuses on Cloud Audit Logs since they're fully Terraform-manageable and cover the audit trail requirements. Request-response logging is configured per-endpoint via the API and is covered at the end.

πŸ—οΈ Step 1: Enable Data Access Audit Logs

GCP Cloud Audit Logs come in several types; three matter here. Admin Activity logs are always on and free - they capture resource creation and deletion. Data Access logs capture read operations like model predictions - these are off by default and are the ones you need for AI compliance. System Event logs capture GCP-initiated actions.

For Vertex AI, every generateContent, predict, and streamGenerateContent call is a Data Access event. Without enabling these, you have no record of inference calls:

# logging/audit_config.tf

resource "google_project_iam_audit_config" "vertex_ai" {
  project = var.project_id
  service = "aiplatform.googleapis.com"

  audit_log_config {
    log_type = "ADMIN_READ"
  }

  audit_log_config {
    log_type = "DATA_READ"
  }

  audit_log_config {
    log_type = "DATA_WRITE"
  }
}

What this enables: Every Vertex AI API call now generates an audit log entry in Cloud Logging with the caller's identity, the model resource path, the method name, and the timestamp.
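To make those four fields concrete, here's a minimal sketch of pulling them out of a log entry's `protoPayload`. The sample entry below is illustrative - the field names follow the Cloud Logging `AuditLog` shape, but the project, model, and service-account values are made up:

```python
# Sketch: extract the audit-relevant fields from a Cloud Logging entry.
# The sample entry is illustrative, not real log output.
sample_entry = {
    "protoPayload": {
        "serviceName": "aiplatform.googleapis.com",
        "methodName": "google.cloud.aiplatform.v1.PredictionService.GenerateContent",
        "resourceName": "projects/my-proj/locations/us-central1/publishers/google/models/gemini-pro",
        "authenticationInfo": {"principalEmail": "app-sa@my-proj.iam.gserviceaccount.com"},
    },
    "timestamp": "2024-01-01T00:00:00Z",
}

def summarize(entry: dict) -> dict:
    """Answer the compliance question: who called what model, how, and when."""
    p = entry["protoPayload"]
    return {
        "who": p["authenticationInfo"]["principalEmail"],
        "what": p["methodName"],
        "which": p["resourceName"],
        "when": entry["timestamp"],
    }

print(summarize(sample_entry)["who"])  # app-sa@my-proj.iam.gserviceaccount.com
```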

Cost note: Data Access logs can generate significant volume. In production with high call rates, use exempted members or log sinks with filters to control costs.
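One lever for that is `exempted_members` on an `audit_log_config` block. Here's a sketch of the Step 1 resource with a hypothetical high-volume batch service account excluded from DATA_READ logging (the `batch-runner` name is an invented example):

```hcl
# Variant of the Step 1 resource. "batch-runner" is a hypothetical
# high-volume caller you already monitor elsewhere; its reads are
# exempted so they don't generate Data Access log volume.
resource "google_project_iam_audit_config" "vertex_ai" {
  project = var.project_id
  service = "aiplatform.googleapis.com"

  audit_log_config {
    log_type = "DATA_READ"
    exempted_members = [
      "serviceAccount:batch-runner@${var.project_id}.iam.gserviceaccount.com",
    ]
  }
}
```

Exemptions trade auditability for cost, so in regulated environments confirm with compliance before exempting anyone.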

πŸ“Š Step 2: Log Sink to Cloud Storage

Audit logs in Cloud Logging have a default 30-day retention in the _Default bucket. For compliance, you need longer retention. A log sink exports Vertex AI audit logs to Cloud Storage:

# logging/sink_gcs.tf

resource "google_storage_bucket" "vertex_ai_logs" {
  name          = "${var.environment}-vertex-ai-audit-logs-${var.project_id}"
  location      = var.region
  project       = var.project_id
  force_destroy = var.environment != "prod"

  uniform_bucket_level_access = true

  versioning {
    enabled = true
  }

  lifecycle_rule {
    condition {
      age = var.nearline_transition_days
    }
    action {
      type          = "SetStorageClass"
      storage_class = "NEARLINE"
    }
  }

  lifecycle_rule {
    condition {
      age = var.coldline_transition_days
    }
    action {
      type          = "SetStorageClass"
      storage_class = "COLDLINE"
    }
  }

  lifecycle_rule {
    condition {
      age = var.log_retention_days
    }
    action {
      type = "Delete"
    }
  }
}

resource "google_logging_project_sink" "vertex_ai_gcs" {
  name        = "${var.environment}-vertex-ai-audit-to-gcs"
  project     = var.project_id
  destination = "storage.googleapis.com/${google_storage_bucket.vertex_ai_logs.name}"

  filter = <<-EOT
    protoPayload.serviceName="aiplatform.googleapis.com"
    AND logName:"cloudaudit.googleapis.com"
  EOT

  unique_writer_identity = true
}

resource "google_storage_bucket_iam_member" "sink_writer" {
  bucket = google_storage_bucket.vertex_ai_logs.name
  role   = "roles/storage.objectCreator"
  member = google_logging_project_sink.vertex_ai_gcs.writer_identity
}

πŸ” Step 3: Log Sink to BigQuery

For queryable analytics - cost tracking per model, usage patterns, anomaly detection - send the same logs to BigQuery:

# logging/sink_bigquery.tf

resource "google_bigquery_dataset" "vertex_ai_logs" {
  dataset_id  = "${var.environment}_vertex_ai_audit_logs"
  project     = var.project_id
  location    = var.region
  description = "Vertex AI audit logs for analysis"

  default_table_expiration_ms = var.bq_table_expiration_days * 86400000

  labels = {
    environment = var.environment
    purpose     = "ai-audit-logging"
  }
}

resource "google_logging_project_sink" "vertex_ai_bigquery" {
  name        = "${var.environment}-vertex-ai-audit-to-bq"
  project     = var.project_id
  destination = "bigquery.googleapis.com/projects/${var.project_id}/datasets/${google_bigquery_dataset.vertex_ai_logs.dataset_id}"

  filter = <<-EOT
    protoPayload.serviceName="aiplatform.googleapis.com"
    AND logName:"cloudaudit.googleapis.com"
  EOT

  unique_writer_identity = true

  bigquery_options {
    use_partitioned_tables = true
  }
}

resource "google_bigquery_dataset_iam_member" "sink_writer" {
  project    = var.project_id
  dataset_id = google_bigquery_dataset.vertex_ai_logs.dataset_id
  role       = "roles/bigquery.dataEditor"
  member     = google_logging_project_sink.vertex_ai_bigquery.writer_identity
}

Partitioned tables are critical here - they partition by ingestion time so queries on recent data scan less and cost less.
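As an illustration, restricting on the ingestion-time pseudo-column keeps a scan to recent partitions (`PROJECT` and `DATASET` are placeholders):

```sql
-- Scans only the last day's partitions instead of the full table
SELECT COUNT(*) AS calls
FROM `PROJECT.DATASET.cloudaudit_googleapis_com_data_access`
WHERE _PARTITIONTIME >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY);
```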

βš™οΈ Step 4: Variables and Environment Configs

# logging/variables.tf

variable "project_id" { type = string }
variable "environment" { type = string }
variable "region" { type = string }

variable "nearline_transition_days" {
  type    = number
  default = 30
}

variable "coldline_transition_days" {
  type    = number
  default = 90
}

variable "log_retention_days" {
  type    = number
  default = 365
}

variable "bq_table_expiration_days" {
  type    = number
  default = 365
}

variable "notification_channels" {
  description = "Monitoring notification channel IDs for alert policies"
  type        = list(string)
  default     = []
}

Per-environment configs:

# environments/dev.tfvars
nearline_transition_days = 15
coldline_transition_days = 30
log_retention_days       = 90
bq_table_expiration_days = 90

# environments/prod.tfvars
nearline_transition_days = 90
coldline_transition_days = 365
log_retention_days       = 2555  # 7 years for regulated industries
bq_table_expiration_days = 2555
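The three lifecycle ages only make sense in strictly increasing order (Nearline before Coldline before delete). A tiny pre-apply sanity check - purely illustrative, not part of any tooling:

```python
def validate_lifecycle(nearline_days: int, coldline_days: int, delete_days: int) -> None:
    """Fail fast if storage-class transitions and deletion are out of order."""
    if not 0 < nearline_days < coldline_days < delete_days:
        raise ValueError("expected nearline < coldline < delete ages")

validate_lifecycle(90, 365, 2555)  # prod values above: passes quietly
```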

πŸ” Step 5: Query Your Audit Logs

Once the BigQuery sink is active, you can run SQL against your audit data:

-- Top models by invocation count (last 7 days)
SELECT
  protopayload_auditlog.resourceName AS model,
  COUNT(*) AS call_count
FROM `PROJECT.DATASET.cloudaudit_googleapis_com_data_access`
WHERE timestamp > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
GROUP BY model
ORDER BY call_count DESC;

-- All calls by a specific service account
SELECT
  timestamp,
  protopayload_auditlog.authenticationInfo.principalEmail,
  protopayload_auditlog.methodName,
  protopayload_auditlog.resourceName
FROM `PROJECT.DATASET.cloudaudit_googleapis_com_data_access`
WHERE protopayload_auditlog.authenticationInfo.principalEmail
  LIKE '%my-cloud-function-sa%';

-- Daily token usage trend (from request-response logs)
SELECT
  DATE(logging_time) AS day,
  model,
  SUM(CAST(JSON_EXTRACT_SCALAR(response, '$.usageMetadata.totalTokenCount') AS INT64)) AS total_tokens
FROM `PROJECT.DATASET.request_response_logging`
GROUP BY day, model
ORDER BY day DESC;

🚨 Step 6: Alerting on Anomalies

Create log-based metrics and alerts for suspicious patterns:

# logging/alerts.tf

resource "google_logging_metric" "vertex_ai_errors" {
  name    = "${var.environment}-vertex-ai-error-rate"
  project = var.project_id
  filter  = <<-EOT
    protoPayload.serviceName="aiplatform.googleapis.com"
    AND severity>=ERROR
  EOT

  metric_descriptor {
    metric_kind = "DELTA"
    value_type  = "INT64"
  }
}

resource "google_monitoring_alert_policy" "vertex_ai_errors" {
  display_name = "Vertex AI High Error Rate"
  project      = var.project_id
  combiner     = "OR"

  conditions {
    display_name = "Error rate spike"
    condition_threshold {
      filter          = "metric.type=\"logging.googleapis.com/user/${google_logging_metric.vertex_ai_errors.name}\""
      comparison      = "COMPARISON_GT"
      threshold_value = 50
      duration        = "300s"

      aggregations {
        alignment_period   = "300s"
        per_series_aligner = "ALIGN_SUM"
      }
    }
  }

  notification_channels = var.notification_channels
}

πŸ“ Production Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Vertex AI API Call              β”‚
β”‚  (generateContent / predict)     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                β”‚
        Cloud Audit Logs
        (Cloud Logging)
                β”‚
    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
    β”‚           β”‚           β”‚
    β–Ό           β–Ό           β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ GCS    β”‚ β”‚ BQ     β”‚ β”‚ Alerting β”‚
β”‚ Bucket β”‚ β”‚ Datasetβ”‚ β”‚ Policies β”‚
β”‚        β”‚ β”‚        β”‚ β”‚          β”‚
β”‚ Long   β”‚ β”‚ SQL    β”‚ β”‚ Real-    β”‚
β”‚ term   β”‚ β”‚ query  β”‚ β”‚ time     β”‚
β”‚ archiveβ”‚ β”‚ & dash β”‚ β”‚ alerts   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Dual-sink pattern: GCS for long-term compliance retention (lifecycle to Coldline, years of data). BigQuery for queryable analytics (partitioned tables, SQL access, Looker dashboards).
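After `terraform apply`, both sinks can be spot-checked with the gcloud CLI. This requires an authenticated session; the sink name below assumes `environment = "prod"` with the naming from the Terraform above:

```shell
# List all sinks in the project, then inspect the GCS sink's filter
# and writer identity. Sink name assumes environment = "prod".
gcloud logging sinks list --project "$PROJECT_ID"
gcloud logging sinks describe prod-vertex-ai-audit-to-gcs --project "$PROJECT_ID"
```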

πŸ“ Request-Response Logging (Prompt/Response Bodies)

Cloud Audit Logs capture metadata but not the actual prompt and response content. For full prompt/response bodies, Vertex AI offers request-response logging to BigQuery. This is configured per-endpoint via the API, not through Terraform:

# Enable via Python SDK when creating the endpoint.
# Parameter names per the google-cloud-aiplatform SDK's Endpoint.create.
from google.cloud import aiplatform

endpoint = aiplatform.Endpoint.create(
    display_name="my-endpoint",
    enable_request_response_logging=True,
    request_response_logging_sampling_rate=1.0,  # log 100% in prod
    request_response_logging_bq_destination_table=(
        f"bq://{project_id}.{dataset_name}.request_response_logging"
    ),
)

This captures the full JSON body of every prompt and response. In production, keep the sampling rate at 1.0 for full compliance coverage; lower it in dev to reduce BigQuery costs.
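To reason about the cost side of that tradeoff, here's a back-of-envelope helper - purely illustrative, no SDK involved:

```python
def expected_logged_rows(daily_calls: int, sampling_rate: float) -> int:
    """Approximate BigQuery rows written per day at a given sampling rate."""
    if not 0.0 <= sampling_rate <= 1.0:
        raise ValueError("sampling_rate must be in [0, 1]")
    return round(daily_calls * sampling_rate)

print(expected_logged_rows(100_000, 1.0))  # 100000 - full compliance coverage
print(expected_logged_rows(100_000, 0.1))  # 10000 - a 10% dev-environment sample
```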

πŸ’‘ GCP vs AWS: Key Differences

| Aspect | GCP (Vertex AI) | AWS (Bedrock) |
| --- | --- | --- |
| Metadata logging | `google_project_iam_audit_config` | `aws_bedrock_model_invocation_logging_configuration` |
| Prompt/response bodies | Request-response logging to BigQuery | Inline in CloudWatch/S3 logs |
| Scope | Per-project, per-service | Per-region, per-account |
| Long-term storage | Log sinks to GCS/BigQuery | S3 with lifecycle policies |
| Query engine | BigQuery SQL (native) | Athena (requires Glue catalog) |
| Real-time alerts | Log-based metrics + Cloud Monitoring | CloudWatch metric filters + alarms |

The biggest difference: GCP separates metadata from content logging. AWS bundles everything into one logging configuration. GCP's approach gives you finer cost control since Data Access logs (metadata) are cheaper than storing full prompt/response bodies in BigQuery.

⏭️ What's Next

This is Post 3 of the GCP AI Infrastructure with Terraform series.


Every Vertex AI call now has a paper trail. Caller identity, model, timestamp in Cloud Audit Logs. Full prompts and responses in BigQuery. All managed by Terraform, all queryable with SQL. πŸ“‹

Found this helpful? Follow for the full GCP AI Infrastructure with Terraform series! πŸ’¬
