Vertex AI RAG Engine handles chunking, embedding, and retrieval with a managed vector database. Terraform provisions the infrastructure - GCS bucket, service account, engine config - while the Python SDK manages corpus and file operations.
You have a Vertex AI endpoint answering general questions. Ask it about your company's internal docs and it hallucinates with confidence. RAG fixes this by grounding model responses in your actual data.
Vertex AI RAG Engine is GCP's fully managed RAG service. You create a corpus (the index), import files from Cloud Storage or Google Drive, and the engine handles chunking, embedding with a model like text-embedding-005, and vector storage in a managed Spanner-based database. At query time, it retrieves relevant chunks and feeds them as context to Gemini.
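Stripped to its essentials, retrieve-then-generate looks like this. A toy sketch with a word-overlap "retriever" standing in for real embeddings; every name here is illustrative, nothing below is the Vertex SDK:

```python
def retrieve(question: str, chunks: list[str], top_k: int = 1) -> list[str]:
    """Toy retriever: rank chunks by word overlap with the question."""
    q_words = set(question.lower().split())
    ranked = sorted(chunks, key=lambda c: -len(q_words & set(c.lower().split())))
    return ranked[:top_k]

def build_prompt(question: str, contexts: list[str]) -> str:
    """Ground the model by prepending retrieved chunks as context."""
    joined = "\n".join(f"- {c}" for c in contexts)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"

docs = [
    "Refunds are issued within 14 days of purchase.",
    "Our office is closed on public holidays.",
]
question = "within how many days are refunds issued"
prompt = build_prompt(question, retrieve(question, docs))
print(prompt)
```

RAG Engine does exactly this, except retrieval runs over embedding vectors in a managed store and the prompt assembly happens server-side.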
🏗️ Architecture Overview
```
┌──────────────┐      ┌──────────────┐      ┌───────────────────┐
│  GCS Bucket  │ ───> │  RAG Engine  │ ───> │   RagManagedDb    │
│ (Documents)  │      │   (Corpus)   │      │ (Managed Spanner  │
│              │      │              │      │   Vector Store)   │
└──────────────┘      └──────┬───────┘      └───────────────────┘
                             │
                  ┌──────────┴──────────┐
                  │  retrieveContexts   │
                  │  + generateContent  │
                  └─────────────────────┘
```
Data flow: Documents in GCS are imported into a corpus. RAG Engine chunks them, generates embeddings via the configured model, and stores vectors in RagManagedDb. Queries use retrieveContexts for retrieval or generateContent with a RAG tool for end-to-end generation.
📦 Step 1: GCS Bucket for Documents
This bucket holds your source documents - PDFs, plain text, HTML, and Markdown:
```hcl
# rag/gcs.tf
resource "google_storage_bucket" "rag_documents" {
  name     = "${var.environment}-${var.corpus_name}-rag-docs-${var.project_id}"
  location = var.region
  project  = var.project_id

  force_destroy               = var.environment != "prod"
  uniform_bucket_level_access = true

  versioning {
    enabled = true
  }

  labels = {
    environment = var.environment
    purpose     = "rag-document-source"
  }
}
```
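Because GCS bucket names are globally unique and capped at 63 characters, the interpolated name is worth sanity-checking before `terraform apply`. A small sketch mirroring the interpolation above (the helper is mine, not part of any SDK):

```python
def rag_bucket_name(environment: str, corpus_name: str, project_id: str) -> str:
    """Mirror the name interpolation in rag/gcs.tf."""
    name = f"{environment}-{corpus_name}-rag-docs-{project_id}"
    if len(name) > 63:
        raise ValueError(f"GCS bucket names are capped at 63 characters: {name!r}")
    return name

print(rag_bucket_name("dev", "company-docs", "your-project"))
# dev-company-docs-rag-docs-your-project
```

A long project ID plus a long corpus name can push the interpolation past the limit, and Terraform will only surface that at apply time.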
🔐 Step 2: Service Account and IAM
RAG Engine uses a Google-managed service agent (service-PROJECT_NUMBER@gcp-sa-vertex-rag.iam.gserviceaccount.com), but your application code needs a service account with the right permissions to create corpora and import files:
```hcl
# rag/iam.tf
data "google_project" "current" {
  project_id = var.project_id
}

resource "google_service_account" "rag_app" {
  account_id   = "${var.environment}-rag-app"
  display_name = "RAG Application Service Account"
  project      = var.project_id
}

# Permission to use Vertex AI RAG Engine APIs
resource "google_project_iam_member" "rag_user" {
  project = var.project_id
  role    = "roles/aiplatform.user"
  member  = "serviceAccount:${google_service_account.rag_app.email}"
}

# Permission to read documents from GCS
resource "google_storage_bucket_iam_member" "rag_reader" {
  bucket = google_storage_bucket.rag_documents.name
  role   = "roles/storage.objectViewer"
  member = "serviceAccount:${google_service_account.rag_app.email}"
}

# Grant RAG service agent access to the GCS bucket
resource "google_storage_bucket_iam_member" "rag_agent_reader" {
  bucket = google_storage_bucket.rag_documents.name
  role   = "roles/storage.objectViewer"
  member = "serviceAccount:service-${data.google_project.current.number}@gcp-sa-vertex-rag.iam.gserviceaccount.com"
}
```
Critical: The RAG service agent (gcp-sa-vertex-rag) needs storage.objectViewer on your bucket. Without this, file imports silently fail.
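A related gotcha: the service agent email is keyed by the project *number*, not the project ID, which is why the Terraform above pulls `data.google_project.current.number`. The format, sketched as a one-liner (helper name is illustrative):

```python
def rag_service_agent(project_number: int) -> str:
    """Vertex AI RAG Engine service agent, derived from the project *number*."""
    return f"service-{project_number}@gcp-sa-vertex-rag.iam.gserviceaccount.com"

print(rag_service_agent(123456789012))
# service-123456789012@gcp-sa-vertex-rag.iam.gserviceaccount.com
```

If you hand-write this member string with the project ID instead of the number, the IAM binding applies cleanly but grants nothing to the real agent.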
⚙️ Step 3: RAG Engine Configuration
The google_vertex_ai_rag_engine_config resource controls the managed database tier. This is a project-level setting that affects all corpora using RagManagedDb:
```hcl
# rag/engine_config.tf
resource "google_vertex_ai_rag_engine_config" "this" {
  region  = var.region
  project = var.project_id

  rag_managed_db_config {
    dynamic "basic" {
      for_each = var.environment == "dev" ? [1] : []
      content {}
    }

    dynamic "scaled" {
      for_each = var.environment == "prod" ? [1] : []
      content {}
    }
  }
}
```
Tier options:
| Tier | Use Case | Billing |
|---|---|---|
| Basic | Development, small corpora | Lower cost, sufficient for testing |
| Scaled | Production workloads | Higher throughput, better latency |
| Unprovisioned | Cleanup only | Deletes RAG Engine data |
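The two dynamic blocks in the config above are mutually exclusive: exactly one tier block is emitted per environment. The same selection logic, mirrored in plain Python (illustrative, not an SDK call):

```python
def selected_tier(environment: str) -> str:
    """Mirror the for_each conditions in rag/engine_config.tf."""
    if environment == "dev":
        return "basic"
    if environment == "prod":
        return "scaled"
    raise ValueError(f"No tier mapped for environment {environment!r}")

print(selected_tier("dev"))   # basic
print(selected_tier("prod"))  # scaled
```

Note the gap this exposes: an environment that is neither `dev` nor `prod` (say, `staging`) would emit neither block in the Terraform above, so extend the conditions if you add more environments.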
🔧 Step 4: Variables
```hcl
# rag/variables.tf
variable "project_id" { type = string }
variable "environment" { type = string }
variable "region" { type = string }
variable "corpus_name" { type = string }

variable "embedding_model" {
  type    = string
  default = "text-embedding-005"
}

variable "chunk_size" {
  type    = number
  default = 512
}

variable "chunk_overlap" {
  type    = number
  default = 100
}
```
Per-environment configs:
```hcl
# environments/dev.tfvars
corpus_name     = "company-docs"
embedding_model = "text-embedding-005"
chunk_size      = 300
chunk_overlap   = 50
```

```hcl
# environments/prod.tfvars
corpus_name     = "company-docs"
embedding_model = "text-embedding-005"
chunk_size      = 512
chunk_overlap   = 100
```
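To build intuition for these two knobs: with a sliding window, each chunk after the first adds roughly `chunk_size - chunk_overlap` new tokens. A back-of-envelope estimate of chunk count per document (this assumes a simple token-based window and approximates, but won't exactly match, RAG Engine's chunker):

```python
import math

def estimated_chunks(doc_tokens: int, chunk_size: int, chunk_overlap: int) -> int:
    """Rough chunk count for a sliding window with overlap."""
    if doc_tokens <= chunk_size:
        return 1
    stride = chunk_size - chunk_overlap
    return 1 + math.ceil((doc_tokens - chunk_size) / stride)

# A ~10,000-token document under the dev vs prod settings above:
print(estimated_chunks(10_000, 300, 50))   # dev:  40 chunks
print(estimated_chunks(10_000, 512, 100))  # prod: 25 chunks
```

Smaller chunks with less overlap give more precise retrieval targets but produce more vectors to embed, store, and search; the dev settings trade some context per chunk for cheaper experimentation.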
🔧 Step 5: Create Corpus and Import Files (Python SDK)
Unlike AWS Bedrock, where the knowledge base and data source are first-class Terraform resources, RAG Engine corpus management is done through the Python SDK or REST API. This is the operational layer that sits on top of the Terraform infrastructure:
```python
from vertexai import rag
from vertexai.generative_models import GenerativeModel, Tool
import vertexai

vertexai.init(project="YOUR_PROJECT_ID", location="us-east4")

# Configure the embedding model
embedding_config = rag.RagEmbeddingModelConfig(
    vertex_prediction_endpoint=rag.VertexPredictionEndpoint(
        publisher_model="publishers/google/models/text-embedding-005"
    )
)

# Create the corpus
rag_corpus = rag.create_corpus(
    display_name="company-docs",
    description="Internal company documentation",
    backend_config=rag.RagVectorDbConfig(
        rag_embedding_model_config=embedding_config
    ),
)
print(f"Corpus created: {rag_corpus.name}")

# Import files from GCS
rag.import_files(
    corpus_name=rag_corpus.name,
    paths=["gs://dev-company-docs-rag-docs-your-project/"],
    transformation_config=rag.TransformationConfig(
        chunking_config=rag.ChunkingConfig(chunk_size=512, chunk_overlap=100)
    ),
)
```
Why not Terraform? Corpus and file operations are mutable, long-running, and frequently updated (new documents added regularly). They fit the operational workflow better than the infrastructure layer. Terraform manages the things that change rarely (buckets, IAM, engine config). The SDK manages the things that change often (corpus content).
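One consequence of this imperative layer: re-running a setup script will happily create a second corpus with the same display name. A small idempotency guard helps; here is a sketch with the SDK calls injected as callables so the logic is testable (the helper and `Corpus` stand-in are mine, not the SDK's; in real code you would pass `rag.list_corpora` and a wrapper around `rag.create_corpus`):

```python
from typing import Callable, Iterable

class Corpus:
    """Minimal stand-in for the SDK's corpus object."""
    def __init__(self, display_name: str):
        self.display_name = display_name

def get_or_create_corpus(
    display_name: str,
    list_corpora: Callable[[], Iterable[Corpus]],
    create_corpus: Callable[[str], Corpus],
) -> Corpus:
    """Reuse an existing corpus with this display name, else create one."""
    for corpus in list_corpora():
        if corpus.display_name == display_name:
            return corpus
    return create_corpus(display_name)

existing = Corpus("company-docs")
reused = get_or_create_corpus("company-docs", lambda: [existing], Corpus)
print(reused is existing)  # True: no duplicate created
```

Injecting the calls keeps the guard unit-testable without credentials, which suits this operational layer: it runs from CI or a notebook, not from `terraform apply`.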
🔍 Step 6: Query Your Knowledge Base
Two approaches for retrieval:
```python
# Approach 1: Retrieve contexts only
response = rag.retrieval_query(
    rag_resources=[rag.RagResource(rag_corpus=rag_corpus.name)],
    text="What is our refund policy?",
    rag_retrieval_config=rag.RagRetrievalConfig(
        top_k=5,
        filter=rag.Filter(vector_distance_threshold=0.5),
    ),
)

for context in response.contexts.contexts:
    print(f"Source: {context.source_uri}")
    print(f"Text: {context.text[:200]}")
```

```python
# Approach 2: End-to-end RAG with Gemini
rag_tool = Tool.from_retrieval(
    retrieval=rag.Retrieval(
        source=rag.VertexRagStore(
            rag_resources=[rag.RagResource(rag_corpus=rag_corpus.name)],
            rag_retrieval_config=rag.RagRetrievalConfig(top_k=3),
        )
    )
)

model = GenerativeModel(
    model_name="gemini-2.0-flash-001",
    tools=[rag_tool],
)
response = model.generate_content("What is our refund policy?")
print(response.text)
```
Approach 2 is the most common pattern - Gemini automatically retrieves relevant chunks and generates a grounded response in a single call.
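The retrieval knobs in both approaches interact: `vector_distance_threshold` drops chunks too far from the query, then `top_k` caps what remains. The semantics (lower distance means more similar) can be illustrated locally with pre-scored chunks (a toy sketch, not an SDK call):

```python
def select_contexts(scored_chunks, top_k, vector_distance_threshold):
    """Keep chunks within the distance threshold, closest first, capped at top_k."""
    kept = [c for c in scored_chunks if c[1] <= vector_distance_threshold]
    kept.sort(key=lambda c: c[1])
    return kept[:top_k]

scored = [
    ("refund policy ...", 0.21),
    ("holiday hours ...", 0.62),
    ("returns form ...", 0.35),
]
print(select_contexts(scored, top_k=5, vector_distance_threshold=0.5))
# [('refund policy ...', 0.21), ('returns form ...', 0.35)]
```

A threshold that is too tight can return fewer than `top_k` chunks, or none at all, so if answers suddenly become vague, check the threshold before blaming the model.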
🔌 Step 7: Enable APIs
Don't forget the API enablement in Terraform:
```hcl
# rag/apis.tf
resource "google_project_service" "vertex_ai" {
  project = var.project_id
  service = "aiplatform.googleapis.com"
}

resource "google_project_service" "storage" {
  project = var.project_id
  service = "storage.googleapis.com"
}
```
Your first GCP RAG pipeline is deployed. Documents in GCS, vectors in RagManagedDb, retrieval via Gemini - infrastructure in Terraform, operations in Python, all repeatable across environments. 🎉
Found this helpful? Follow for the full RAG Pipeline with Terraform series! 💬
Top comments (2)
Thanks for making this approachable with Terraform! One thing I'd add for production use: the infrastructure is only half the story. The other half is the quality of your document ingestion pipeline. We spent a lot of time getting the GCP infrastructure right, then discovered that our retrieval quality was mediocre because we hadn't thought carefully about preprocessing - removing boilerplate, normalizing formatting, handling code blocks differently from prose. I'd recommend setting up a small evaluation harness early - a handful of question-answer pairs from your actual documents - so you can measure whether infrastructure changes and ingestion tweaks are actually improving things.
Thanks for going through the post. You're right that this only covers the infrastructure part. I'll consider your inputs for future posts on measuring and improving the quality of the RAG pipeline we've built.