RAG Engine's managed Spanner DB is simple but opaque. AlloyDB AI gives you pgvector with Google's ScaNN index, built-in Vertex AI integration, and full SQL control over your vectors. Here's how to set it up with Terraform for RAG workloads.
In RAG Post 1, we deployed a Vertex AI RAG Engine corpus backed by RagManagedDb, a fully managed Spanner-based vector store. It's zero-config, but you can't see your vectors, can't run SQL against them, and can't combine vector search with your existing relational data.
AlloyDB AI is GCP's alternative: a fully managed PostgreSQL-compatible database with pgvector for vector operations, Google's ScaNN index for high-performance similarity search, and built-in Vertex AI integration that lets you generate embeddings directly in SQL. If you already have relational data and want to add vector search without a separate system, AlloyDB is the answer. This post covers the Terraform setup and RAG patterns.
## Why AlloyDB AI for RAG?
The RAG Engine managed DB is great for getting started, but production teams often need more:
| Capability | RAG Engine (Managed DB) | AlloyDB AI + pgvector |
|---|---|---|
| Vector search | Managed, opaque | Full SQL control |
| Combine with relational data | ❌ Separate system | ✅ Same database |
| Embedding generation | SDK/API only | SQL function (`embedding()`) |
| Index type | Managed | ScaNN, IVFFlat, HNSW |
| Hybrid search | Alpha-weighted | SQL `WHERE` + vector `ORDER BY` |
| Infrastructure | Zero config | Cluster + instance to manage |
| Cost (dev) | Basic tier ~$50/month | 1 vCPU ~$120/month |
| Cost (prod) | Scaled tier ~$200/month | 4+ vCPU ~$400/month |
The trade-off: AlloyDB costs more and requires more setup, but gives you full control, SQL-native operations, and the ability to combine vector search with your existing data in a single query.
## Terraform Setup
AlloyDB requires a VPC with private service access. Here's the full infrastructure:
### Network and Private Service Access
```hcl
# network/main.tf
resource "google_compute_network" "vpc" {
  name                    = "${var.environment}-${var.project}-vpc"
  auto_create_subnetworks = false
  project                 = var.project_id
}

resource "google_compute_global_address" "psa_range" {
  name          = "${var.environment}-psa-range"
  purpose       = "VPC_PEERING"
  address_type  = "INTERNAL"
  prefix_length = 20
  network       = google_compute_network.vpc.id
  project       = var.project_id
}

resource "google_service_networking_connection" "psa" {
  network                 = google_compute_network.vpc.id
  service                 = "servicenetworking.googleapis.com"
  reserved_peering_ranges = [google_compute_global_address.psa_range.name]
}
```
### AlloyDB Cluster and Instance
```hcl
# alloydb/main.tf
resource "google_alloydb_cluster" "this" {
  cluster_id = "${var.environment}-${var.project}-alloydb"
  location   = var.region
  project    = var.project_id

  network_config {
    network = google_compute_network.vpc.id
  }

  initial_user {
    user     = var.db_user
    password = var.db_password
  }

  depends_on = [google_service_networking_connection.psa]
}

resource "google_alloydb_instance" "primary" {
  cluster       = google_alloydb_cluster.this.name
  instance_id   = "${var.environment}-primary"
  instance_type = "PRIMARY"

  machine_config {
    cpu_count = var.cpu_count
  }
}
```
### IAM for Vertex AI Integration
AlloyDB needs permission to call Vertex AI embedding models. Grant the AlloyDB service agent the Vertex AI User role:
```hcl
# alloydb/iam.tf

# Needed to resolve the project number used in the service agent's email
data "google_project" "this" {
  project_id = var.project_id
}

resource "google_project_iam_member" "alloydb_vertex_ai" {
  project = var.project_id
  role    = "roles/aiplatform.user"
  member  = "serviceAccount:service-${data.google_project.this.number}@gcp-sa-alloydb.iam.gserviceaccount.com"
}
```
This is the critical step most people miss. Without it, the `embedding()` SQL function returns permission errors.
## Database Setup: Extensions and Schema
After Terraform provisions the cluster, enable the required extensions and create your RAG schema. This is a one-time setup via SQL:
```sql
-- Enable extensions
CREATE EXTENSION IF NOT EXISTS google_ml_integration CASCADE;
CREATE EXTENSION IF NOT EXISTS vector;

-- RAG documents table
CREATE TABLE rag_chunks (
  id         SERIAL PRIMARY KEY,
  source_uri TEXT NOT NULL,
  title      TEXT,
  content    TEXT NOT NULL,
  department TEXT,
  year       INTEGER,
  embedding  VECTOR(768),
  created_at TIMESTAMP DEFAULT NOW()
);
```
The `google_ml_integration` extension enables the `embedding()` function that calls Vertex AI models directly from SQL. The `vector` extension (pgvector) adds the `VECTOR` data type and similarity search operators.
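Before any embeddings can be generated, documents have to be split into rows for the `rag_chunks` table above. A minimal sketch of that step, assuming a simple fixed-size chunker with overlap (the sizes and the `chunk_document` helper are illustrative choices, not AlloyDB requirements):

```python
# Hypothetical helper: split raw document text into fixed-size, overlapping
# chunks shaped like rows for the rag_chunks table. Overlap keeps sentences
# that straddle a boundary retrievable from both sides.
def chunk_document(source_uri, title, text, chunk_size=800, overlap=100):
    chunks = []
    step = chunk_size - overlap
    for start in range(0, max(len(text), 1), step):
        piece = text[start:start + chunk_size]
        if not piece.strip():
            continue
        chunks.append({
            "source_uri": source_uri,
            "title": title,
            "content": piece,
        })
        if start + chunk_size >= len(text):
            break
    return chunks

rows = chunk_document("gs://docs/policy.txt", "Return Policy", "word " * 400)
print(len(rows))  # 2000 chars at 800/100 -> 3 chunks
```

Each dict maps directly onto an `INSERT INTO rag_chunks (source_uri, title, content) VALUES (%s, %s, %s)` statement; the `embedding` column is backfilled later in SQL.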
## Generate Embeddings in SQL
This is AlloyDB AI's killer feature: embeddings are generated in SQL, with no Python SDK or external API calls needed.
```sql
-- Generate embedding for a single chunk
UPDATE rag_chunks
SET embedding = embedding(
  'text-embedding-005',
  content
)::VECTOR
WHERE id = 1;

-- Batch generate for all chunks without embeddings
UPDATE rag_chunks
SET embedding = embedding(
  'text-embedding-005',
  content
)::VECTOR
WHERE embedding IS NULL;
```
The `embedding()` function calls the Vertex AI text-embedding model and returns a vector directly. No external API calls, no Python glue code, no separate embedding pipeline. The embedding model must be available in your AlloyDB cluster's region.
## Vector Search: Three Patterns
### Pattern 1: Pure Vector Search (Semantic)
```sql
SELECT title, content, source_uri,
       embedding <=> embedding('text-embedding-005', 'What is our return policy?')::VECTOR AS distance
FROM rag_chunks
ORDER BY distance
LIMIT 5;
```
The `<=>` operator computes cosine distance; lower distance means more similar.
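To make the semantics of `<=>` concrete, here is the same math in plain Python: cosine distance is 1 minus cosine similarity, so a vector pointing nearly the same way as the query scores lower than an orthogonal one. The vectors here are made up for illustration.

```python
import math

# What pgvector's <=> operator computes: cosine distance = 1 - cosine similarity
def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

query = [1.0, 0.0]
close = [0.9, 0.1]   # nearly the same direction as the query
far   = [0.0, 1.0]   # orthogonal to the query

print(cosine_distance(query, close) < cosine_distance(query, far))  # True
```

Because distance only depends on direction, not magnitude, documents are ranked by how aligned their embeddings are with the query embedding.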
### Pattern 2: Filtered Vector Search (Hybrid)
This is where AlloyDB shines over managed vector stores: you can combine SQL `WHERE` clauses with vector search.
```sql
SELECT title, content, source_uri,
       embedding <=> embedding('text-embedding-005', 'overtime penalties')::VECTOR AS distance
FROM rag_chunks
WHERE department = 'legal' AND year >= 2024
ORDER BY distance
LIMIT 5;
```
No separate metadata filtering system. Standard SQL `WHERE` clauses narrow the candidate set before vector search runs.
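From application code, this pattern is just query composition. A sketch of a helper that builds the filtered query with bound parameters (the function name and filter columns are hypothetical; column names must come from trusted code, while values go through `%s` placeholders, the default pg8000 parameter style, to avoid SQL injection):

```python
# Hypothetical helper: compose a filtered vector-search query for rag_chunks.
# Only the values are parameterized; column names are assumed to be
# hard-coded by the application, never taken from user input.
def build_hybrid_query(query_text, filters, limit=5):
    clauses, params = [], [query_text]
    for column, value in filters.items():
        clauses.append(f"{column} = %s")
        params.append(value)
    sql = (
        "SELECT title, content, source_uri, "
        "embedding <=> embedding('text-embedding-005', %s)::VECTOR AS distance "
        "FROM rag_chunks"
    )
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += f" ORDER BY distance LIMIT {int(limit)}"
    return sql, params

sql, params = build_hybrid_query("overtime penalties", {"department": "legal"})
print(params)  # ['overtime penalties', 'legal']
```

The returned `(sql, params)` pair can be passed straight to `cursor.execute(sql, params)`.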
### Pattern 3: Vector Search with Joins
Combine vector search with relational data across tables:
```sql
SELECT c.title, c.content, c.source_uri, d.last_updated, d.author
FROM rag_chunks c
JOIN documents d ON c.source_uri = d.uri
WHERE d.status = 'published'
ORDER BY c.embedding <=> embedding('text-embedding-005', 'refund policy changes')::VECTOR
LIMIT 5;
```
This is impossible with RAG Engine's managed DB or standalone vector stores: you'd need to join data across two separate systems.
## ScaNN Index for Production Performance
For tables with more than a few thousand rows, create a ScaNN index. Google's Scalable Nearest Neighbor algorithm provides significantly faster approximate nearest neighbor search:
```sql
-- ScaNN indexing ships as its own extension in AlloyDB
CREATE EXTENSION IF NOT EXISTS alloydb_scann;

-- Create ScaNN index for production performance
CREATE INDEX rag_chunks_embedding_idx
  ON rag_chunks
  USING scann (embedding cosine)
  WITH (num_leaves = 100);
```
Tuning: set `num_leaves` to approximately the square root of your row count. For 10,000 chunks, use ~100 leaves; for 1,000,000 chunks, use ~1,000 leaves.
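The square-root heuristic above is easy to encode as a tiny helper when you script index creation (the function name is made up for this post):

```python
import math

# Square-root heuristic for the ScaNN num_leaves storage parameter:
# roughly sqrt(row_count), floored at 1 for tiny tables.
def suggested_num_leaves(row_count):
    return max(1, round(math.sqrt(row_count)))

print(suggested_num_leaves(10_000))     # 100
print(suggested_num_leaves(1_000_000))  # 1000
```

A value computed this way can be interpolated into the `WITH (num_leaves = ...)` clause when you (re)build the index as the table grows.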
AlloyDB also supports standard pgvector indexes (IVFFlat and HNSW) if you prefer open-source compatibility.
## Integrating with Gemini for RAG
Use the retrieved chunks as context for Gemini generation:
```python
import pg8000
import vertexai
from vertexai.generative_models import GenerativeModel

vertexai.init(project="your-project-id", location="us-central1")

user_query = "What changed in the refund policy?"

# Retrieve relevant chunks from AlloyDB (private IP from Terraform outputs)
conn = pg8000.connect(
    host=alloydb_ip,
    user="app_user",
    password=db_password,
    database="rag_db",
)
cursor = conn.cursor()
cursor.execute(
    """
    SELECT content FROM rag_chunks
    ORDER BY embedding <=> embedding('text-embedding-005', %s)::VECTOR
    LIMIT 5
    """,
    (user_query,),
)
chunks = [row[0] for row in cursor.fetchall()]
context = "\n\n".join(chunks)

# Generate response with Gemini
model = GenerativeModel("gemini-2.0-flash")
response = model.generate_content(
    f"Based on the following context, answer the question.\n\n"
    f"Context:\n{context}\n\n"
    f"Question: {user_query}"
)
print(response.text)
```
## Environment-Specific Configuration
```hcl
# environments/dev.tfvars
cpu_count = 1  # 1 vCPU (dev/sandbox only)
db_user   = "app_user"
```

```hcl
# environments/prod.tfvars
cpu_count = 8  # 8 vCPU for production workloads
db_user   = "app_user"
```
Cost note: AlloyDB's 1 vCPU shape (C4A) is available for dev/test at ~$120/month. Production workloads should use 4+ vCPUs. This is more expensive than RAG Engine's Basic tier (~$50/month), but you get a full PostgreSQL database that serves both relational and vector workloads.
## AlloyDB vs RAG Engine
| Feature | AlloyDB AI | RAG Engine (Managed DB) |
|---|---|---|
| Query interface | SQL | Python SDK / REST |
| Combine with relational data | ✅ Native | ❌ |
| Embed in SQL | ✅ `embedding()` | ❌ SDK only |
| Hybrid search | SQL `WHERE` + vector | Alpha-weighted |
| Index options | ScaNN, IVFFlat, HNSW | Managed |
| Min cost (dev) | ~$120/month | ~$50/month |
| Best for | SQL-native teams, mixed workloads | Quick RAG prototypes |
## When to Choose AlloyDB AI
Pick AlloyDB AI when you need:
- SQL-native vector search alongside existing relational data
- Built-in embedding generation without a separate pipeline
- Filtered vector search with standard SQL WHERE clauses
- Joins between vector search results and relational tables
- Full PostgreSQL compatibility with your existing tooling
Stick with RAG Engine's managed DB when you want zero infrastructure and the fastest path to a working RAG prototype.
## What's Next
This is Post 3 of the GCP RAG Pipeline with Terraform series.
- Post 1: Vertex AI RAG Engine - Basic Setup
- Post 2: Advanced RAG - Chunking, Hybrid Search, Reranking
- Post 3: AlloyDB AI / pgvector - SQL-Native Vector Search (you are here)
- Post 4: Auto-Sync Pipelines (next series)
Your RAG pipeline just gained SQL superpowers. AlloyDB AI lets you generate embeddings, search vectors, filter by metadata, and join with relational data - all in standard SQL. One database for everything.

Found this helpful? Follow for the full RAG Pipeline with Terraform series!