RAG Engine's managed Spanner DB is simple but opaque. AlloyDB AI gives you pgvector with Google's ScaNN index, built-in Vertex AI integration, and full SQL control over your vectors. Here's how to set it up with Terraform for RAG workloads.
In RAG Post 1, we deployed a Vertex AI RAG Engine corpus backed by RagManagedDb, a fully managed Spanner-based vector store. It's zero-config, but you can't see your vectors, can't run SQL against them, and can't combine vector search with your existing relational data.
AlloyDB AI is GCP's alternative: a fully managed PostgreSQL-compatible database with pgvector for vector operations, Google's ScaNN index for high-performance similarity search, and built-in Vertex AI integration that lets you generate embeddings directly in SQL. If you already have relational data and want to add vector search without a separate system, AlloyDB is the answer. This post covers the Terraform setup and RAG patterns.
## Why AlloyDB AI for RAG?
The RAG Engine managed DB is great for getting started, but production teams often need more:
| Capability | RAG Engine (Managed DB) | AlloyDB AI + pgvector |
|---|---|---|
| Vector search | Managed, opaque | Full SQL control |
| Combine with relational data | ❌ Separate system | ✅ Same database |
| Embedding generation | SDK/API only | SQL function (`embedding()`) |
| Index type | Managed | ScaNN, IVFFlat, HNSW |
| Hybrid search | Alpha-weighted | SQL `WHERE` + vector `ORDER BY` |
| Infrastructure | Zero config | Cluster + instance to manage |
| Cost (dev) | Basic tier ~$50/month | 1 vCPU ~$120/month |
| Cost (prod) | Scaled tier ~$200/month | 4+ vCPU ~$400/month |
The trade-off: AlloyDB costs more and requires more setup, but gives you full control, SQL-native operations, and the ability to combine vector search with your existing data in a single query.
## Terraform Setup
AlloyDB requires a VPC with private service access. Here's the full infrastructure:
### Network and Private Service Access
```hcl
# network/main.tf
resource "google_compute_network" "vpc" {
  name                    = "${var.environment}-${var.project}-vpc"
  auto_create_subnetworks = false
  project                 = var.project_id
}

resource "google_compute_global_address" "psa_range" {
  name          = "${var.environment}-psa-range"
  purpose       = "VPC_PEERING"
  address_type  = "INTERNAL"
  prefix_length = 20
  network       = google_compute_network.vpc.id
  project       = var.project_id
}

resource "google_service_networking_connection" "psa" {
  network                 = google_compute_network.vpc.id
  service                 = "servicenetworking.googleapis.com"
  reserved_peering_ranges = [google_compute_global_address.psa_range.name]
}
```
### AlloyDB Cluster and Instance
```hcl
# alloydb/main.tf
resource "google_alloydb_cluster" "this" {
  cluster_id = "${var.environment}-${var.project}-alloydb"
  location   = var.region
  project    = var.project_id

  network_config {
    network = google_compute_network.vpc.id
  }

  initial_user {
    user     = var.db_user
    password = var.db_password
  }

  depends_on = [google_service_networking_connection.psa]
}

resource "google_alloydb_instance" "primary" {
  cluster       = google_alloydb_cluster.this.name
  instance_id   = "${var.environment}-primary"
  instance_type = "PRIMARY"

  machine_config {
    cpu_count = var.cpu_count
  }
}
```
### IAM for Vertex AI Integration
AlloyDB needs permission to call Vertex AI embedding models. Grant the AlloyDB service agent the Vertex AI User role:
```hcl
# alloydb/iam.tf

# Needed to resolve the project number used in the service agent's email
data "google_project" "this" {
  project_id = var.project_id
}

resource "google_project_iam_member" "alloydb_vertex_ai" {
  project = var.project_id
  role    = "roles/aiplatform.user"
  member  = "serviceAccount:service-${data.google_project.this.number}@gcp-sa-alloydb.iam.gserviceaccount.com"
}
```
This is the critical step most people miss. Without it, the `embedding()` SQL function returns permission errors.
## Database Setup: Extensions and Schema
After Terraform provisions the cluster, enable the required extensions and create your RAG schema. This is a one-time setup via SQL:
```sql
-- Enable extensions
CREATE EXTENSION IF NOT EXISTS google_ml_integration CASCADE;
CREATE EXTENSION IF NOT EXISTS vector;

-- RAG documents table
CREATE TABLE rag_chunks (
  id         SERIAL PRIMARY KEY,
  source_uri TEXT NOT NULL,
  title      TEXT,
  content    TEXT NOT NULL,
  department TEXT,
  year       INTEGER,
  embedding  VECTOR(768),
  created_at TIMESTAMP DEFAULT NOW()
);
```
The `google_ml_integration` extension enables the `embedding()` function that calls Vertex AI models directly from SQL. The `vector` extension (pgvector) adds the `VECTOR` data type and similarity search operators.
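Before any embeddings can be generated, documents have to be split into rows for the `rag_chunks` table above. A minimal sketch of that step, assuming a simple fixed-size chunker with overlap (the sizes and the `chunk_document` helper are illustrative choices, not AlloyDB requirements):

```python
# Hypothetical helper: split raw document text into fixed-size, overlapping
# chunks shaped like rows for the rag_chunks table. Overlap keeps sentences
# that straddle a boundary retrievable from both sides.
def chunk_document(source_uri, title, text, chunk_size=800, overlap=100):
    chunks = []
    step = chunk_size - overlap
    for start in range(0, max(len(text), 1), step):
        piece = text[start:start + chunk_size]
        if not piece.strip():
            continue
        chunks.append({
            "source_uri": source_uri,
            "title": title,
            "content": piece,
        })
        if start + chunk_size >= len(text):
            break
    return chunks

rows = chunk_document("gs://docs/policy.txt", "Return Policy", "word " * 400)
print(len(rows))  # 2000 chars at 800/100 -> 3 chunks
```

Each dict maps directly onto an `INSERT INTO rag_chunks (source_uri, title, content) VALUES (%s, %s, %s)` statement; the `embedding` column is backfilled later in SQL.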
## Generate Embeddings in SQL
This is AlloyDB AI's killer feature: embeddings are generated in SQL, with no Python SDK or external API calls needed.
```sql
-- Generate embedding for a single chunk
UPDATE rag_chunks
SET embedding = embedding(
  'text-embedding-005',
  content
)::VECTOR
WHERE id = 1;

-- Batch generate for all chunks without embeddings
UPDATE rag_chunks
SET embedding = embedding(
  'text-embedding-005',
  content
)::VECTOR
WHERE embedding IS NULL;
```
The `embedding()` function calls the Vertex AI text-embedding model and returns a vector directly. No external API calls, no Python glue code, no separate embedding pipeline. The embedding model must be available in your AlloyDB cluster's region.
## Vector Search: Three Patterns
### Pattern 1: Pure Vector Search (Semantic)
```sql
SELECT title, content, source_uri,
       embedding <=> embedding('text-embedding-005', 'What is our return policy?')::VECTOR AS distance
FROM rag_chunks
ORDER BY distance
LIMIT 5;
```
The `<=>` operator computes cosine distance; lower distance means more similar.
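To make the semantics of `<=>` concrete, here is the same math in plain Python: cosine distance is 1 minus cosine similarity, so a vector pointing nearly the same way as the query scores lower than an orthogonal one. The vectors here are made up for illustration.

```python
import math

# What pgvector's <=> operator computes: cosine distance = 1 - cosine similarity
def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

query = [1.0, 0.0]
close = [0.9, 0.1]   # nearly the same direction as the query
far   = [0.0, 1.0]   # orthogonal to the query

print(cosine_distance(query, close) < cosine_distance(query, far))  # True
```

Because distance only depends on direction, not magnitude, documents are ranked by how aligned their embeddings are with the query embedding.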
### Pattern 2: Filtered Vector Search (Hybrid)
This is where AlloyDB shines over managed vector stores: you can combine SQL `WHERE` clauses with vector search.
```sql
SELECT title, content, source_uri,
       embedding <=> embedding('text-embedding-005', 'overtime penalties')::VECTOR AS distance
FROM rag_chunks
WHERE department = 'legal' AND year >= 2024
ORDER BY distance
LIMIT 5;
```
No separate metadata filtering system. Standard SQL `WHERE` clauses narrow the candidate set before vector search runs.
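From application code, this pattern is just query composition. A sketch of a helper that builds the filtered query with bound parameters (the function name and filter columns are hypothetical; column names must come from trusted code, while values go through `%s` placeholders, the default pg8000 parameter style, to avoid SQL injection):

```python
# Hypothetical helper: compose a filtered vector-search query for rag_chunks.
# Only the values are parameterized; column names are assumed to be
# hard-coded by the application, never taken from user input.
def build_hybrid_query(query_text, filters, limit=5):
    clauses, params = [], [query_text]
    for column, value in filters.items():
        clauses.append(f"{column} = %s")
        params.append(value)
    sql = (
        "SELECT title, content, source_uri, "
        "embedding <=> embedding('text-embedding-005', %s)::VECTOR AS distance "
        "FROM rag_chunks"
    )
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += f" ORDER BY distance LIMIT {int(limit)}"
    return sql, params

sql, params = build_hybrid_query("overtime penalties", {"department": "legal"})
print(params)  # ['overtime penalties', 'legal']
```

The returned `(sql, params)` pair can be passed straight to `cursor.execute(sql, params)`.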
### Pattern 3: Vector Search with Joins
Combine vector search with relational data across tables:
```sql
SELECT c.title, c.content, c.source_uri, d.last_updated, d.author
FROM rag_chunks c
JOIN documents d ON c.source_uri = d.uri
WHERE d.status = 'published'
ORDER BY c.embedding <=> embedding('text-embedding-005', 'refund policy changes')::VECTOR
LIMIT 5;
```
This is impossible with RAG Engine's managed DB or standalone vector stores: you'd need to join data across two separate systems.
## ScaNN Index for Production Performance
For tables with more than a few thousand rows, create a ScaNN index. Google's Scalable Nearest Neighbor algorithm provides significantly faster approximate nearest neighbor search:
```sql
-- ScaNN indexing ships as its own extension in AlloyDB
CREATE EXTENSION IF NOT EXISTS alloydb_scann;

-- Create ScaNN index for production performance
CREATE INDEX rag_chunks_embedding_idx
  ON rag_chunks
  USING scann (embedding cosine)
  WITH (num_leaves = 100);
```
Tuning: set `num_leaves` to approximately the square root of your row count. For 10,000 chunks, use ~100 leaves; for 1,000,000 chunks, use ~1,000 leaves.
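The square-root heuristic above is easy to encode as a tiny helper when you script index creation (the function name is made up for this post):

```python
import math

# Square-root heuristic for the ScaNN num_leaves storage parameter:
# roughly sqrt(row_count), floored at 1 for tiny tables.
def suggested_num_leaves(row_count):
    return max(1, round(math.sqrt(row_count)))

print(suggested_num_leaves(10_000))     # 100
print(suggested_num_leaves(1_000_000))  # 1000
```

A value computed this way can be interpolated into the `WITH (num_leaves = ...)` clause when you (re)build the index as the table grows.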
AlloyDB also supports standard pgvector indexes (IVFFlat and HNSW) if you prefer open-source compatibility.
## Integrating with Gemini for RAG
Use the retrieved chunks as context for Gemini generation:
```python
import pg8000
import vertexai
from vertexai.generative_models import GenerativeModel

vertexai.init(project="your-project-id", location="us-central1")

user_query = "What changed in the refund policy?"

# Retrieve relevant chunks from AlloyDB (private IP from Terraform outputs)
conn = pg8000.connect(
    host=alloydb_ip,
    user="app_user",
    password=db_password,
    database="rag_db",
)
cursor = conn.cursor()
cursor.execute(
    """
    SELECT content FROM rag_chunks
    ORDER BY embedding <=> embedding('text-embedding-005', %s)::VECTOR
    LIMIT 5
    """,
    (user_query,),
)
chunks = [row[0] for row in cursor.fetchall()]
context = "\n\n".join(chunks)

# Generate response with Gemini
model = GenerativeModel("gemini-2.0-flash")
response = model.generate_content(
    f"Based on the following context, answer the question.\n\n"
    f"Context:\n{context}\n\n"
    f"Question: {user_query}"
)
print(response.text)
```
## Environment-Specific Configuration
```hcl
# environments/dev.tfvars
cpu_count = 1  # 1 vCPU (dev/sandbox only)
db_user   = "app_user"
```

```hcl
# environments/prod.tfvars
cpu_count = 8  # 8 vCPU for production workloads
db_user   = "app_user"
```
Cost note: AlloyDB's 1 vCPU shape (C4A) is available for dev/test at ~$120/month. Production workloads should use 4+ vCPUs. This is more expensive than RAG Engine's Basic tier (~$50/month), but you get a full PostgreSQL database that serves both relational and vector workloads.
## AlloyDB vs RAG Engine
| Feature | AlloyDB AI | RAG Engine (Managed DB) |
|---|---|---|
| Query interface | SQL | Python SDK / REST |
| Combine with relational data | ✅ Native | ❌ |
| Embed in SQL | ✅ `embedding()` | ❌ SDK only |
| Hybrid search | SQL `WHERE` + vector | Alpha-weighted |
| Index options | ScaNN, IVFFlat, HNSW | Managed |
| Min cost (dev) | ~$120/month | ~$50/month |
| Best for | SQL-native teams, mixed workloads | Quick RAG prototypes |
## When to Choose AlloyDB AI
Pick AlloyDB AI when you need:
- SQL-native vector search alongside existing relational data
- Built-in embedding generation without a separate pipeline
- Filtered vector search with standard SQL WHERE clauses
- Joins between vector search results and relational tables
- Full PostgreSQL compatibility with your existing tooling
Stick with RAG Engine's managed DB when you want zero infrastructure and the fastest path to a working RAG prototype.
## What's Next
This is Post 3 of the GCP RAG Pipeline with Terraform series.
- Post 1: Vertex AI RAG Engine - Basic Setup
- Post 2: Advanced RAG - Chunking, Hybrid Search, Reranking
- Post 3: AlloyDB AI / pgvector - SQL-Native Vector Search (you are here)
- Post 4: Auto-Sync Pipelines (next series)
Your RAG pipeline just gained SQL superpowers. AlloyDB AI lets you generate embeddings, search vectors, filter by metadata, and join with relational data - all in standard SQL. One database for everything.

Found this helpful? Follow for the full RAG Pipeline with Terraform series!