Suhas Mallesh
AlloyDB AI with pgvector for RAG: SQL-Native Vector Search on GCP with Terraform πŸ”Ž

RAG Engine's managed Spanner DB is simple but opaque. AlloyDB AI gives you pgvector with Google's ScaNN index, built-in Vertex AI integration, and full SQL control over your vectors. Here's how to set it up with Terraform for RAG workloads.

In RAG Post 1, we deployed a Vertex AI RAG Engine corpus backed by RagManagedDb, a fully managed Spanner-based vector store. It's zero-config, but you can't see your vectors, can't run SQL against them, and can't combine vector search with your existing relational data.

AlloyDB AI is GCP's alternative: a fully managed PostgreSQL-compatible database with pgvector for vector operations, Google's ScaNN index for high-performance similarity search, and built-in Vertex AI integration that lets you generate embeddings directly in SQL. If you already have relational data and want to add vector search without a separate system, AlloyDB is the answer. This post covers the Terraform setup and RAG patterns. 🎯

πŸ” Why AlloyDB AI for RAG?

The RAG Engine managed DB is great for getting started, but production teams often need more:

| Capability | RAG Engine (Managed DB) | AlloyDB AI + pgvector |
| --- | --- | --- |
| Vector search | Managed, opaque | Full SQL control |
| Combine with relational data | ❌ Separate system | ✅ Same database |
| Embedding generation | SDK/API only | SQL function (`embedding()`) |
| Index type | Managed | ScaNN, IVFFlat, HNSW |
| Hybrid search | Alpha-weighted | SQL `WHERE` + vector `ORDER BY` |
| Infrastructure | Zero config | Cluster + instance to manage |
| Cost (dev) | Basic tier ~$50/month | 1 vCPU ~$120/month |
| Cost (prod) | Scaled tier ~$200/month | 4+ vCPU ~$400/month |

The trade-off: AlloyDB costs more and requires more setup, but gives you full control, SQL-native operations, and the ability to combine vector search with your existing data in a single query.

πŸ”§ Terraform Setup

AlloyDB requires a VPC with private service access. Here's the full infrastructure:

Network and Private Service Access

```hcl
# network/main.tf

resource "google_compute_network" "vpc" {
  name                    = "${var.environment}-${var.project}-vpc"
  auto_create_subnetworks = false
  project                 = var.project_id
}

resource "google_compute_global_address" "psa_range" {
  name          = "${var.environment}-psa-range"
  purpose       = "VPC_PEERING"
  address_type  = "INTERNAL"
  prefix_length = 20
  network       = google_compute_network.vpc.id
  project       = var.project_id
}

resource "google_service_networking_connection" "psa" {
  network                 = google_compute_network.vpc.id
  service                 = "servicenetworking.googleapis.com"
  reserved_peering_ranges = [google_compute_global_address.psa_range.name]
}
```

AlloyDB Cluster and Instance

```hcl
# alloydb/main.tf

resource "google_alloydb_cluster" "this" {
  cluster_id = "${var.environment}-${var.project}-alloydb"
  location   = var.region
  project    = var.project_id

  network_config {
    network = google_compute_network.vpc.id
  }

  initial_user {
    user     = var.db_user
    password = var.db_password
  }

  depends_on = [google_service_networking_connection.psa]
}

resource "google_alloydb_instance" "primary" {
  cluster       = google_alloydb_cluster.this.name
  instance_id   = "${var.environment}-primary"
  instance_type = "PRIMARY"

  machine_config {
    cpu_count = var.cpu_count
  }
}
```

IAM for Vertex AI Integration

AlloyDB needs permission to call Vertex AI embedding models. Grant the AlloyDB service agent the Vertex AI User role:

```hcl
# alloydb/iam.tf

# Look up the project number, needed to build the service agent email
data "google_project" "this" {
  project_id = var.project_id
}

resource "google_project_iam_member" "alloydb_vertex_ai" {
  project = var.project_id
  role    = "roles/aiplatform.user"
  member  = "serviceAccount:service-${data.google_project.this.number}@gcp-sa-alloydb.iam.gserviceaccount.com"
}
```

This is the step most people miss: without it, every call to the embedding() SQL function fails with a permission error.
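When debugging that error, a quick sanity check is whether the IAM member string matches the service-agent pattern used in the Terraform above. A minimal sketch (the function name and the project number 123456789012 are placeholders of my own):

```python
def alloydb_service_agent_member(project_number: int) -> str:
    # GCP service agents follow the pattern
    # service-<PROJECT_NUMBER>@gcp-sa-<SERVICE>.iam.gserviceaccount.com
    return (
        f"serviceAccount:service-{project_number}"
        "@gcp-sa-alloydb.iam.gserviceaccount.com"
    )

print(alloydb_service_agent_member(123456789012))
```

Note the numeric project *number*, not the project ID; mixing those up is the usual cause of a grant that silently does nothing.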

🧩 Database Setup: Extensions and Schema

After Terraform provisions the cluster, enable the required extensions and create your RAG schema. This is a one-time setup via SQL:

```sql
-- Enable extensions
CREATE EXTENSION IF NOT EXISTS google_ml_integration CASCADE;
CREATE EXTENSION IF NOT EXISTS vector;

-- RAG documents table
CREATE TABLE rag_chunks (
    id          SERIAL PRIMARY KEY,
    source_uri  TEXT NOT NULL,
    title       TEXT,
    content     TEXT NOT NULL,
    department  TEXT,
    year        INTEGER,
    embedding   VECTOR(768),
    created_at  TIMESTAMP DEFAULT NOW()
);
```

The google_ml_integration extension enables the embedding() function that calls Vertex AI models directly from SQL. The vector extension (pgvector) adds the VECTOR data type and similarity search operators.
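If you ever generate embeddings client-side instead (say, with the Vertex AI SDK), pgvector accepts a bracketed text literal for the VECTOR type. A small helper, as a sketch (the function name is my own):

```python
def to_pgvector_literal(vec):
    """Format a Python sequence as pgvector's text input, e.g. '[0.1,0.2,0.3]'."""
    return "[" + ",".join(repr(float(v)) for v in vec) + "]"

print(to_pgvector_literal([0.1, 0.2, 0.3]))  # -> [0.1,0.2,0.3]
```

Bind the resulting string as an ordinary query parameter and cast it with ::VECTOR in the INSERT.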

🧠 Generate Embeddings in SQL

This is AlloyDB AI's killer feature - no Python SDK or API calls needed:

```sql
-- Generate embedding for a single chunk
UPDATE rag_chunks
SET embedding = embedding(
    'text-embedding-005',
    content
)::VECTOR
WHERE id = 1;

-- Batch generate for all chunks without embeddings
UPDATE rag_chunks
SET embedding = embedding(
    'text-embedding-005',
    content
)::VECTOR
WHERE embedding IS NULL;
```

The embedding() function calls the Vertex AI text-embedding model and returns a vector directly. No external API calls, no Python glue code, no separate embedding pipeline. The embedding model must be available in your AlloyDB cluster's region.

πŸ” Vector Search: Three Patterns

Pattern 1: Pure Vector Search (Semantic)

```sql
SELECT title, content, source_uri,
       embedding <=> embedding('text-embedding-005', 'What is our return policy?')::VECTOR AS distance
FROM rag_chunks
ORDER BY distance
LIMIT 5;
```

The <=> operator computes cosine distance. Lower distance = more similar.
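Concretely, <=> returns 1 minus the cosine similarity of the two vectors: identical directions score 0, orthogonal vectors score 1. A plain-Python sketch of the math:

```python
import math

def cosine_distance(a, b):
    # pgvector's <=> operator: 1 - cosine_similarity(a, b)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

print(cosine_distance([1, 0], [1, 0]))  # same direction -> 0.0
print(cosine_distance([1, 0], [0, 1]))  # orthogonal -> 1.0
```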

Pattern 2: Filtered Vector Search (Hybrid)

This is where AlloyDB shines over managed vector stores - combine SQL WHERE clauses with vector search:

```sql
SELECT title, content, source_uri,
       embedding <=> embedding('text-embedding-005', 'overtime penalties')::VECTOR AS distance
FROM rag_chunks
WHERE department = 'legal' AND year >= 2024
ORDER BY distance
LIMIT 5;
```

No separate metadata filtering system. Standard SQL WHERE clauses narrow the candidate set before vector search runs.
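In application code, those filters should be bound as parameters rather than string-interpolated. A hypothetical helper, not from the article (the allowlist and equality-only filters are my simplifications):

```python
def build_hybrid_query(query_text, filters):
    """Build a parameterized hybrid-search query for rag_chunks.

    filters: dict of column -> value, ANDed into the WHERE clause.
    Column names come from a fixed allowlist; values are bound with
    %s placeholders (pg8000's paramstyle) to avoid SQL injection.
    """
    allowed = {"department", "year"}
    clauses, params = [], [query_text]  # query text binds to embedding() first
    for col, val in filters.items():
        if col not in allowed:
            raise ValueError(f"unknown filter column: {col}")
        clauses.append(f"{col} = %s")
        params.append(val)
    where = f"WHERE {' AND '.join(clauses)}" if clauses else ""
    sql = (
        "SELECT title, content, source_uri, "
        "embedding <=> embedding('text-embedding-005', %s)::VECTOR AS distance "
        f"FROM rag_chunks {where} ORDER BY distance LIMIT 5"
    )
    return sql, params

sql, params = build_hybrid_query("overtime penalties", {"department": "legal"})
```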

Pattern 3: Vector Search with Joins

Combine vector search with relational data across tables:

```sql
SELECT c.title, c.content, c.source_uri, d.last_updated, d.author
FROM rag_chunks c
JOIN documents d ON c.source_uri = d.uri
WHERE d.status = 'published'
ORDER BY c.embedding <=> embedding('text-embedding-005', 'refund policy changes')::VECTOR
LIMIT 5;
```

This is impossible with RAG Engine's managed DB or standalone vector stores - you'd need to join data across two separate systems.

⚑ ScaNN Index for Production Performance

For tables with more than a few thousand rows, create a ScaNN index. Google's Scalable Nearest Neighbor algorithm provides significantly faster approximate nearest neighbor search:

```sql
-- ScaNN index support ships as its own extension
CREATE EXTENSION IF NOT EXISTS alloydb_scann;

-- Create ScaNN index for production performance
CREATE INDEX rag_chunks_embedding_idx
ON rag_chunks
USING scann (embedding cosine)
WITH (num_leaves = 100);
```

Tuning: Set num_leaves to approximately the square root of your row count. For 10,000 chunks, use ~100 leaves. For 1,000,000 chunks, use ~1,000 leaves.
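That square-root rule of thumb is easy to codify; a small sketch (the helper name is my own):

```python
import math

def recommended_num_leaves(row_count: int) -> int:
    # Rule of thumb from the text: num_leaves ~ sqrt(row_count)
    return max(1, math.isqrt(row_count))

print(recommended_num_leaves(10_000))     # -> 100
print(recommended_num_leaves(1_000_000))  # -> 1000
```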

AlloyDB also supports standard pgvector indexes (IVFFlat and HNSW) if you prefer open-source compatibility.

πŸ”— Integrating with Gemini for RAG

Use the retrieved chunks as context for Gemini generation:

```python
import pg8000
import vertexai
from vertexai.generative_models import GenerativeModel

vertexai.init(project="my-project", location="us-central1")

user_query = "What changed in the refund policy?"

# Retrieve relevant chunks from AlloyDB (pg8000 uses %s placeholders)
conn = pg8000.connect(
    host=alloydb_ip, user="app_user", password=db_password, database="rag_db"
)
cursor = conn.cursor()

cursor.execute("""
    SELECT content FROM rag_chunks
    ORDER BY embedding <=> embedding('text-embedding-005', %s)::VECTOR
    LIMIT 5
""", (user_query,))

chunks = [row[0] for row in cursor.fetchall()]
context = "\n\n".join(chunks)

# Generate response with Gemini
model = GenerativeModel("gemini-2.0-flash")
response = model.generate_content(
    f"Based on the following context, answer the question.\n\n"
    f"Context:\n{context}\n\n"
    f"Question: {user_query}"
)
print(response.text)
```

πŸ“ Environment-Specific Configuration

```hcl
# environments/dev.tfvars
cpu_count    = 1         # 1 vCPU (dev/sandbox only)
db_user      = "app_user"

# environments/prod.tfvars
cpu_count    = 8         # 8 vCPU for production workloads
db_user      = "app_user"
```

Cost note: AlloyDB's 1 vCPU shape (C4A) is available for dev/test at ~$120/month. Production workloads should use 4+ vCPUs. This is more expensive than RAG Engine's Basic tier (~$50/month), but you get a full PostgreSQL database that serves both relational and vector workloads.

πŸ”„ AlloyDB vs RAG Engine

| Feature | AlloyDB AI | RAG Engine (Managed DB) |
| --- | --- | --- |
| Query interface | SQL | Python SDK / REST |
| Combine with relational data | ✅ Native | ❌ |
| Embed in SQL | ✅ `embedding()` | ❌ SDK only |
| Hybrid search | SQL `WHERE` + vector | Alpha-weighted |
| Index options | ScaNN, IVFFlat, HNSW | Managed |
| Min cost (dev) | ~$120/month | ~$50/month |
| Best for | SQL-native teams, mixed workloads | Quick RAG prototypes |

πŸ’‘ When to Choose AlloyDB AI

Pick AlloyDB AI when you need:

  • SQL-native vector search alongside existing relational data
  • Built-in embedding generation without a separate pipeline
  • Filtered vector search with standard SQL WHERE clauses
  • Joins between vector search results and relational tables
  • Full PostgreSQL compatibility with your existing tooling

Stick with RAG Engine's managed DB when you want zero infrastructure and the fastest path to a working RAG prototype.

⏭️ What's Next

This is Post 3 of the GCP RAG Pipeline with Terraform series.


Your RAG pipeline just gained SQL superpowers. AlloyDB AI lets you generate embeddings, search vectors, filter by metadata, and join with relational data - all in standard SQL. One database for everything. πŸ’°

Found this helpful? Follow for the full RAG Pipeline with Terraform series! πŸ’¬
