Azure AI Search is powerful but adds a separate service to manage. Cosmos DB for NoSQL has built-in vector search powered by Microsoft Research's DiskANN - store vectors and data together with sub-20ms latency. Here's the Terraform setup for RAG workloads.
In RAG Post 1, we deployed an Azure AI Search index as the vector store for our RAG pipeline. AI Search gives you a full retrieval stack with semantic ranking, hybrid search, and query decomposition. But it's a separate service with its own pricing tier, and for many RAG workloads, that's more infrastructure than you need.
Azure Cosmos DB for NoSQL now includes built-in vector search powered by DiskANN, a high-performance vector indexing algorithm developed by Microsoft Research. You store vectors alongside your documents in the same database, query them with the familiar NoSQL SQL syntax, and get sub-20ms latency at scale. If you already have data in Cosmos DB or want a single database for both operational and vector workloads, this eliminates an entire service from your architecture. π―
π Why Cosmos DB for RAG?
AI Search is the right choice when you need the full three-layer retrieval pipeline (hybrid search, semantic ranking, agentic retrieval). But many RAG use cases don't need all of that:
| Capability | Azure AI Search | Cosmos DB for NoSQL |
|---|---|---|
| Vector search | β Separate index | β Same container as data |
| Hybrid search | β BM25 + vector via RRF | β FullTextScore + VectorDistance via RRF |
| Semantic ranking | β Cross-encoder reranker | β (preview: semantic reranking) |
| Query decomposition | β Agentic retrieval | β Application-level |
| Index type | HNSW | DiskANN, quantizedFlat, flat |
| Latency | 50-200ms | < 20ms (DiskANN at 10M vectors) |
| Pricing model | Per-tier (Basic/Standard) | Pay-per-RU (serverless or provisioned) |
| Min cost (dev) | ~$75/month (Basic) | ~$0 (serverless, pay per request) |
The trade-off: Cosmos DB gives you lower latency and true pay-per-use pricing, but you lose AI Search's built-in semantic ranker and agentic retrieval. For straightforward RAG against your own documents, Cosmos DB's native vector search is often enough.
π§ Terraform Setup
Cosmos DB Account with Vector Search
# cosmosdb/main.tf
resource "azurerm_cosmosdb_account" "this" {
name = "${var.environment}-${var.project}-cosmos"
location = azurerm_resource_group.this.location
resource_group_name = azurerm_resource_group.this.name
offer_type = "Standard"
kind = "GlobalDocumentDB"
capabilities {
name = "EnableNoSQLVectorSearch"
}
capabilities {
name = "EnableServerless"
}
consistency_policy {
consistency_level = "Session"
}
geo_location {
location = var.location
failover_priority = 0
}
}
The EnableNoSQLVectorSearch capability activates DiskANN-based vector indexing. The EnableServerless capability sets pay-per-request pricing with no minimum cost - ideal for dev/test RAG workloads.
Database and Container
# cosmosdb/database.tf
resource "azurerm_cosmosdb_sql_database" "rag" {
name = "rag-database"
resource_group_name = azurerm_resource_group.this.name
account_name = azurerm_cosmosdb_account.this.name
}
resource "azurerm_cosmosdb_sql_container" "chunks" {
name = "rag-chunks"
resource_group_name = azurerm_resource_group.this.name
account_name = azurerm_cosmosdb_account.this.name
database_name = azurerm_cosmosdb_sql_database.rag.name
partition_key_paths = ["/department"]
indexing_policy {
indexing_mode = "consistent"
included_path {
path = "/*"
}
excluded_path {
path = "/contentVector/*"
}
}
}
Important: Exclude vector paths from the general index (/contentVector/*). Vectors get their own specialized DiskANN index - including them in the general index wastes RUs and storage.
Vector and Full-Text Policies via AzAPI
Terraform's azurerm provider doesn't yet support vector embedding policies or vector indexes directly on the container resource. Use the azapi provider to set these policies:
# cosmosdb/vector_policy.tf
resource "azapi_update_resource" "vector_policy" {
type = "Microsoft.DocumentDB/databaseAccounts/sqlDatabases/containers@2024-05-15"
parent_id = azurerm_cosmosdb_sql_container.chunks.id
body = jsonencode({
properties = {
resource = {
vectorEmbeddingPolicy = {
vectorEmbeddings = [{
path = "/contentVector"
dataType = "float32"
distanceFunction = "cosine"
dimensions = var.embedding_dimensions
}]
}
indexingPolicy = {
indexingMode = "consistent"
automatic = true
includedPaths = [{ path = "/*" }]
excludedPaths = [
{ path = "/contentVector/*" },
{ path = "/_etag/?" }
]
vectorIndexes = [{
path = "/contentVector"
type = var.vector_index_type
}]
fullTextIndexes = [{
path = "/content"
}]
}
fullTextPolicy = {
defaultLanguage = "en-US"
fullTextPaths = [{
path = "/content"
language = "en-US"
}]
}
}
}
})
}
This configures three things at once: the vector embedding policy (dimensions, distance function, data type), the DiskANN vector index, and the full-text index for hybrid search.
π Document Structure
Each document in your container stores the chunk content, metadata, and its vector embedding together:
{
"id": "chunk-001",
"sourceUri": "policies/returns-2024.pdf",
"title": "Return Policy - Section 3",
"content": "Customers may return items within 30 days...",
"department": "legal",
"year": 2024,
"contentVector": [0.023, -0.041, 0.089, ...]
}
No separate vector store, no sync pipeline, no metadata mapping. The vector lives alongside the data it represents.
π Vector Search: Three Patterns
Pattern 1: Pure Vector Search
SELECT TOP 10 c.title, c.content, c.sourceUri,
VectorDistance(c.contentVector, @queryVector) AS score
FROM c
ORDER BY VectorDistance(c.contentVector, @queryVector)
VectorDistance computes cosine similarity using the DiskANN index. Always use TOP N to avoid scanning the entire index.
Pattern 2: Filtered Vector Search
Combine standard WHERE clauses with vector search:
SELECT TOP 10 c.title, c.content, c.sourceUri,
VectorDistance(c.contentVector, @queryVector) AS score
FROM c
WHERE c.department = "legal" AND c.year >= 2024
ORDER BY VectorDistance(c.contentVector, @queryVector)
Cosmos DB's Sharded DiskANN optimizes filtered vector search by partitioning indexes per shard key. If you frequently filter by department, use it as the partition key for best performance.
Pattern 3: Hybrid Search (Vector + Full-Text)
Combine vector similarity with BM25 keyword scoring using Reciprocal Rank Fusion:
SELECT TOP 10 c.title, c.content, c.sourceUri
FROM c
ORDER BY RANK RRF(
VectorDistance(c.contentVector, @queryVector),
FullTextScore(c.content, ["return", "policy", "30 days"])
)
RRF merges the rankings from both search methods, catching results that are semantically similar and/or contain exact keywords. This is the same fusion approach AI Search uses internally.
β‘ DiskANN Index Types
| Index Type | Accuracy | Latency | RU Cost | Best For |
|---|---|---|---|---|
| flat | 100% (exact) | Higher | Higher | < 1,000 vectors, testing |
| quantizedFlat | ~95% | Medium | Medium | 1K-100K vectors |
| diskANN | ~95% | Lowest | Lowest | 100K+ vectors, production |
Note: Both quantizedFlat and diskANN require at least 1,000 vectors for accurate quantization. Below that threshold, Cosmos DB falls back to a full scan with higher RU charges.
DiskANN tuning parameters:
-
quantizationByteSize(1-512): Higher = better accuracy, higher RU cost -
indexingSearchListSize(10-500): Higher = better accuracy, slower index builds
For most RAG workloads, the defaults work well. Only tune if you measure recall issues.
π Integrating with Azure OpenAI for RAG
from azure.cosmos import CosmosClient
from openai import AzureOpenAI
# Generate query embedding
openai_client = AzureOpenAI(...)
query_embedding = openai_client.embeddings.create(
model="text-embedding-3-small",
input=user_query
).data[0].embedding
# Vector search in Cosmos DB
cosmos_client = CosmosClient(endpoint, credential)
container = cosmos_client.get_database_client("rag-database") \
.get_container_client("rag-chunks")
results = container.query_items(
query="""
SELECT TOP 5 c.title, c.content, c.sourceUri
FROM c
ORDER BY VectorDistance(c.contentVector, @vector)
""",
parameters=[{"name": "@vector", "value": query_embedding}],
enable_cross_partition_query=True
)
chunks = [item["content"] for item in results]
# Generate response with Azure OpenAI
response = openai_client.chat.completions.create(
model="gpt-4o",
messages=[
{"role": "system", "content": f"Answer based on context:\n{chr(10).join(chunks)}"},
{"role": "user", "content": user_query}
]
)
π Environment-Specific Configuration
# environments/dev.tfvars
cosmos_serverless = true # Pay per request, no minimum
vector_index_type = "quantizedFlat" # Good for < 100K vectors
embedding_dimensions = 1536
# environments/prod.tfvars
cosmos_serverless = false # Provisioned throughput
vector_index_type = "diskANN" # Production scale
embedding_dimensions = 1536
provisioned_throughput = 4000 # RU/s, autoscale to 40,000
Cost comparison: Serverless Cosmos DB charges per RU consumed with no minimum. A typical RAG query costs 10-50 RUs depending on vector dimensions and result count. At 10,000 queries/month, that's roughly $0.50-$2.50 total - compared to ~$75/month for AI Search Basic tier.
π‘ When to Choose Cosmos DB Vector Search
Pick Cosmos DB when you need:
- Zero minimum cost for dev/test with serverless mode
- Sub-20ms vector queries at production scale
- Data and vectors together in the same document
- Hybrid search with vector + full-text + filters in one query
- Global distribution with multi-region replication
Stick with AI Search when you need the semantic ranker, agentic retrieval, or the full three-layer pipeline from our Advanced RAG post.
βοΈ What's Next
This is Post 3 of the Azure RAG Pipeline with Terraform series.
- Post 1: Azure AI Search RAG - Basic Setup π
- Post 2: Advanced RAG - Three-Layer Retrieval π§
- Post 3: Cosmos DB Vector Search - NoSQL-Native RAG (you are here) π°
- Post 4: Auto-Sync Pipelines (next series)
Your RAG pipeline just went serverless. Cosmos DB's built-in DiskANN vector search gives you sub-20ms queries, hybrid search with RRF, and true pay-per-use pricing - all without a separate vector store. One database, one bill, one less service to manage. π°
Found this helpful? Follow for the full RAG Pipeline with Terraform series! π¬
Top comments (0)