Azure's RAG pattern connects Blob Storage, AI Search, and OpenAI in a single API call. Terraform provisions the search service and storage. The "On Your Data" API handles the rest.
You have an Azure OpenAI endpoint answering general questions. Ask it about your company's internal docs and it hallucinates. RAG fixes this by grounding model responses in your actual data.
Azure's RAG architecture connects three services: Blob Storage holds your documents, Azure AI Search indexes them with vector and keyword search, and Azure OpenAI's "On Your Data" feature ties it all together in a single API call. You don't build a retrieval pipeline - you tell the chat completions API where your search index is, and Azure handles intent extraction, retrieval, reranking, and generation.
Terraform provisions the infrastructure. The portal or SDK handles index creation and data ingestion.
Architecture Overview
┌──────────────┐     ┌──────────────┐     ┌──────────────────┐
│     Blob     │────>│   Azure AI   │────>│   Azure OpenAI   │
│   Storage    │     │    Search    │     │  "On Your Data"  │
│ (Documents)  │     │   (Index)    │     │ (Chat + Retrieve)│
└──────────────┘     └──────────────┘     └──────────────────┘
Data flow: Documents in Blob Storage are indexed by Azure AI Search with integrated vectorization (AI Search calls the embedding model for you). At query time, the chat completions API with a data source parameter handles the full RAG pipeline - intent extraction, search, reranking, and generation.
Step 1: Blob Storage for Documents
This container holds your source documents - PDFs, Word docs, plain text, HTML:
# rag/storage.tf
resource "azurerm_storage_account" "rag_docs" {
  name                     = "${var.environment}ragdocs"
  resource_group_name      = var.resource_group_name
  location                 = var.location
  account_tier             = "Standard"
  account_replication_type = "LRS"
  min_tls_version          = "TLS1_2"

  tags = {
    Environment = var.environment
    Purpose     = "rag-document-source"
  }
}

resource "azurerm_storage_container" "documents" {
  name                  = "documents"
  storage_account_id    = azurerm_storage_account.rag_docs.id
  container_access_type = "private"
}
Step 2: Azure AI Search Service
The search service is the vector store and retrieval engine. Terraform provisions the service. Index creation and data ingestion happen via the portal or REST API:
# rag/search.tf
resource "azurerm_search_service" "rag" {
  name                = "${var.environment}-rag-search"
  resource_group_name = var.resource_group_name
  location            = var.location
  sku                 = var.search_sku

  replica_count       = var.environment == "prod" ? 2 : 1
  partition_count     = 1
  semantic_search_sku = var.environment == "prod" ? "standard" : "free"

  identity {
    type = "SystemAssigned"
  }

  tags = {
    Environment = var.environment
    Purpose     = "rag-vector-search"
  }
}
SKU selection matters:
| SKU | Vector Search | Semantic Ranker | Storage | Best For |
|---|---|---|---|---|
| free | Yes (small) | No | 50 MB | Prototyping |
| basic | Yes | Free tier | 2 GB | Dev/test |
| standard | Yes | Standard tier | 25 GB+ | Production |
Semantic ranker is Azure's key RAG quality differentiator. It reranks search results using a deep learning model, significantly improving retrieval relevance. It requires a basic-or-higher service; the free pricing plan allows 1,000 requests/month.
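The environment conditionals in the resource above are easy to misread in a diff, so here's the same logic as a plain function — purely illustrative, mirroring what the Terraform computes:

```python
def search_config(environment: str, sku: str) -> dict:
    """Mirror of the Terraform conditionals: prod gets a second replica
    for availability plus the paid semantic ranker plan."""
    is_prod = environment == "prod"
    return {
        "sku": sku,
        "replica_count": 2 if is_prod else 1,
        "partition_count": 1,
        "semantic_search_sku": "standard" if is_prod else "free",
    }

print(search_config("prod", "standard"))
print(search_config("dev", "basic"))
```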
Step 3: Embedding Model Deployment
You need an embedding model for vectorization alongside your chat model. Add this to your cognitive account from Post 1:
# rag/embedding.tf
resource "azurerm_cognitive_deployment" "embedding" {
  name                 = "text-embedding-ada-002"
  cognitive_account_id = var.cognitive_account_id

  model {
    format  = "OpenAI"
    name    = "text-embedding-ada-002"
    version = "2"
  }

  sku {
    name     = "Standard"
    capacity = var.environment == "prod" ? 120 : 30
  }
}
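On Standard deployments, `capacity` is denominated in thousands of tokens per minute (TPM), so 30 buys roughly 30K TPM in dev. That bounds how fast your documents can be vectorized during indexing. A rough back-of-envelope (the 1,024-token chunk size is an assumption based on the portal's default chunking; yours may differ):

```python
def max_chunks_per_minute(capacity_units: int, tokens_per_chunk: int = 1024) -> int:
    """capacity is in thousands of tokens-per-minute (TPM) on Standard SKUs."""
    tpm = capacity_units * 1000
    return tpm // tokens_per_chunk

# dev: 30 units -> 30,000 TPM -> ~29 chunks/minute at 1,024 tokens each
print(max_chunks_per_minute(30))
# prod: 120 units -> 120,000 TPM -> ~117 chunks/minute
print(max_chunks_per_minute(120))
```

If a large initial ingestion is crawling, embedding throughput is usually the bottleneck, and bumping `capacity` temporarily is the fix.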
Step 4: Role Assignments
The search service needs access to the storage account (to read documents) and the OpenAI account (to call the embedding model). Managed identity keeps this secure:
# rag/roles.tf

# AI Search reads documents from Blob Storage
resource "azurerm_role_assignment" "search_reads_storage" {
  scope                = azurerm_storage_account.rag_docs.id
  role_definition_name = "Storage Blob Data Reader"
  principal_id         = azurerm_search_service.rag.identity[0].principal_id
}

# AI Search calls the embedding model
resource "azurerm_role_assignment" "search_calls_openai" {
  scope                = var.cognitive_account_id
  role_definition_name = "Cognitive Services OpenAI User"
  principal_id         = azurerm_search_service.rag.identity[0].principal_id
}

# OpenAI reads from AI Search (for "On Your Data")
resource "azurerm_role_assignment" "openai_reads_search" {
  scope                = azurerm_search_service.rag.id
  role_definition_name = "Search Index Data Reader"
  principal_id         = var.cognitive_account_principal_id
}
This triangle of permissions is the most common source of "On Your Data" failures. All three role assignments must be in place before querying.
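Since the triangle is the top failure mode, a preflight check is worth automating. A sketch that diffs the three required assignments against what actually exists (the `(principal, role, scope)` aliases are placeholders; in practice you'd build the set from `az role assignment list` JSON output):

```python
# The permission triangle from roles.tf, as (principal, role, scope) triples.
# "search", "openai", "storage" are shorthand aliases for the real resource IDs.
REQUIRED = [
    ("search", "Storage Blob Data Reader", "storage"),
    ("search", "Cognitive Services OpenAI User", "openai"),
    ("openai", "Search Index Data Reader", "search"),
]

def missing_assignments(existing: set) -> list:
    """Return the role assignments still missing before 'On Your Data' works."""
    return [triple for triple in REQUIRED if triple not in existing]

# Two of three in place: queries will fail until the third lands.
existing = {
    ("search", "Storage Blob Data Reader", "storage"),
    ("openai", "Search Index Data Reader", "search"),
}
print(missing_assignments(existing))
```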
Step 5: Variables
# rag/variables.tf
variable "environment" { type = string }
variable "location" { type = string }
variable "resource_group_name" { type = string }
variable "cognitive_account_id" { type = string }
variable "cognitive_account_principal_id" { type = string }

variable "search_sku" {
  type    = string
  default = "basic"
}
Per-environment configs:
# environments/dev.tfvars
search_sku = "basic"
# environments/prod.tfvars
search_sku = "standard"
Step 6: Create Index and Ingest Data
After Terraform deploys the infrastructure, create a search index and ingest documents. The easiest path is the Azure AI Foundry portal ("Add your data" flow), which handles chunking, embedding, and indexing automatically.
For automation, use the REST API or Python SDK:
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://YOUR_RESOURCE.openai.azure.com/",
    api_key="YOUR_KEY",
    api_version="2024-10-21"
)
The portal-based ingestion is recommended for initial setup because it creates the index schema, skillset, indexer, and data source in one step. Terraform handles what changes rarely (infrastructure). The portal handles what changes occasionally (index configuration and data refresh).
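If you do go the scripted route, the index body you'd `PUT` to the AI Search REST API looks roughly like this. The field, index, and profile names are placeholders; 1536 dimensions matches text-embedding-ada-002:

```python
import json

# Minimal vector-capable index definition for
# PUT https://<service>.search.windows.net/indexes/<name>?api-version=2024-07-01
index_definition = {
    "name": "rag-documents",
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True, "filterable": True},
        {"name": "content", "type": "Edm.String", "searchable": True},
        {
            "name": "content_vector",
            "type": "Collection(Edm.Single)",
            "searchable": True,
            "dimensions": 1536,  # text-embedding-ada-002 output size
            "vectorSearchProfile": "default-profile",
        },
    ],
    "vectorSearch": {
        "algorithms": [{"name": "default-hnsw", "kind": "hnsw"}],
        "profiles": [{"name": "default-profile", "algorithm": "default-hnsw"}],
    },
}

print(json.dumps(index_definition, indent=2))
```

A portal-created index will have more to it (chunking skillset, indexer, data source); this is just the shape of the index itself.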
Step 7: Query with "On Your Data"
The magic of Azure's approach is the "On Your Data" API. You add a data_sources parameter to your standard chat completions call:
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is our refund policy?"}
    ],
    extra_body={
        "data_sources": [{
            "type": "azure_search",
            "parameters": {
                "endpoint": "https://YOUR-SEARCH.search.windows.net",
                "index_name": "YOUR_INDEX_NAME",
                "authentication": {
                    "type": "system_assigned_managed_identity"
                },
                "query_type": "vector_semantic_hybrid",
                "embedding_dependency": {
                    "type": "deployment_name",
                    "deployment_name": "text-embedding-ada-002"
                },
                "top_n_documents": 5
            }
        }]
    }
)

print(response.choices[0].message.content)
# Citations included in response context
query_type: vector_semantic_hybrid is the recommended setting. It combines keyword matching, vector similarity, and semantic reranking for the best retrieval quality. Azure handles the entire pipeline - intent extraction from the user query, parallel search, reranking, and augmented generation.
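The citations ride along in the message's `context` rather than in the answer text (the model cites them inline as [doc1], [doc2], and so on). With the openai Python SDK, the extra field typically lands in `message.model_extra`. A sketch of formatting them, using a sample payload shaped like the documented response:

```python
def format_citations(context: dict) -> list[str]:
    """Turn the 'On Your Data' citation payload into readable source lines."""
    return [
        f"[doc{i + 1}] {c.get('title') or c.get('filepath') or 'untitled'}"
        for i, c in enumerate(context.get("citations", []))
    ]

# Sample shaped like message.context from an "On Your Data" response.
sample_context = {
    "intent": '["refund policy"]',
    "citations": [
        {"title": "Refund Policy", "filepath": "policies/refunds.pdf", "content": "..."},
        {"title": None, "filepath": "handbook.docx", "content": "..."},
    ],
}

for line in format_citations(sample_context):
    print(line)
```

Surfacing these alongside the answer is what makes the grounding verifiable for end users.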
Production Architecture
┌─────────────────────────────────────────────┐
│            Chat Completions API             │
│ + data_sources: [{ type: "azure_search" }]  │
└──────────────────────┬──────────────────────┘
                       │
          Azure OpenAI "On Your Data"
                       │
        ┌──────────────┼──────────────┐
        │              │              │
        ▼              ▼              ▼
  ┌───────────┐  ┌───────────┐  ┌───────────┐
  │  Intent   │  │ AI Search │  │ Response  │
  │  Extract  │  │  Hybrid   │  │Generation │
  │           │  │+ Semantic │  │+ Citations│
  │           │  │  Rerank   │  │           │
  └───────────┘  └───────────┘  └───────────┘
The "On Your Data" API is doing four things in one call: extracting search intent from the user query, running hybrid vector + keyword search with semantic reranking, filtering by relevance threshold, and generating a grounded response with citations.
Your first Azure RAG pipeline is deployed. Documents in Blob Storage, vectors in AI Search, retrieval via "On Your Data" - all infrastructure in Terraform, all queryable with a single API call.
Found this helpful? Follow for the full RAG Pipeline with Terraform series!