Suhas Mallesh
Azure AI Search RAG with Terraform: Your First RAG Pipeline on Azure πŸ”

Azure's RAG pattern connects Blob Storage, AI Search, and OpenAI in a single API call. Terraform provisions the search service and storage. The "On Your Data" API handles the rest.

You have an Azure OpenAI endpoint answering general questions. Ask it about your company's internal docs and it hallucinates. RAG fixes this by grounding model responses in your actual data.

Azure's RAG architecture connects three services: Blob Storage holds your documents, Azure AI Search indexes them with vector and keyword search, and Azure OpenAI's "On Your Data" feature ties it all together in a single API call. You don't build a retrieval pipeline - you tell the chat completions API where your search index is, and Azure handles intent extraction, retrieval, reranking, and generation.

Terraform provisions the infrastructure. The portal or SDK handles index creation and data ingestion. 🎯

πŸ—οΈ Architecture Overview

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Blob        │────>β”‚  Azure AI    │────>β”‚ Azure OpenAI     β”‚
β”‚  Storage     β”‚     β”‚  Search      β”‚     β”‚ "On Your Data"   β”‚
β”‚  (Documents) β”‚     β”‚  (Index)     β”‚     β”‚ (Chat + Retrieve)β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Data flow: Documents in Blob Storage are indexed by Azure AI Search with integrated vectorization (AI Search calls the embedding model for you). At query time, the chat completions API with a data source parameter handles the full RAG pipeline - intent extraction, search, reranking, and generation.
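The snippets below live in a self-contained `rag` module. A sketch of how it might be wired from a root configuration (the module path, `azurerm_resource_group.main`, and `azurerm_cognitive_account.openai` are assumptions based on the variables this post uses, not fixed names):

```hcl
# main.tf (root) - illustrative wiring for the rag module in this post
module "rag" {
  source = "./rag"

  environment                    = "dev"
  location                       = "eastus2"
  resource_group_name            = azurerm_resource_group.main.name
  cognitive_account_id           = azurerm_cognitive_account.openai.id
  cognitive_account_principal_id = azurerm_cognitive_account.openai.identity[0].principal_id
  search_sku                     = "basic"
}
```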

πŸ“¦ Step 1: Blob Storage for Documents

This container holds your source documents - PDFs, Word docs, plain text, HTML:

# rag/storage.tf

resource "azurerm_storage_account" "rag_docs" {
  name                     = "${var.environment}ragdocs"
  resource_group_name      = var.resource_group_name
  location                 = var.location
  account_tier             = "Standard"
  account_replication_type = "LRS"
  min_tls_version          = "TLS1_2"

  tags = {
    Environment = var.environment
    Purpose     = "rag-document-source"
  }
}

resource "azurerm_storage_container" "documents" {
  name                  = "documents"
  storage_account_id    = azurerm_storage_account.rag_docs.id
  container_access_type = "private"
}
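With the container in place, you can seed it from a local folder. A hedged example using the Azure CLI (the account name `devragdocs` assumes `environment = "dev"`, and `./docs` is a placeholder path):

```bash
# Upload a local ./docs folder into the private "documents" container.
# --auth-mode login uses your Azure AD identity instead of account keys,
# which requires a Storage Blob Data Contributor assignment on your user.
az storage blob upload-batch \
  --account-name devragdocs \
  --destination documents \
  --source ./docs \
  --auth-mode login
```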

πŸ” Step 2: Azure AI Search Service

The search service is the vector store and retrieval engine. Terraform provisions the service. Index creation and data ingestion happen via the portal or REST API:

# rag/search.tf

resource "azurerm_search_service" "rag" {
  name                = "${var.environment}-rag-search"
  resource_group_name = var.resource_group_name
  location            = var.location
  sku                 = var.search_sku

  replica_count   = var.environment == "prod" ? 2 : 1
  partition_count = 1

  semantic_search_sku = var.environment == "prod" ? "standard" : "free"

  identity {
    type = "SystemAssigned"
  }

  tags = {
    Environment = var.environment
    Purpose     = "rag-vector-search"
  }
}
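The query step later needs the search endpoint, and other modules may need the service's managed identity. Exposing both as outputs keeps the root module clean (a sketch; the output names are my own):

```hcl
# rag/outputs.tf - illustrative outputs for downstream wiring
output "search_endpoint" {
  value = "https://${azurerm_search_service.rag.name}.search.windows.net"
}

output "search_principal_id" {
  value = azurerm_search_service.rag.identity[0].principal_id
}
```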

SKU selection matters:

| SKU | Vector Search | Semantic Ranker | Storage | Best For |
|----------|---------------|-----------------|---------|-------------|
| free | Yes (small) | Free tier | 50 MB | Prototyping |
| basic | Yes | Free tier | 2 GB | Dev/test |
| standard | Yes | Standard tier | 25 GB+ | Production |

Semantic ranker is Azure's key RAG quality differentiator. It reranks search results using a deep learning model, significantly improving retrieval relevance. The free tier allows 1,000 queries/month.

πŸ€– Step 3: Embedding Model Deployment

You need an embedding model for vectorization alongside your chat model. Add this to your cognitive account from Post 1:

# rag/embedding.tf

resource "azurerm_cognitive_deployment" "embedding" {
  name                 = "text-embedding-ada-002"
  cognitive_account_id = var.cognitive_account_id

  model {
    format  = "OpenAI"
    name    = "text-embedding-ada-002"
    version = "2"
  }

  sku {
    name     = "Standard"
    capacity = var.environment == "prod" ? 120 : 30
  }
}
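Integrated vectorization means AI Search calls this deployment for you, but it helps to know what it produces: text-embedding-ada-002 returns 1536-dimensional vectors that are compared by cosine similarity. A minimal, self-contained sketch of that comparison:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Identical directions score 1.0; orthogonal vectors score 0.0.
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
```

Because ada-002 embeddings are unit-length, cosine similarity reduces to a plain dot product in practice.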

πŸ”‘ Step 4: Role Assignments

The search service needs access to the storage account (to read documents) and the OpenAI account (to call the embedding model). Managed identity keeps this secure:

# rag/roles.tf

# AI Search reads documents from Blob Storage
resource "azurerm_role_assignment" "search_reads_storage" {
  scope                = azurerm_storage_account.rag_docs.id
  role_definition_name = "Storage Blob Data Reader"
  principal_id         = azurerm_search_service.rag.identity[0].principal_id
}

# AI Search calls the embedding model
resource "azurerm_role_assignment" "search_calls_openai" {
  scope                = var.cognitive_account_id
  role_definition_name = "Cognitive Services OpenAI User"
  principal_id         = azurerm_search_service.rag.identity[0].principal_id
}

# OpenAI reads from AI Search (for "On Your Data")
resource "azurerm_role_assignment" "openai_reads_search" {
  scope                = azurerm_search_service.rag.id
  role_definition_name = "Search Index Data Reader"
  principal_id         = var.cognitive_account_principal_id
}
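One operational gotcha: Azure RBAC assignments can take minutes to propagate, so anything that uses them immediately after creation may fail on first try. A common (illustrative) Terraform workaround uses the hashicorp/time provider; the five-minute window is a heuristic, not a documented bound:

```hcl
# rag/roles.tf (addition) - wait for RBAC propagation before dependents run
resource "time_sleep" "wait_for_rbac" {
  depends_on = [
    azurerm_role_assignment.search_reads_storage,
    azurerm_role_assignment.search_calls_openai,
    azurerm_role_assignment.openai_reads_search,
  ]

  create_duration = "300s"
}
```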

This triangle of permissions is the most common source of "On Your Data" failures. All three role assignments must be in place before querying.

πŸ”§ Step 5: Variables

# rag/variables.tf

variable "environment" { type = string }
variable "location" { type = string }
variable "resource_group_name" { type = string }
variable "cognitive_account_id" { type = string }
variable "cognitive_account_principal_id" { type = string }

variable "search_sku" {
  type    = string
  default = "basic"
}

Per-environment configs:

# environments/dev.tfvars
search_sku = "basic"

# environments/prod.tfvars
search_sku = "standard"
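Deploying a given environment is then a matter of pointing Terraform at the right tfvars file:

```bash
terraform init
terraform plan  -var-file=environments/dev.tfvars
terraform apply -var-file=environments/dev.tfvars
```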

πŸ“Š Step 6: Create Index and Ingest Data

After Terraform deploys the infrastructure, create a search index and ingest documents. The easiest path is the Azure AI Foundry portal ("Add your data" flow), which handles chunking, embedding, and indexing automatically.

For automation, use the REST API or Python SDK:

import os

from openai import AzureOpenAI

# Read the endpoint and key from the environment rather than hardcoding them
client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],  # https://YOUR_RESOURCE.openai.azure.com/
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-10-21",
)

The portal-based ingestion is recommended for initial setup because it creates the index schema, skillset, indexer, and data source in one step. Terraform handles what changes rarely (infrastructure). The portal handles what changes occasionally (index configuration and data refresh).
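If you do script index creation, the REST payload is a JSON index definition with a vector field sized to your embedding model. A sketch of the shape (the index name, field names, and profile names here are illustrative, not the portal's generated defaults):

```python
# Illustrative index definition for the Azure AI Search REST API:
# PUT https://<service>.search.windows.net/indexes/<name>?api-version=...
index_schema = {
    "name": "rag-docs-index",
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True},
        {"name": "content", "type": "Edm.String", "searchable": True},
        {
            "name": "contentVector",
            "type": "Collection(Edm.Single)",
            "searchable": True,
            "dimensions": 1536,  # text-embedding-ada-002 output size
            "vectorSearchProfile": "default-profile",
        },
    ],
    "vectorSearch": {
        "algorithms": [{"name": "default-hnsw", "kind": "hnsw"}],
        "profiles": [{"name": "default-profile", "algorithm": "default-hnsw"}],
    },
}
```

The key detail to get right is `dimensions`: it must match the embedding deployment from Step 3, or indexing fails.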

πŸ” Step 7: Query with "On Your Data"

The magic of Azure's approach is the "On Your Data" API. You add a data_sources parameter to your standard chat completions call:

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is our refund policy?"}
    ],
    extra_body={
        "data_sources": [{
            "type": "azure_search",
            "parameters": {
                "endpoint": "https://YOUR-SEARCH.search.windows.net",
                "index_name": "YOUR_INDEX_NAME",
                "authentication": {
                    "type": "system_assigned_managed_identity"
                },
                "query_type": "vector_semantic_hybrid",
                "embedding_dependency": {
                    "type": "deployment_name",
                    "deployment_name": "text-embedding-ada-002"
                },
                "top_n_documents": 5
            }
        }]
    }
)

print(response.choices[0].message.content)
# Citations included in response context

query_type: vector_semantic_hybrid is the recommended setting. It combines keyword matching, vector similarity, and semantic reranking for the best retrieval quality. Azure handles the entire pipeline - intent extraction from the user query, parallel search, reranking, and augmented generation.
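The grounded response also carries its evidence: inline markers like [doc1] in the content map to a citations list in the message's context object. The exact fields vary by API version, so the payload below is a simplified illustration rather than a verbatim response:

```python
# Simplified shape of an "On Your Data" assistant message with citations
sample_message = {
    "content": "Refunds are accepted within 30 days. [doc1]",
    "context": {
        "citations": [
            {
                "title": "refund-policy.pdf",
                "content": "Refunds are accepted within 30 days of purchase...",
                "chunk_id": "0",
            }
        ]
    },
}

def extract_citations(message: dict) -> list[str]:
    """Pull the document titles cited in a grounded response."""
    return [c.get("title", "") for c in message.get("context", {}).get("citations", [])]

print(extract_citations(sample_message))  # ['refund-policy.pdf']
```

Surfacing these titles back to users is what makes a RAG answer auditable instead of just plausible.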

πŸ“ Production Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Chat Completions API                          β”‚
β”‚  + data_sources: [{ type: "azure_search" }]    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                     β”‚
        Azure OpenAI "On Your Data"
                     β”‚
    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
    β”‚                β”‚                β”‚
    β–Ό                β–Ό                β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Intent  β”‚  β”‚ AI Search  β”‚  β”‚ Response     β”‚
β”‚ Extract β”‚  β”‚ Hybrid     β”‚  β”‚ Generation   β”‚
β”‚         β”‚  β”‚ + Semantic β”‚  β”‚ + Citations  β”‚
β”‚         β”‚  β”‚ Rerank     β”‚  β”‚              β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

The "On Your Data" API is doing four things in one call: extracting search intent from the user query, running hybrid vector + keyword search with semantic reranking, filtering by relevance threshold, and generating a grounded response with citations.


Your first Azure RAG pipeline is deployed. Documents in Blob Storage, vectors in AI Search, retrieval via "On Your Data" - all infrastructure in Terraform, all queryable with a single API call. πŸ”

Found this helpful? Follow for the full RAG Pipeline with Terraform series! πŸ’¬
