Azure's RAG pattern connects Blob Storage, AI Search, and OpenAI in a single API call. Terraform provisions the search service and storage. The "On Your Data" API handles the rest.
You have an Azure OpenAI endpoint answering general questions. Ask it about your company's internal docs and it hallucinates. RAG fixes this by grounding model responses in your actual data.
Azure's RAG architecture connects three services: Blob Storage holds your documents, Azure AI Search indexes them with vector and keyword search, and Azure OpenAI's "On Your Data" feature ties it all together in a single API call. You don't build a retrieval pipeline - you tell the chat completions API where your search index is, and Azure handles intent extraction, retrieval, reranking, and generation.
Terraform provisions the infrastructure. The portal or SDK handles index creation and data ingestion.
Architecture Overview
┌──────────────┐     ┌──────────────┐     ┌──────────────────┐
│     Blob     │────>│   Azure AI   │────>│   Azure OpenAI   │
│   Storage    │     │    Search    │     │  "On Your Data"  │
│ (Documents)  │     │   (Index)    │     │ (Chat + Retrieve)│
└──────────────┘     └──────────────┘     └──────────────────┘
Data flow: Documents in Blob Storage are indexed by Azure AI Search with integrated vectorization (AI Search calls the embedding model for you). At query time, the chat completions API with a data source parameter handles the full RAG pipeline - intent extraction, search, reranking, and generation.
Step 1: Blob Storage for Documents
This container holds your source documents - PDFs, Word docs, plain text, HTML:
# rag/storage.tf
resource "azurerm_storage_account" "rag_docs" {
  name                     = "${var.environment}ragdocs"
  resource_group_name      = var.resource_group_name
  location                 = var.location
  account_tier             = "Standard"
  account_replication_type = "LRS"
  min_tls_version          = "TLS1_2"

  tags = {
    Environment = var.environment
    Purpose     = "rag-document-source"
  }
}

resource "azurerm_storage_container" "documents" {
  name                  = "documents"
  storage_account_id    = azurerm_storage_account.rag_docs.id
  container_access_type = "private"
}
Step 2: Azure AI Search Service
The search service is the vector store and retrieval engine. Terraform provisions the service. Index creation and data ingestion happen via the portal or REST API:
# rag/search.tf
resource "azurerm_search_service" "rag" {
  name                = "${var.environment}-rag-search"
  resource_group_name = var.resource_group_name
  location            = var.location
  sku                 = var.search_sku

  replica_count       = var.environment == "prod" ? 2 : 1
  partition_count     = 1
  semantic_search_sku = var.environment == "prod" ? "standard" : "free"

  identity {
    type = "SystemAssigned"
  }

  tags = {
    Environment = var.environment
    Purpose     = "rag-vector-search"
  }
}
SKU selection matters:
| SKU | Vector Search | Semantic Ranker | Storage | Best For |
|---|---|---|---|---|
| free | Yes (small) | No | 50 MB | Prototyping |
| basic | Yes | Free tier | 2 GB | Dev/test |
| standard | Yes | Standard tier | 25 GB+ | Production |
Semantic ranker is Azure's key RAG quality differentiator. It reranks search results using a deep learning model, significantly improving retrieval relevance. It requires a basic-or-higher service; the free pricing plan allows 1,000 requests/month.
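The environment conditionals in the resource above are easy to misread in a diff, so here's the same logic as a plain function — purely illustrative, mirroring what the Terraform computes:

```python
def search_config(environment: str, sku: str) -> dict:
    """Mirror of the Terraform conditionals: prod gets a second replica
    for availability plus the paid semantic ranker plan."""
    is_prod = environment == "prod"
    return {
        "sku": sku,
        "replica_count": 2 if is_prod else 1,
        "partition_count": 1,
        "semantic_search_sku": "standard" if is_prod else "free",
    }

print(search_config("prod", "standard"))
print(search_config("dev", "basic"))
```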
Step 3: Embedding Model Deployment
You need an embedding model for vectorization alongside your chat model. Add this to your cognitive account from Post 1:
# rag/embedding.tf
resource "azurerm_cognitive_deployment" "embedding" {
  name                 = "text-embedding-ada-002"
  cognitive_account_id = var.cognitive_account_id

  model {
    format  = "OpenAI"
    name    = "text-embedding-ada-002"
    version = "2"
  }

  sku {
    name     = "Standard"
    capacity = var.environment == "prod" ? 120 : 30
  }
}
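On Standard deployments, `capacity` is denominated in thousands of tokens per minute (TPM), so 30 buys roughly 30K TPM in dev. That bounds how fast your documents can be vectorized during indexing. A rough back-of-envelope (the 1,024-token chunk size is an assumption based on the portal's default chunking; yours may differ):

```python
def max_chunks_per_minute(capacity_units: int, tokens_per_chunk: int = 1024) -> int:
    """capacity is in thousands of tokens-per-minute (TPM) on Standard SKUs."""
    tpm = capacity_units * 1000
    return tpm // tokens_per_chunk

# dev: 30 units -> 30,000 TPM -> ~29 chunks/minute at 1,024 tokens each
print(max_chunks_per_minute(30))
# prod: 120 units -> 120,000 TPM -> ~117 chunks/minute
print(max_chunks_per_minute(120))
```

If a large initial ingestion is crawling, embedding throughput is usually the bottleneck, and bumping `capacity` temporarily is the fix.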
Step 4: Role Assignments
The search service needs access to the storage account (to read documents) and the OpenAI account (to call the embedding model). Managed identity keeps this secure:
# rag/roles.tf

# AI Search reads documents from Blob Storage
resource "azurerm_role_assignment" "search_reads_storage" {
  scope                = azurerm_storage_account.rag_docs.id
  role_definition_name = "Storage Blob Data Reader"
  principal_id         = azurerm_search_service.rag.identity[0].principal_id
}

# AI Search calls the embedding model
resource "azurerm_role_assignment" "search_calls_openai" {
  scope                = var.cognitive_account_id
  role_definition_name = "Cognitive Services OpenAI User"
  principal_id         = azurerm_search_service.rag.identity[0].principal_id
}

# OpenAI reads from AI Search (for "On Your Data")
resource "azurerm_role_assignment" "openai_reads_search" {
  scope                = azurerm_search_service.rag.id
  role_definition_name = "Search Index Data Reader"
  principal_id         = var.cognitive_account_principal_id
}
This triangle of permissions is the most common source of "On Your Data" failures. All three role assignments must be in place before querying.
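Since the triangle is the top failure mode, a preflight check is worth automating. A sketch that diffs the three required assignments against what actually exists (the `(principal, role, scope)` aliases are placeholders; in practice you'd build the set from `az role assignment list` JSON output):

```python
# The permission triangle from roles.tf, as (principal, role, scope) triples.
# "search", "openai", "storage" are shorthand aliases for the real resource IDs.
REQUIRED = [
    ("search", "Storage Blob Data Reader", "storage"),
    ("search", "Cognitive Services OpenAI User", "openai"),
    ("openai", "Search Index Data Reader", "search"),
]

def missing_assignments(existing: set) -> list:
    """Return the role assignments still missing before 'On Your Data' works."""
    return [triple for triple in REQUIRED if triple not in existing]

# Two of three in place: queries will fail until the third lands.
existing = {
    ("search", "Storage Blob Data Reader", "storage"),
    ("openai", "Search Index Data Reader", "search"),
}
print(missing_assignments(existing))
```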
Step 5: Variables
# rag/variables.tf
variable "environment" { type = string }
variable "location" { type = string }
variable "resource_group_name" { type = string }
variable "cognitive_account_id" { type = string }
variable "cognitive_account_principal_id" { type = string }

variable "search_sku" {
  type    = string
  default = "basic"
}
Per-environment configs:
# environments/dev.tfvars
search_sku = "basic"
# environments/prod.tfvars
search_sku = "standard"
Step 6: Create Index and Ingest Data
After Terraform deploys the infrastructure, create a search index and ingest documents. The easiest path is the Azure AI Foundry portal ("Add your data" flow), which handles chunking, embedding, and indexing automatically.
For automation, use the REST API or Python SDK:
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://YOUR_RESOURCE.openai.azure.com/",
    api_key="YOUR_KEY",
    api_version="2024-10-21"
)
The portal-based ingestion is recommended for initial setup because it creates the index schema, skillset, indexer, and data source in one step. Terraform handles what changes rarely (infrastructure). The portal handles what changes occasionally (index configuration and data refresh).
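If you do go the scripted route, the index body you'd `PUT` to the AI Search REST API looks roughly like this. The field, index, and profile names are placeholders; 1536 dimensions matches text-embedding-ada-002:

```python
import json

# Minimal vector-capable index definition for
# PUT https://<service>.search.windows.net/indexes/<name>?api-version=2024-07-01
index_definition = {
    "name": "rag-documents",
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True, "filterable": True},
        {"name": "content", "type": "Edm.String", "searchable": True},
        {
            "name": "content_vector",
            "type": "Collection(Edm.Single)",
            "searchable": True,
            "dimensions": 1536,  # text-embedding-ada-002 output size
            "vectorSearchProfile": "default-profile",
        },
    ],
    "vectorSearch": {
        "algorithms": [{"name": "default-hnsw", "kind": "hnsw"}],
        "profiles": [{"name": "default-profile", "algorithm": "default-hnsw"}],
    },
}

print(json.dumps(index_definition, indent=2))
```

A portal-created index will have more to it (chunking skillset, indexer, data source); this is just the shape of the index itself.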
Step 7: Query with "On Your Data"
The magic of Azure's approach is the "On Your Data" API. You add a data_sources parameter to your standard chat completions call:
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is our refund policy?"}
    ],
    extra_body={
        "data_sources": [{
            "type": "azure_search",
            "parameters": {
                "endpoint": "https://YOUR-SEARCH.search.windows.net",
                "index_name": "YOUR_INDEX_NAME",
                "authentication": {
                    "type": "system_assigned_managed_identity"
                },
                "query_type": "vector_semantic_hybrid",
                "embedding_dependency": {
                    "type": "deployment_name",
                    "deployment_name": "text-embedding-ada-002"
                },
                "top_n_documents": 5
            }
        }]
    }
)

print(response.choices[0].message.content)
# Citations included in response context
query_type: vector_semantic_hybrid is the recommended setting. It combines keyword matching, vector similarity, and semantic reranking for the best retrieval quality. Azure handles the entire pipeline - intent extraction from the user query, parallel search, reranking, and augmented generation.
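The citations ride along in the message's `context` rather than in the answer text (the model cites them inline as [doc1], [doc2], and so on). With the openai Python SDK, the extra field typically lands in `message.model_extra`. A sketch of formatting them, using a sample payload shaped like the documented response:

```python
def format_citations(context: dict) -> list[str]:
    """Turn the 'On Your Data' citation payload into readable source lines."""
    return [
        f"[doc{i + 1}] {c.get('title') or c.get('filepath') or 'untitled'}"
        for i, c in enumerate(context.get("citations", []))
    ]

# Sample shaped like message.context from an "On Your Data" response.
sample_context = {
    "intent": '["refund policy"]',
    "citations": [
        {"title": "Refund Policy", "filepath": "policies/refunds.pdf", "content": "..."},
        {"title": None, "filepath": "handbook.docx", "content": "..."},
    ],
}

for line in format_citations(sample_context):
    print(line)
```

Surfacing these alongside the answer is what makes the grounding verifiable for end users.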
Production Architecture
┌─────────────────────────────────────────────┐
│            Chat Completions API             │
│ + data_sources: [{ type: "azure_search" }]  │
└──────────────────────┬──────────────────────┘
                       │
          Azure OpenAI "On Your Data"
                       │
        ┌──────────────┼──────────────┐
        │              │              │
        ▼              ▼              ▼
  ┌───────────┐  ┌───────────┐  ┌───────────┐
  │  Intent   │  │ AI Search │  │ Response  │
  │  Extract  │  │  Hybrid   │  │Generation │
  │           │  │+ Semantic │  │+ Citations│
  │           │  │  Rerank   │  │           │
  └───────────┘  └───────────┘  └───────────┘
The "On Your Data" API is doing four things in one call: extracting search intent from the user query, running hybrid vector + keyword search with semantic reranking, filtering by relevance threshold, and generating a grounded response with citations.
Your first Azure RAG pipeline is deployed. Documents in Blob Storage, vectors in AI Search, retrieval via "On Your Data" - all infrastructure in Terraform, all queryable with a single API call.
Found this helpful? Follow for the full RAG Pipeline with Terraform series!