Suhas Mallesh

Your Azure Account is AI-Ready: Deploy your first AI endpoint with Terraform in 10 minutes⚡

Deploy GPT-5 on Azure AI Foundry with Terraform: Swap Models in One Line, Not One Sprint 🧠

Azure gives you GPT-5, GPT-5-mini, GPT-5-nano, and the full OpenAI catalog on your own infrastructure. Here's how to deploy them with Terraform so that upgrading to the next model is a one-variable change.

OpenAI drops new models every few months. GPT-4o → GPT-4.1 → GPT-5 — each one smarter and cheaper per token.

Here's the problem: most teams hardcode model names everywhere. When a new model drops, it's a sprint to update infrastructure, deployments, environment variables, and application code across every environment.

What if upgrading to GPT-5.2 (or whatever comes next) was a one-variable change?

That's what we're building today — an Azure AI Foundry setup with Terraform where models are variables, not hardcoded strings. When the next model drops, you change one .tfvars file and run terraform apply. Done. 🎯

🤔 Azure AI Foundry vs Bedrock vs Vertex AI

| | Azure AI Foundry | Bedrock (AWS) | Vertex AI (GCP) |
|---|---|---|---|
| What it is | Managed OpenAI + multi-model platform | API access to multiple model providers | Full AI platform (inference + training) |
| Models | GPT-5, GPT-5-mini/nano, o4-mini | Claude, Llama, Titan | Gemini 3, 2.5, Llama, Claude |
| Unique strength | OpenAI API compatibility + Azure AD | Serverless Knowledge Bases & Agents | Native BigQuery integration |
| Key advantage | Drop-in replacement for OpenAI API | Broadest model selection | Most complete ML platform |

Azure's edge: If your team already uses the OpenAI Python SDK, switching to Azure is a 3-line code change. Same API, just pointed at your Azure endpoint. 🎯

💰 Azure AI Model Landscape (As of February 2026)

Before writing any Terraform, know what you're choosing from:

| Model | Best For | Input $/1M tokens | Output $/1M tokens | Context Window | Speed |
|---|---|---|---|---|---|
| GPT-5 | Flagship: reasoning, coding, agentic | $1.25 | $10.00 | 400K | Fast |
| GPT-5 Mini | 80% of GPT-5 at 20% cost | $0.25 | $2.00 | 400K | Faster |
| GPT-5 Nano | Ultra-cheap, classification, extraction | $0.05 | $0.40 | 400K | Fastest |
| GPT-5.2 | Latest frontier (preview/limited) | $1.75 | $14.00 | 400K | Moderate |
| GPT-4.1 | Previous gen (still excellent) | $2.00 | $8.00 | 1M | Fast |
| GPT-4.1 Mini | Previous gen balanced | $0.40 | $1.60 | 1M | Faster |
| o4-mini | Deep reasoning, math | $1.10 | $4.40 | 200K | Moderate |
| text-embedding-3-small | Embeddings for RAG | $0.02 | — | 8K | Fastest |

Key insight: GPT-5-nano ($0.05/1M) is 25x cheaper than GPT-5 ($1.25/1M). Use nano for dev, GPT-5 for prod. When GPT-5.2 goes GA, just update one variable. 💸

💡 Plot twist: GPT-5 ($1.25/1M input) is actually cheaper than its predecessor GPT-4.1 ($2.00/1M input) while being significantly more capable. Upgrading both saves money and gets better results.
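To make the savings concrete, here's a quick back-of-envelope calculator using the table's prices (a sketch only — real bills also depend on cached-input discounts and batch pricing, which aren't modeled here):

```python
# $ per 1M tokens, from the pricing table above.
PRICES = {
    "gpt-5":      {"in": 1.25, "out": 10.00},
    "gpt-4.1":    {"in": 2.00, "out": 8.00},
    "gpt-5-nano": {"in": 0.05, "out": 0.40},
}

def monthly_cost(model, in_tokens_m, out_tokens_m):
    """Dollar cost for a month's usage, given token counts in millions."""
    p = PRICES[model]
    return in_tokens_m * p["in"] + out_tokens_m * p["out"]

# Example: 100M input + 20M output tokens per month.
for m in PRICES:
    print(f"{m}: ${monthly_cost(m, 100, 20):,.2f}")
# gpt-5: $325.00 — cheaper than gpt-4.1's $360.00; gpt-5-nano: $13.00
```

At that volume the "upgrade" from GPT-4.1 to GPT-5 is a 10% cost cut, and routing the same traffic to nano is a 96% cut.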

🏗️ Step 1: Model Configuration as Variables

This is the key to the entire setup — every model detail is a variable:

# ai-foundry/variables.tf

variable "location" {
  type    = string
  default = "eastus2"
}

variable "environment" {
  type    = string
  default = "dev"

  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "Must be: dev, staging, or prod."
  }
}

# ─── MODEL CONFIGURATION (Change ONLY these to upgrade) ───

variable "primary_model" {
  description = "Primary model for chat/completion. Change this when a new model releases."
  type = object({
    name    = string
    version = string
    sku     = string
    tpm     = number  # Tokens per minute (in thousands)
  })
}

variable "economy_model" {
  description = "Cheap model for dev/testing/simple tasks. Usually the 'nano' or 'mini' variant."
  type = object({
    name    = string
    version = string
    sku     = string
    tpm     = number
  })
}

variable "embedding_model" {
  description = "Embedding model for RAG/vector search."
  type = object({
    name    = string
    version = string
    sku     = string
    tpm     = number
  })
}

Now create per-environment .tfvars files — this is where the magic happens:

# environments/dev.tfvars
# ─── Dev: Cheapest models, low throughput ───

environment = "dev"

primary_model = {
  name    = "gpt-5-nano"       # $0.05/1M input — dirt cheap
  version = "2025-08-07"
  sku     = "GlobalStandard"
  tpm     = 10
}

economy_model = {
  name    = "gpt-5-nano"       # Same as primary in dev
  version = "2025-08-07"
  sku     = "GlobalStandard"
  tpm     = 10
}

embedding_model = {
  name    = "text-embedding-3-small"
  version = "1"
  sku     = "Standard"
  tpm     = 50
}
# environments/prod.tfvars
# ─── Prod: Latest flagship, high throughput ───

environment = "prod"

primary_model = {
  name    = "gpt-5"            # $1.25/1M input — latest flagship
  version = "2025-08-07"
  sku     = "GlobalStandard"
  tpm     = 80
}

economy_model = {
  name    = "gpt-5-mini"       # $0.25/1M — fallback for simple tasks
  version = "2025-08-07"
  sku     = "GlobalStandard"
  tpm     = 120
}

embedding_model = {
  name    = "text-embedding-3-small"
  version = "1"
  sku     = "Standard"
  tpm     = 350
}

🚀 When GPT-5.2 goes GA: Update prod.tfvars with name = "gpt-5.2" and the new version. Run terraform apply. That's it. No code changes. No sprint planning. One PR, one review, one deploy.

🏗️ Step 2: AI Foundry Resource

# ai-foundry/main.tf

terraform {
  required_version = ">= 1.5.0"

  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = ">= 4.0.0"
    }
  }
}

provider "azurerm" {
  features {
    cognitive_account {
      purge_soft_delete_on_destroy = true
    }
  }
}

resource "random_string" "suffix" {
  length  = 6
  special = false
  upper   = false
}

# Resource Group
resource "azurerm_resource_group" "ai" {
  name     = "rg-${var.environment}-ai-foundry"
  location = var.location

  tags = local.tags
}

# AI Foundry (Cognitive Services Account)
resource "azurerm_cognitive_account" "ai_foundry" {
  name                  = "${var.environment}-ai-foundry"
  location              = azurerm_resource_group.ai.location
  resource_group_name   = azurerm_resource_group.ai.name
  kind                  = "OpenAI"
  sku_name              = "S0"
  custom_subdomain_name = "${var.environment}-ai-foundry-${random_string.suffix.result}"

  local_auth_enabled = var.environment != "prod" # API keys disabled in prod; Azure AD only

  identity {
    type = "SystemAssigned"
  }

  network_acls {
    default_action = var.environment == "prod" ? "Deny" : "Allow"
  }

  tags = local.tags
}

locals {
  tags = {
    Environment = var.environment
    Purpose     = "ai-foundry"
    ManagedBy   = "terraform"
  }
}

⚠️ Azure gotcha: custom_subdomain_name must be globally unique. That's why we append a random suffix.

🤖 Step 3: Variable-Driven Model Deployments

Deployments read entirely from variables — no hardcoded model names:

# ai-foundry/models.tf

# Primary model — flagship for this environment
resource "azurerm_cognitive_deployment" "primary" {
  name                 = var.primary_model.name
  cognitive_account_id = azurerm_cognitive_account.ai_foundry.id

  sku {
    name     = var.primary_model.sku
    capacity = var.primary_model.tpm
  }

  model {
    format  = "OpenAI"
    name    = var.primary_model.name
    version = var.primary_model.version
  }
}

# Economy model — cheap fallback / dev default
resource "azurerm_cognitive_deployment" "economy" {
  # Only deploy if it's different from primary (avoids duplicate in dev)
  count = var.economy_model.name != var.primary_model.name ? 1 : 0

  name                 = var.economy_model.name
  cognitive_account_id = azurerm_cognitive_account.ai_foundry.id

  sku {
    name     = var.economy_model.sku
    capacity = var.economy_model.tpm
  }

  model {
    format  = "OpenAI"
    name    = var.economy_model.name
    version = var.economy_model.version
  }
}

# Embedding model — for RAG / vector search
resource "azurerm_cognitive_deployment" "embedding" {
  name                 = var.embedding_model.name
  cognitive_account_id = azurerm_cognitive_account.ai_foundry.id

  sku {
    name     = var.embedding_model.sku
    capacity = var.embedding_model.tpm
  }

  model {
    format  = "OpenAI"
    name    = var.embedding_model.name
    version = var.embedding_model.version
  }
}

# ─── Outputs for downstream consumers ───

output "primary_deployment_name" {
  value       = azurerm_cognitive_deployment.primary.name
  description = "Deployment name for the primary model"
}

output "economy_deployment_name" {
  value = (
    var.economy_model.name != var.primary_model.name
    ? azurerm_cognitive_deployment.economy[0].name
    : azurerm_cognitive_deployment.primary.name
  )
  description = "Deployment name for the economy model"
}

output "ai_endpoint" {
  value = azurerm_cognitive_account.ai_foundry.endpoint
}

Notice the count on the economy model: in dev, primary and economy might both be gpt-5-nano, so the count logic skips deploying a duplicate. ✅

🔐 Step 4: RBAC — Azure AD Authentication

# ai-foundry/rbac.tf

data "azurerm_client_config" "current" {}

resource "azurerm_role_assignment" "function_openai" {
  scope                = azurerm_cognitive_account.ai_foundry.id
  role_definition_name = "Cognitive Services OpenAI User"
  principal_id         = azurerm_linux_function_app.ai_endpoint.identity[0].principal_id
}

resource "azurerm_role_assignment" "developer_access" {
  count = var.environment == "dev" ? 1 : 0

  scope                = azurerm_cognitive_account.ai_foundry.id
  role_definition_name = "Cognitive Services OpenAI User"
  principal_id         = data.azurerm_client_config.current.object_id
}
| Role | Can Do | Use For |
|---|---|---|
| Cognitive Services OpenAI User | Call models only | Applications, developers |
| Cognitive Services OpenAI Contributor | Call + deploy models | CI/CD pipelines |
| Cognitive Services Contributor | Full management | Platform admins only |
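If a CI/CD pipeline runs terraform apply for the model deployments, its service principal needs the Contributor-level role from the table. A sketch in the same style as rbac.tf — `cicd_principal_id` is a hypothetical variable you'd supply yourself:

```hcl
# Hypothetical: let the pipeline's service principal deploy models.
variable "cicd_principal_id" {
  type    = string
  default = null
}

resource "azurerm_role_assignment" "cicd_deploy" {
  count = var.cicd_principal_id != null ? 1 : 0

  scope                = azurerm_cognitive_account.ai_foundry.id
  role_definition_name = "Cognitive Services OpenAI Contributor"
  principal_id         = var.cicd_principal_id
}
```

The count guard keeps the assignment out of local plans where no pipeline principal exists.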

⚡ Step 5: Model-Agnostic Function App

The function reads deployment names from env vars — zero hardcoded model names:

# ai-foundry/function.tf

resource "azurerm_storage_account" "function" {
  name                     = "${var.environment}aifunc${random_string.suffix.result}"
  resource_group_name      = azurerm_resource_group.ai.name
  location                 = azurerm_resource_group.ai.location
  account_tier             = "Standard"
  account_replication_type = "LRS"
}

resource "azurerm_service_plan" "function" {
  name                = "${var.environment}-ai-function-plan"
  resource_group_name = azurerm_resource_group.ai.name
  location            = azurerm_resource_group.ai.location
  os_type             = "Linux"
  sku_name            = "Y1"
}

resource "azurerm_linux_function_app" "ai_endpoint" {
  name                       = "${var.environment}-ai-endpoint-${random_string.suffix.result}"
  resource_group_name        = azurerm_resource_group.ai.name
  location                   = azurerm_resource_group.ai.location
  storage_account_name       = azurerm_storage_account.function.name
  storage_account_access_key = azurerm_storage_account.function.primary_access_key
  service_plan_id            = azurerm_service_plan.function.id

  identity {
    type = "SystemAssigned"
  }

  app_settings = {
    AZURE_OPENAI_ENDPOINT           = azurerm_cognitive_account.ai_foundry.endpoint
    AZURE_OPENAI_PRIMARY_DEPLOYMENT = azurerm_cognitive_deployment.primary.name
    AZURE_OPENAI_API_VERSION        = "2025-04-01-preview"
  }

  site_config {
    application_stack {
      python_version = "3.12"
    }
  }

  tags = local.tags
}

# Used by the curl test in Step 6
output "function_app_url" {
  value = azurerm_linux_function_app.ai_endpoint.default_hostname
}
# function_app.py — completely model-agnostic

import azure.functions as func
from openai import AzureOpenAI
from azure.identity import DefaultAzureCredential
import json, os

app = func.FunctionApp()

credential = DefaultAzureCredential()
token_provider = lambda: credential.get_token(
    "https://cognitiveservices.azure.com/.default"
).token

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    azure_ad_token_provider=token_provider,
    api_version=os.environ["AZURE_OPENAI_API_VERSION"],
)

@app.route(route="chat", methods=["POST"])
def chat(req: func.HttpRequest) -> func.HttpResponse:
    try:
        body = req.get_json()
        # Model comes from env var — NEVER hardcoded
        deployment = os.environ["AZURE_OPENAI_PRIMARY_DEPLOYMENT"]

        response = client.chat.completions.create(
            model=deployment,
            messages=[{"role": "user", "content": body.get("prompt", "Hello!")}],
            max_tokens=body.get("max_tokens", 500),
            temperature=body.get("temperature", 0.7),
        )

        return func.HttpResponse(json.dumps({
            "response": response.choices[0].message.content,
            "model": response.model,
            "deployment": deployment,
            "usage": {
                "prompt_tokens": response.usage.prompt_tokens,
                "completion_tokens": response.usage.completion_tokens,
                "total_tokens": response.usage.total_tokens,
            }
        }), mimetype="application/json")
    except Exception as e:
        return func.HttpResponse(
            json.dumps({"error": str(e)}), status_code=500, mimetype="application/json"
        )

Zero model names in application code. The deployment name flows from .tfvars → Terraform → env var → Python. 🔗

🧪 Step 6: Deploy & Test

# Deploy dev (uses gpt-5-nano — $0.05/1M input)
terraform apply -var-file=environments/dev.tfvars

# Test it
curl -X POST "https://$(terraform output -raw function_app_url)/api/chat" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Explain Kubernetes in 2 sentences."}'

# Deploy prod (uses gpt-5 — $1.25/1M input)
terraform apply -var-file=environments/prod.tfvars

🔄 The Upgrade Workflow

When GPT-5.2 goes generally available:

# environments/prod.tfvars

 primary_model = {
-  name    = "gpt-5"
-  version = "2025-08-07"
+  name    = "gpt-5.2"
+  version = "2025-12-11"
   sku     = "GlobalStandard"
   tpm     = 80
 }

And cascade the old flagship down:

# environments/staging.tfvars

 primary_model = {
-  name    = "gpt-5-mini"
-  version = "2025-08-07"
+  name    = "gpt-5"
+  version = "2025-08-07"
   sku     = "GlobalStandard"
   tpm     = 40
 }
terraform plan -var-file=environments/prod.tfvars   # Review
terraform apply -var-file=environments/prod.tfvars   # Deploy

No application code changes. No Docker rebuilds. No sprint tickets. One .tfvars diff → PR → review → merge → done. 🎯

Model cascade pattern: Latest → Prod, previous flagship → Staging, cheapest → Dev. Each upgrade just shifts models down. ♻️
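The cascade is mechanical enough to express as a tiny helper (pure illustration — not part of the Terraform code; the function name is hypothetical):

```python
def cascade(models, envs=("prod", "staging", "dev")):
    """Map environments to models, newest model first.

    models: model names ordered newest -> oldest.
    If there are fewer models than environments, the oldest repeats.
    """
    return {env: models[min(i, len(models) - 1)] for i, env in enumerate(envs)}

# When gpt-5.2 ships, everything shifts down one slot:
before = cascade(["gpt-5", "gpt-5-mini", "gpt-5-nano"])
after  = cascade(["gpt-5.2", "gpt-5", "gpt-5-mini"])
# before → {'prod': 'gpt-5', 'staging': 'gpt-5-mini', 'dev': 'gpt-5-nano'}
# after  → {'prod': 'gpt-5.2', 'staging': 'gpt-5', 'dev': 'gpt-5-mini'}
```

Each upgrade is the same three-line diff in three .tfvars files, which keeps reviews trivial.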

🎯 What You Just Built

┌──────────────────────────────────────────────────┐
│                  .tfvars files                   │
│  dev: gpt-5-nano ($0.05)  prod: gpt-5 ($1.25)    │
└──────────────┬───────────────────────────────────┘
               │ terraform apply -var-file=...
               ▼
┌──────────────────────────┐
│  Azure AI Foundry        │
│  Model Deployments       │
│  (primary + economy +    │
│   embedding)             │
└──────────────┬───────────┘
               │ env vars (deployment name)
               ▼
┌──────────────────────────┐
│  Azure Function App      │
│  (model-agnostic code)   │
│  Managed Identity auth   │
└──────────────────────────┘

Config flows one direction: .tfvars → infrastructure → application. The app never knows or cares which model it's calling. 🚀

⏭️ What's Next

This is Post 1 of the AI Infra on Azure with Terraform series. Coming up:

  • Post 2: Azure AI Content Safety — Block harmful content with Terraform
  • Post 3: Diagnostic Logging — Track every AI call with Azure Monitor
  • Post 4: RAG with Azure AI Search — Connect your docs to GPT-5

The next model is always around the corner. Build your infra so upgrading is a config change, not a project. 🏢

Found this helpful? Follow for the full AI Infra on Azure with Terraform series! 💬
