Deploy GPT-5 on Azure AI Foundry with Terraform: Swap Models in One Line, Not One Sprint 🧠
Azure gives you GPT-5, GPT-5-mini, GPT-5-nano, and the full OpenAI catalog on your own infrastructure. Here's how to deploy them with Terraform so upgrading to the next model is a one-variable change.
OpenAI drops new models every few months. GPT-4o → GPT-4.1 → GPT-5 — each one smarter and cheaper per token.
Here's the problem: most teams hardcode model names everywhere. When a new model drops, it's a sprint to update infrastructure, deployments, environment variables, and application code across every environment.
What if upgrading to GPT-5.2 (or whatever comes next) was a one-variable change?
That's what we're building today — an Azure AI Foundry setup with Terraform where models are variables, not hardcoded strings. When the next model drops, you change one .tfvars file and run terraform apply. Done. 🎯
🤔 Azure AI Foundry vs Bedrock vs Vertex AI
| | Azure AI Foundry | Bedrock (AWS) | Vertex AI (GCP) |
|---|---|---|---|
| What it is | Managed OpenAI + multi-model platform | API access to multiple model providers | Full AI platform (inference + training) |
| Models | GPT-5, GPT-5-mini/nano, o4-mini | Claude, Llama, Titan | Gemini 3, 2.5, Llama, Claude |
| Unique strength | OpenAI API compatibility + Azure AD | Serverless Knowledge Bases & Agents | Native BigQuery integration |
| Key advantage | Drop-in replacement for OpenAI API | Broadest model selection | Most complete ML platform |
Azure's edge: If your team already uses the OpenAI Python SDK, switching to Azure is a 3-line code change. Same API, just pointed at your Azure endpoint. 🎯
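As a rough sketch of that swap (the endpoint, key, and API version below are placeholders, not values from this setup), only the client construction changes — every call site stays identical:

```python
# Sketch of the OpenAI → Azure OpenAI swap. Only the client constructor
# changes; request code is untouched. The endpoint is a placeholder.
#
# Before:
#   from openai import OpenAI
#   client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
#
# After (the "3 lines"): new import, Azure endpoint, API version.
#   from openai import AzureOpenAI
#   client = AzureOpenAI(
#       azure_endpoint="https://my-foundry.openai.azure.com",  # placeholder
#       api_key=os.environ["AZURE_OPENAI_API_KEY"],
#       api_version="2025-04-01-preview",
#   )

# The settings delta, expressed with stdlib only so it runs anywhere:
openai_kwargs = {"api_key": "..."}
azure_kwargs = {
    "api_key": "...",
    "azure_endpoint": "https://my-foundry.openai.azure.com",  # placeholder
    "api_version": "2025-04-01-preview",
}
print(sorted(set(azure_kwargs) - set(openai_kwargs)))
# → ['api_version', 'azure_endpoint']
```

The calls themselves (`client.chat.completions.create(...)`) are byte-for-byte the same; the `model` argument simply becomes your Azure deployment name.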
💰 Azure AI Model Landscape (As of February 2026)
Before writing any Terraform, know what you're choosing from:
| Model | Best For | Input $/1M tokens | Output $/1M tokens | Context Window | Speed |
|---|---|---|---|---|---|
| GPT-5 | Flagship: reasoning, coding, agentic | $1.25 | $10.00 | 400K | Fast |
| GPT-5 Mini | 80% of GPT-5 at 20% cost | $0.25 | $2.00 | 400K | Faster |
| GPT-5 Nano | Ultra-cheap, classification, extraction | $0.05 | $0.40 | 400K | Fastest |
| GPT-5.2 | Latest frontier (preview/limited) | $1.75 | $14.00 | 400K | Moderate |
| GPT-4.1 | Previous gen (still excellent) | $2.00 | $8.00 | 1M | Fast |
| GPT-4.1 Mini | Previous gen balanced | $0.40 | $1.60 | 1M | Faster |
| o4-mini | Deep reasoning, math | $1.10 | $4.40 | 200K | Moderate |
| text-embedding-3-small | Embeddings for RAG | $0.02 | — | 8K | Fastest |
Key insight: GPT-5-nano ($0.05/1M) is 25x cheaper than GPT-5 ($1.25/1M). Use nano for dev, GPT-5 for prod. When GPT-5.2 goes GA, just update one variable. 💸
💡 Plot twist: GPT-5 ($1.25/1M input) is actually cheaper than its predecessor GPT-4.1 ($2.00/1M input) while being significantly more capable. Upgrading both saves money and gets better results.
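To make the table concrete, here's a back-of-envelope cost calculator. The prices are copied from the table above; the 500M-input / 100M-output monthly workload is an assumption for illustration:

```python
# Back-of-envelope cost model using the $/1M-token prices from the table above.
PRICES = {  # model: (input $/1M tokens, output $/1M tokens)
    "gpt-5":      (1.25, 10.00),
    "gpt-5-mini": (0.25, 2.00),
    "gpt-5-nano": (0.05, 0.40),
    "gpt-4.1":    (2.00, 8.00),
}

def monthly_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost for a month's traffic, given total token counts."""
    inp, out = PRICES[model]
    return (input_tokens / 1e6) * inp + (output_tokens / 1e6) * out

# Hypothetical workload: 500M input + 100M output tokens per month.
for model in ("gpt-5", "gpt-5-mini", "gpt-5-nano", "gpt-4.1"):
    print(f"{model:>10}: ${monthly_cost(model, 500_000_000, 100_000_000):,.2f}")
# → gpt-5: $1,625.00, gpt-5-mini: $325.00, gpt-5-nano: $65.00, gpt-4.1: $1,800.00
```

On these assumed volumes, GPT-5 comes out cheaper than GPT-4.1 — the "plot twist" in numbers — and nano is a rounding error by comparison.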
🏗️ Step 1: Model Configuration as Variables
This is the key to the entire setup — every model detail is a variable:
```hcl
# ai-foundry/variables.tf

variable "location" {
  type    = string
  default = "eastus2"
}

variable "environment" {
  type    = string
  default = "dev"

  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "Must be: dev, staging, or prod."
  }
}

# ─── MODEL CONFIGURATION (Change ONLY these to upgrade) ───

variable "primary_model" {
  description = "Primary model for chat/completion. Change this when a new model releases."
  type = object({
    name    = string
    version = string
    sku     = string
    tpm     = number # Tokens per minute (in thousands)
  })
}

variable "economy_model" {
  description = "Cheap model for dev/testing/simple tasks. Usually the 'nano' or 'mini' variant."
  type = object({
    name    = string
    version = string
    sku     = string
    tpm     = number
  })
}

variable "embedding_model" {
  description = "Embedding model for RAG/vector search."
  type = object({
    name    = string
    version = string
    sku     = string
    tpm     = number
  })
}
```
Now create per-environment .tfvars files — this is where the magic happens:
```hcl
# environments/dev.tfvars
# ─── Dev: Cheapest models, low throughput ───
environment = "dev"

primary_model = {
  name    = "gpt-5-nano" # $0.05/1M input — dirt cheap
  version = "2025-08-07"
  sku     = "GlobalStandard"
  tpm     = 10
}

economy_model = {
  name    = "gpt-5-nano" # Same as primary in dev
  version = "2025-08-07"
  sku     = "GlobalStandard"
  tpm     = 10
}

embedding_model = {
  name    = "text-embedding-3-small"
  version = "1"
  sku     = "Standard"
  tpm     = 50
}
```
```hcl
# environments/prod.tfvars
# ─── Prod: Latest flagship, high throughput ───
environment = "prod"

primary_model = {
  name    = "gpt-5" # $1.25/1M input — latest flagship
  version = "2025-08-07"
  sku     = "GlobalStandard"
  tpm     = 80
}

economy_model = {
  name    = "gpt-5-mini" # $0.25/1M — fallback for simple tasks
  version = "2025-08-07"
  sku     = "GlobalStandard"
  tpm     = 120
}

embedding_model = {
  name    = "text-embedding-3-small"
  version = "1"
  sku     = "Standard"
  tpm     = 350
}
```
🚀 When GPT-5.2 goes GA: update `prod.tfvars` with `name = "gpt-5.2"` and the new version, then run `terraform apply`. That's it. No code changes. No sprint planning. One PR, one review, one deploy.
🏗️ Step 2: AI Foundry Resource
```hcl
# ai-foundry/main.tf

terraform {
  required_version = ">= 1.5.0"

  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = ">= 4.0.0"
    }
    random = {
      source  = "hashicorp/random"
      version = ">= 3.0.0"
    }
  }
}

provider "azurerm" {
  features {
    cognitive_account {
      purge_soft_delete_on_destroy = true
    }
  }
}

resource "random_string" "suffix" {
  length  = 6
  special = false
  upper   = false
}

# Resource Group
resource "azurerm_resource_group" "ai" {
  name     = "rg-${var.environment}-ai-foundry"
  location = var.location
  tags     = local.tags
}

# AI Foundry (Cognitive Services Account)
resource "azurerm_cognitive_account" "ai_foundry" {
  name                  = "${var.environment}-ai-foundry"
  location              = azurerm_resource_group.ai.location
  resource_group_name   = azurerm_resource_group.ai.name
  kind                  = "OpenAI"
  sku_name              = "S0"
  custom_subdomain_name = "${var.environment}-ai-foundry-${random_string.suffix.result}"
  local_auth_enabled    = var.environment != "prod" # keys off in prod, AD only

  identity {
    type = "SystemAssigned"
  }

  network_acls {
    default_action = var.environment == "prod" ? "Deny" : "Allow"
  }

  tags = local.tags
}

locals {
  tags = {
    Environment = var.environment
    Purpose     = "ai-foundry"
    ManagedBy   = "terraform"
  }
}
```
⚠️ Azure gotcha: `custom_subdomain_name` must be globally unique. That's why we append a random suffix.
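For intuition, here's a stdlib-only sketch of what the `random_string` resource above generates with these settings (length 6, lowercase, no specials) — the `dev-ai-foundry-` prefix is just an example:

```python
# Sketch of Terraform's random_string with length=6, special=false, upper=false.
import secrets
import string

def random_suffix(length: int = 6) -> str:
    """Lowercase alphanumeric suffix, like the random_string resource emits."""
    alphabet = string.ascii_lowercase + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))

# Example: the subdomain Terraform would build for the dev environment.
subdomain = f"dev-ai-foundry-{random_suffix()}"
print(subdomain)
```

Six lowercase alphanumerics give 36^6 ≈ 2.2 billion combinations — more than enough to dodge global-uniqueness collisions.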
🤖 Step 3: Variable-Driven Model Deployments
Deployments read entirely from variables — no hardcoded model names:
```hcl
# ai-foundry/models.tf

# Primary model — flagship for this environment
resource "azurerm_cognitive_deployment" "primary" {
  name                 = var.primary_model.name
  cognitive_account_id = azurerm_cognitive_account.ai_foundry.id

  sku {
    name     = var.primary_model.sku
    capacity = var.primary_model.tpm
  }

  model {
    format  = "OpenAI"
    name    = var.primary_model.name
    version = var.primary_model.version
  }
}

# Economy model — cheap fallback / dev default
resource "azurerm_cognitive_deployment" "economy" {
  # Only deploy if it's different from primary (avoids a duplicate in dev)
  count = var.economy_model.name != var.primary_model.name ? 1 : 0

  name                 = var.economy_model.name
  cognitive_account_id = azurerm_cognitive_account.ai_foundry.id

  sku {
    name     = var.economy_model.sku
    capacity = var.economy_model.tpm
  }

  model {
    format  = "OpenAI"
    name    = var.economy_model.name
    version = var.economy_model.version
  }
}

# Embedding model — for RAG / vector search
resource "azurerm_cognitive_deployment" "embedding" {
  name                 = var.embedding_model.name
  cognitive_account_id = azurerm_cognitive_account.ai_foundry.id

  sku {
    name     = var.embedding_model.sku
    capacity = var.embedding_model.tpm
  }

  model {
    format  = "OpenAI"
    name    = var.embedding_model.name
    version = var.embedding_model.version
  }
}

# ─── Outputs for downstream consumers ───
output "primary_deployment_name" {
  value       = azurerm_cognitive_deployment.primary.name
  description = "Deployment name for the primary model"
}

output "economy_deployment_name" {
  value = (
    var.economy_model.name != var.primary_model.name
    ? azurerm_cognitive_deployment.economy[0].name
    : azurerm_cognitive_deployment.primary.name
  )
  description = "Deployment name for the economy model"
}

output "ai_endpoint" {
  value = azurerm_cognitive_account.ai_foundry.endpoint
}
```
Notice the `count` on the economy model: in dev, primary and economy might both be `gpt-5-nano`. The `count` logic skips deploying a duplicate. ✅
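The dedup decision is easier to see outside HCL. This Python sketch mirrors the same conditional (model names are the examples from the tfvars files above):

```python
# Mirror of the HCL dedup: count = economy != primary ? 1 : 0
def deployments_to_create(primary: str, economy: str, embedding: str) -> list[str]:
    """Return the distinct deployment names Terraform would actually create."""
    plan = [primary]
    if economy != primary:  # skip the economy deployment when it duplicates primary
        plan.append(economy)
    plan.append(embedding)
    return plan

print(deployments_to_create("gpt-5-nano", "gpt-5-nano", "text-embedding-3-small"))
# → ['gpt-5-nano', 'text-embedding-3-small']  (dev: duplicate skipped)
print(deployments_to_create("gpt-5", "gpt-5-mini", "text-embedding-3-small"))
# → ['gpt-5', 'gpt-5-mini', 'text-embedding-3-small']  (prod: all three)
```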
🔐 Step 4: RBAC — Azure AD Authentication
```hcl
# ai-foundry/rbac.tf

data "azurerm_client_config" "current" {}

# The Function App's managed identity may call the deployed models
resource "azurerm_role_assignment" "function_openai" {
  scope                = azurerm_cognitive_account.ai_foundry.id
  role_definition_name = "Cognitive Services OpenAI User"
  principal_id         = azurerm_linux_function_app.ai_endpoint.identity[0].principal_id
}

# In dev, whoever runs Terraform also gets call access for local testing
resource "azurerm_role_assignment" "developer_access" {
  count                = var.environment == "dev" ? 1 : 0
  scope                = azurerm_cognitive_account.ai_foundry.id
  role_definition_name = "Cognitive Services OpenAI User"
  principal_id         = data.azurerm_client_config.current.object_id
}
```
| Role | Can Do | Use For |
|---|---|---|
| Cognitive Services OpenAI User | Call models only | Applications, developers |
| Cognitive Services OpenAI Contributor | Call + deploy models | CI/CD pipelines |
| Cognitive Services Contributor | Full management | Platform admins only |
⚡ Step 5: Model-Agnostic Function App
The function reads deployment names from env vars — zero hardcoded model names:
```hcl
# ai-foundry/function.tf

resource "azurerm_storage_account" "function" {
  name                     = "${var.environment}aifunc${random_string.suffix.result}"
  resource_group_name      = azurerm_resource_group.ai.name
  location                 = azurerm_resource_group.ai.location
  account_tier             = "Standard"
  account_replication_type = "LRS"
}

resource "azurerm_service_plan" "function" {
  name                = "${var.environment}-ai-function-plan"
  resource_group_name = azurerm_resource_group.ai.name
  location            = azurerm_resource_group.ai.location
  os_type             = "Linux"
  sku_name            = "Y1" # Consumption plan
}

resource "azurerm_linux_function_app" "ai_endpoint" {
  name                       = "${var.environment}-ai-endpoint-${random_string.suffix.result}"
  resource_group_name        = azurerm_resource_group.ai.name
  location                   = azurerm_resource_group.ai.location
  storage_account_name       = azurerm_storage_account.function.name
  storage_account_access_key = azurerm_storage_account.function.primary_access_key
  service_plan_id            = azurerm_service_plan.function.id

  identity {
    type = "SystemAssigned"
  }

  app_settings = {
    AZURE_OPENAI_ENDPOINT           = azurerm_cognitive_account.ai_foundry.endpoint
    AZURE_OPENAI_PRIMARY_DEPLOYMENT = azurerm_cognitive_deployment.primary.name
    AZURE_OPENAI_API_VERSION        = "2025-04-01-preview"
  }

  site_config {
    application_stack {
      python_version = "3.12"
    }
  }

  tags = local.tags
}

output "function_app_url" {
  value       = azurerm_linux_function_app.ai_endpoint.default_hostname
  description = "Hostname of the deployed Function App"
}
```
```python
# function_app.py — completely model-agnostic
import json
import os

import azure.functions as func
from azure.identity import DefaultAzureCredential
from openai import AzureOpenAI

app = func.FunctionApp()

credential = DefaultAzureCredential()
token_provider = lambda: credential.get_token(
    "https://cognitiveservices.azure.com/.default"
).token

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    azure_ad_token_provider=token_provider,
    api_version=os.environ["AZURE_OPENAI_API_VERSION"],
)

@app.route(route="chat", methods=["POST"])
def chat(req: func.HttpRequest) -> func.HttpResponse:
    try:
        body = req.get_json()
        # Model comes from env var — NEVER hardcoded
        deployment = os.environ["AZURE_OPENAI_PRIMARY_DEPLOYMENT"]
        response = client.chat.completions.create(
            model=deployment,
            messages=[{"role": "user", "content": body.get("prompt", "Hello!")}],
            max_tokens=body.get("max_tokens", 500),
            temperature=body.get("temperature", 0.7),
        )
        return func.HttpResponse(
            json.dumps({
                "response": response.choices[0].message.content,
                "model": response.model,
                "deployment": deployment,
                "usage": {
                    "prompt_tokens": response.usage.prompt_tokens,
                    "completion_tokens": response.usage.completion_tokens,
                    "total_tokens": response.usage.total_tokens,
                },
            }),
            mimetype="application/json",
        )
    except Exception as e:
        return func.HttpResponse(
            json.dumps({"error": str(e)}),
            status_code=500,
            mimetype="application/json",
        )
```
Zero model names in application code. The deployment name flows from .tfvars → Terraform → env var → Python. 🔗
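A natural extension of the same pattern is routing cheap tasks to the economy deployment. Note this is a sketch: the `AZURE_OPENAI_ECONOMY_DEPLOYMENT` env var and the task categories are hypothetical additions, not part of the Terraform above:

```python
# Hypothetical router: simple tasks go to the economy deployment, everything
# else to primary. AZURE_OPENAI_ECONOMY_DEPLOYMENT is an assumed env var;
# it falls back to primary when unset (e.g. dev, where they're the same model).
import os

def pick_deployment(task: str) -> str:
    """Choose a deployment name for a task category."""
    primary = os.environ["AZURE_OPENAI_PRIMARY_DEPLOYMENT"]
    economy = os.environ.get("AZURE_OPENAI_ECONOMY_DEPLOYMENT", primary)
    return economy if task in {"classify", "extract", "summarize"} else primary

# Demo values standing in for what Terraform would inject:
os.environ["AZURE_OPENAI_PRIMARY_DEPLOYMENT"] = "gpt-5"
os.environ["AZURE_OPENAI_ECONOMY_DEPLOYMENT"] = "gpt-5-mini"
print(pick_deployment("classify"))  # → gpt-5-mini
print(pick_deployment("chat"))      # → gpt-5
```

The routing is still model-agnostic: the function only ever sees deployment names flowing in from the environment.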
🧪 Step 6: Deploy & Test
```bash
# Deploy dev (uses gpt-5-nano — $0.05/1M input)
terraform apply -var-file=environments/dev.tfvars

# Test it
curl -X POST "https://$(terraform output -raw function_app_url)/api/chat" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Explain Kubernetes in 2 sentences."}'

# Deploy prod (uses gpt-5 — $1.25/1M input)
terraform apply -var-file=environments/prod.tfvars
```
🔄 The Upgrade Workflow
When GPT-5.2 goes generally available:
```diff
 # environments/prod.tfvars
 primary_model = {
-  name    = "gpt-5"
-  version = "2025-08-07"
+  name    = "gpt-5.2"
+  version = "2025-12-11"
   sku     = "GlobalStandard"
   tpm     = 80
 }
```
And cascade the old flagship down:
```diff
 # environments/staging.tfvars
 primary_model = {
-  name    = "gpt-5-mini"
-  version = "2025-08-07"
+  name    = "gpt-5"
+  version = "2025-08-07"
   sku     = "GlobalStandard"
   tpm     = 40
 }
```
```bash
terraform plan -var-file=environments/prod.tfvars   # Review
terraform apply -var-file=environments/prod.tfvars  # Deploy
```
No application code changes. No Docker rebuilds. No sprint tickets. One .tfvars diff → PR → review → merge → done. 🎯
Model cascade pattern: Latest → Prod, previous flagship → Staging, cheapest → Dev. Each upgrade just shifts models down. ♻️
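The cascade can be sketched as a tiny mapping: hand it the release-ordered model list and it assigns each environment its tier (model names here are the examples from this post):

```python
# Sketch of the cascade: newest model → prod, previous → staging, cheapest → dev.
def cascade(models_newest_first: list[str]) -> dict[str, str]:
    """Map a release-ordered model list onto environments."""
    envs = ["prod", "staging", "dev"]
    return {env: model for env, model in zip(envs, models_newest_first)}

print(cascade(["gpt-5.2", "gpt-5", "gpt-5-mini"]))
# → {'prod': 'gpt-5.2', 'staging': 'gpt-5', 'dev': 'gpt-5-mini'}
```

When the next release lands, you prepend it to the list and every environment shifts down one tier — exactly the tfvars diffs shown above.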
🎯 What You Just Built
```text
┌──────────────────────────────────────────────────┐
│                  .tfvars files                   │
│  dev: gpt-5-nano ($0.05)   prod: gpt-5 ($1.25)   │
└──────────────┬───────────────────────────────────┘
               │ terraform apply -var-file=...
               ▼
    ┌──────────────────────────┐
    │    Azure AI Foundry      │
    │    Model Deployments     │
    │  (primary + economy +    │
    │       embedding)         │
    └──────────────┬───────────┘
                   │ env vars (deployment name)
                   ▼
    ┌──────────────────────────┐
    │   Azure Function App     │
    │  (model-agnostic code)   │
    │  Managed Identity auth   │
    └──────────────────────────┘
```
Config flows one direction: .tfvars → infrastructure → application. The app never knows or cares which model it's calling. 🚀
⏭️ What's Next
This is Post 1 of the AI Infra on Azure with Terraform series. Coming up:
- Post 2: Azure AI Content Safety — Block harmful content with Terraform
- Post 3: Diagnostic Logging — Track every AI call with Azure Monitor
- Post 4: RAG with Azure AI Search — Connect your docs to GPT-5
The next model is always around the corner. Build your infra so upgrading is a config change, not a project. 🏢
Found this helpful? Follow for the full AI Infra on Azure with Terraform series! 💬