Azure AI Foundry hosts agents that reason, remember, and use tools. Here's how to deploy the infrastructure with Terraform and create your first agent with the Python SDK, where upgrading to the next GPT or Claude model is a single variable change.
In Series 1, we deployed an Azure AI Foundry endpoint and called GPT-4o directly. Single prompt, single response, stateless.
Azure AI agents are different. An agent wraps a model deployment with an orchestration layer that maintains conversation threads, decides when to call tools, and loops through reasoning steps until it resolves the user's request. The model generates text; the agent decides what to do with it.
Azure's agent stack lives inside AI Foundry: you create an AI Services resource with project management enabled, deploy a model, then use the Azure AI Agent SDK to create agents that use that deployment. Terraform provisions the infrastructure (Foundry resource, project, model deployment, IAM). The Python SDK creates and manages agents. When GPT-5.2 or the next Claude drops, you change one variable. 🎯
🧠 Endpoint vs Agent: What's Different
| Capability | Direct Endpoint (Series 1) | Azure AI Agent |
|---|---|---|
| Interaction | Single request/response | Multi-turn conversation threads |
| State | Stateless | Thread-based message history |
| Tool use | Manual implementation | Function calling, code interpreter, file search |
| Reasoning | Whatever the model does | Run-based orchestration loop |
| Deployment | Model deployment only | Agent + model deployment + thread management |
🔧 Model Configuration: The Future-Proof Layer
```hcl
# variables.tf
variable "agent_model" {
  description = "Model for the agent. Change this to upgrade."
  type = object({
    name    = string # Azure OpenAI model name
    version = string # Model version
    sku     = string # Deployment SKU (Standard, GlobalStandard)
    tpm     = number # Capacity units (1 unit = 1K tokens per minute)
  })
  default = {
    name    = "gpt-4o"
    version = "2024-11-20"
    sku     = "GlobalStandard"
    tpm     = 30
  }
}

variable "agent_name" {
  description = "Name of the AI agent"
  type        = string
  default     = "assistant"
}

variable "agent_instruction" {
  description = "System instruction for the agent"
  type        = string
}
```
Environment-specific overrides:
```hcl
# environments/dev.tfvars
agent_model = {
  name    = "gpt-4o-mini"
  version = "2024-07-18"
  sku     = "GlobalStandard"
  tpm     = 30
}
```

```hcl
# environments/prod.tfvars
agent_model = {
  name    = "gpt-4o"
  version = "2024-11-20"
  sku     = "GlobalStandard"
  tpm     = 80
}

# When a new model launches:
# agent_model = {
#   name    = "gpt-5"
#   version = "2025-08-07"
#   sku     = "GlobalStandard"
#   tpm     = 80
# }
```
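An optional guardrail: a `validation` block inside the variable definition catches SKU typos at `terraform plan` time instead of at deploy time. A sketch (the elided lines are the description/type/default shown above):

```hcl
variable "agent_model" {
  # ... same description, type, and default as above ...

  validation {
    condition     = contains(["Standard", "GlobalStandard"], var.agent_model.sku)
    error_message = "agent_model.sku must be \"Standard\" or \"GlobalStandard\"."
  }
}
```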
🔧 Terraform: Provision the Infrastructure
AI Foundry Resource and Project
Azure AI Agent Service requires an AI Services resource with `project_management_enabled = true`. This is the newer "Foundry" resource type that supports agents natively:
```hcl
# agent/foundry.tf
resource "azurerm_cognitive_account" "ai_foundry" {
  name                  = "${var.environment}-${var.project}-foundry"
  location              = azurerm_resource_group.this.location
  resource_group_name   = azurerm_resource_group.this.name
  kind                  = "AIServices"
  sku_name              = "S0"
  custom_subdomain_name = "${var.environment}-${var.project}-foundry"

  project_management_enabled = true

  identity {
    type = "SystemAssigned"
  }

  tags = {
    Environment = var.environment
    Model       = var.agent_model.name
    ManagedBy   = "terraform"
  }
}

resource "azurerm_cognitive_account_project" "agent_project" {
  name                 = "${var.environment}-agent-project"
  cognitive_account_id = azurerm_cognitive_account.ai_foundry.id
}
```
`project_management_enabled = true` is the critical setting. Without it, the AI Services resource won't support agent operations. The `S0` SKU is required for stateful features, including the agent service.
Model Deployment
```hcl
# agent/deployment.tf
resource "azurerm_cognitive_deployment" "agent_model" {
  name                 = "${var.environment}-${var.agent_model.name}"
  cognitive_account_id = azurerm_cognitive_account.ai_foundry.id

  sku {
    name     = var.agent_model.sku
    capacity = var.agent_model.tpm
  }

  model {
    format  = "OpenAI"
    name    = var.agent_model.name
    version = var.agent_model.version
  }
}
```
When you change `agent_model` in `.tfvars`, Terraform updates the deployment in place. The agent SDK references the deployment name, which is derived from the variable and follows it automatically.
RBAC for Agent Operations
```hcl
# agent/iam.tf
data "azurerm_client_config" "current" {}

# Your identity needs Cognitive Services User to create agents
resource "azurerm_role_assignment" "agent_user" {
  scope                = azurerm_cognitive_account.ai_foundry.id
  role_definition_name = "Cognitive Services User"
  principal_id         = data.azurerm_client_config.current.object_id
}
```
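If a CI/CD pipeline (rather than your own identity) runs the SDK step, its service principal needs the same role. A sketch, assuming you pass the principal's object ID in as a variable (`ci_principal_id` is hypothetical):

```hcl
# Hypothetical: object ID of the pipeline's service principal
variable "ci_principal_id" {
  type = string
}

resource "azurerm_role_assignment" "agent_ci" {
  scope                = azurerm_cognitive_account.ai_foundry.id
  role_definition_name = "Cognitive Services User"
  principal_id         = var.ci_principal_id
}
```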
🐍 Create the Agent with Python SDK
Terraform provisions the infrastructure. The Python SDK creates and manages agents:
```python
# agent_code/create_agent.py
import json

from azure.ai.agents import AgentsClient
from azure.identity import DefaultAzureCredential

# Load config from Terraform output
with open("agent_config.json") as f:
    config = json.load(f)

client = AgentsClient(
    endpoint=config["foundry_endpoint"],
    credential=DefaultAzureCredential(),
)

# Create the agent - model is a variable, not hardcoded
agent = client.create_agent(
    model=config["deployment_name"],
    name=config["agent_name"],
    instructions=config["instruction"],
)
print(f"Agent created: {agent.id}")
```
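Agent IDs change on every recreation, so persist the new ID somewhere your application reads. A minimal sketch, assuming the same client as above; the `agent_id.txt` store, the helper functions, and the delete-before-recreate step are our additions, not part of the SDK:

```python
# Sketch: persist the agent ID, and clean up the previous agent on recreation.
# agent_id.txt is a hypothetical app-side store.
import os

ID_FILE = "agent_id.txt"

def load_agent_id(path: str = ID_FILE):
    """Return the previously stored agent ID, or None if absent/empty."""
    if os.path.exists(path):
        with open(path) as f:
            return f.read().strip() or None
    return None

def save_agent_id(agent_id: str, path: str = ID_FILE) -> None:
    """Store the agent ID for later lookup by the application."""
    with open(path, "w") as f:
        f.write(agent_id)

# Only talk to Azure when the Terraform-generated config is present.
if os.path.exists("agent_config.json"):
    import json
    from azure.ai.agents import AgentsClient
    from azure.identity import DefaultAzureCredential

    with open("agent_config.json") as f:
        config = json.load(f)
    client = AgentsClient(endpoint=config["foundry_endpoint"],
                          credential=DefaultAzureCredential())

    old_id = load_agent_id()
    if old_id:
        client.delete_agent(old_id)  # remove the previous agent

    agent = client.create_agent(
        model=config["deployment_name"],
        name=config["agent_name"],
        instructions=config["instruction"],
    )
    save_agent_id(agent.id)
```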
Terraform Outputs for SDK
Bridge Terraform infrastructure to the Python SDK:
```hcl
# agent/outputs.tf
resource "local_file" "agent_config" {
  filename = "${path.module}/agent_code/agent_config.json"
  content = jsonencode({
    foundry_endpoint = azurerm_cognitive_account.ai_foundry.endpoint
    deployment_name  = azurerm_cognitive_deployment.agent_model.name
    agent_name       = "${var.environment}-${var.agent_name}"
    instruction      = var.agent_instruction
  })
}
```
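Note that `local_file` comes from the `hashicorp/local` provider, so pin it alongside `azurerm`. The version constraints below are illustrative:

```hcl
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 4.0"
    }
    local = {
      source  = "hashicorp/local"
      version = "~> 2.5"
    }
  }
}
```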
🧪 Invoke the Agent
Azure agents use a thread-based conversation model: create a thread, add messages, create a run, and stream the response:
```python
# Create a conversation thread
thread = client.threads.create()

# Add the user message
client.messages.create(
    thread_id=thread.id,
    role="user",
    content="What were our Q3 revenue numbers?",
)

# Run the agent on the thread
run = client.runs.create_and_process(
    thread_id=thread.id,
    agent_id=agent.id,
)

# Get the response
if run.status == "completed":
    messages = client.messages.list(thread_id=thread.id)
    for msg in messages:
        if msg.role == "assistant":
            print(msg.content[0].text.value)
```
Thread = conversation session. Each thread maintains full message history. The agent sees all previous messages when generating a response. Threads persist until you delete them.
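Because the thread carries history, a follow-up question works without restating context. A sketch of a second turn, assuming you stored the thread and agent IDs after the first run; `latest_assistant_text` is our helper, and newest-first message ordering is an assumption matching the loop above:

```python
# Sketch: a second turn on an existing thread.
import os

def latest_assistant_text(messages):
    """Return the text of the first assistant message seen, or None."""
    for msg in messages:
        if msg.role == "assistant":
            return msg.content[0].text.value
    return None

# Only talk to Azure when the Terraform-generated config is present.
if os.path.exists("agent_config.json"):
    import json
    from azure.ai.agents import AgentsClient
    from azure.identity import DefaultAzureCredential

    with open("agent_config.json") as f:
        config = json.load(f)
    client = AgentsClient(endpoint=config["foundry_endpoint"],
                          credential=DefaultAzureCredential())

    thread_id = config.get("thread_id")  # assumption: stored after the first run
    agent_id = config.get("agent_id")
    if thread_id and agent_id:
        # No need to restate the Q3 context - the thread remembers it.
        client.messages.create(thread_id=thread_id, role="user",
                               content="How does that compare to Q2?")
        run = client.runs.create_and_process(thread_id=thread_id,
                                             agent_id=agent_id)
        if run.status == "completed":
            print(latest_assistant_text(
                client.messages.list(thread_id=thread_id)))
```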
📐 The Upgrade Workflow
When a new model launches:
1. Update `.tfvars` - change `agent_model.name` and `agent_model.version`
2. Run `terraform apply` - the deployment is updated and the config file regenerated
3. Recreate the agent - `python create_agent.py` uses the new deployment
Existing threads continue working. The agent just uses the new model for future runs. No application code changes needed because the deployment name is derived from the variable.
⚠️ Gotchas and Tips
Use `azapi` for gaps. The `azurerm` provider covers AI Foundry resources well, but some features (workspace connections, certain preview APIs) need the `azapi` provider as an escape hatch.
Model availability varies by region. GPT-4o is widely available, but newer models may be limited to specific regions. Check Azure OpenAI model availability before setting your region.
Thread cleanup. Threads don't auto-expire. In production, implement a cleanup policy to delete old threads and avoid storage buildup.
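One way to implement that policy is to track thread IDs and creation times in your own store and delete anything past a cutoff. A sketch under those assumptions (`threads.json` is a hypothetical app-side record; the expiry window is arbitrary):

```python
# Sketch: delete threads older than MAX_AGE_SECONDS, using an app-side
# record of {"id", "created_at"} entries in threads.json.
import json
import os
import time

MAX_AGE_SECONDS = 7 * 24 * 3600  # arbitrary one-week retention

def expired(created_at: float, now: float,
            max_age: float = MAX_AGE_SECONDS) -> bool:
    """True when a thread created at `created_at` is past the retention window."""
    return now - created_at > max_age

# Only talk to Azure when the app-side record exists.
if os.path.exists("threads.json"):
    from azure.ai.agents import AgentsClient
    from azure.identity import DefaultAzureCredential

    client = AgentsClient(endpoint=os.environ["FOUNDRY_ENDPOINT"],  # assumption
                          credential=DefaultAzureCredential())
    with open("threads.json") as f:
        records = json.load(f)  # e.g. [{"id": "...", "created_at": 1720000000.0}]

    now = time.time()
    keep = []
    for rec in records:
        if expired(rec["created_at"], now):
            client.threads.delete(rec["id"])  # remove the old thread
        else:
            keep.append(rec)

    with open("threads.json", "w") as f:
        json.dump(keep, f)
```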
Agent recreation. Unlike AWS, where you update the agent in place, Azure agents are typically recreated with a new `create_agent` call when the instructions or model change. Store the agent ID in your application config.
⏭️ What's Next
This is Post 1 of the Azure AI Agents with Terraform series.
- Post 1: Deploy First Azure AI Agent (you are here) 🤖
- Post 2: Function Calling - Connect to APIs
- Post 3: Multi-Agent with AI Agent Service
- Post 4: Agent + Bing Grounding
Your first Azure agent is deployed. Terraform provisions the Foundry, the SDK creates the agent, and when the next model launches, upgrading is a variable change. The model is configuration, not commitment. 🤖
Found this helpful? Follow for the full AI Agents with Terraform series! 💬