Microsoft Foundry Just Added CI/CD for AI Agents. Here's What That Actually Changes.


Most teams can build an AI agent in a weekend. Getting it to production — with version control, quality gates, multi-environment promotion, and audit trails — is where everything breaks down. Microsoft just shipped a reference architecture that treats that problem seriously.

The Problem It's Solving

AI agents have been stuck in a productionization gap. You can prototype fast. Shipping responsibly is another matter entirely. The gap isn't model quality — it's infrastructure. Who owns the deployment pipeline? How do you gate a release on evaluation scores, not just unit tests? How do you promote an agent from dev to test to prod without manual intervention and prayer?

Standard software teams have solved this with CI/CD rigour. The friction is applying that same rigour to AI agents, where the "code" is a combination of prompts, tool schemas, model versions, and evaluation thresholds. That combination doesn't fit neatly into a GitHub Actions workflow designed for stateless services.

Microsoft Foundry is Microsoft's answer to that gap. It's a fully managed platform for building, deploying, and governing AI agents at scale, with a first-class agent runtime and built-in lifecycle management — applicable whether you're building containerised hosted agents or declarative prompt-based agents.

How It Actually Works

The architecture has two deployment targets and one shared pipeline model. Hosted Agents use an agent.yaml declarative manifest — aligned with the AgentSchema spec — that defines an agent's portable configuration: name, description, target model, system instructions, tool declarations, and runtime settings like environment variables and protocol choices. This lets you version the agent definition as infrastructure-as-config stored directly in your repo.
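
To make the shape of that manifest concrete, here is a minimal Python sketch of the kind of check a pipeline might run on a parsed agent.yaml before deploying it. The key names, the required/optional split, and the example values are all assumptions for illustration; the article lists what the manifest covers, not the exact keys the AgentSchema spec defines.

```python
from dataclasses import dataclass, field

# Hypothetical key names: these mirror the categories the article lists,
# not the actual keys defined by the AgentSchema spec.
REQUIRED_FIELDS = ("name", "description", "model", "instructions")


@dataclass
class AgentManifest:
    name: str
    description: str
    model: str
    instructions: str
    tools: list[str] = field(default_factory=list)
    environment: dict[str, str] = field(default_factory=dict)
    protocols: list[str] = field(default_factory=list)


def load_manifest(raw: dict) -> AgentManifest:
    """Build a manifest from a parsed YAML mapping, rejecting gaps early."""
    missing = [key for key in REQUIRED_FIELDS if not raw.get(key)]
    if missing:
        raise ValueError(f"manifest is missing required fields: {missing}")
    unknown = set(raw) - set(AgentManifest.__dataclass_fields__)
    if unknown:
        raise ValueError(f"manifest has unknown fields: {sorted(unknown)}")
    return AgentManifest(**raw)


# Example values are placeholders, not real model or protocol names.
example = {
    "name": "support-triage",
    "description": "Routes inbound support tickets",
    "model": "example-model",
    "instructions": "Classify the ticket and pick a queue.",
    "tools": ["file_search"],
    "environment": {"LOG_LEVEL": "info"},
    "protocols": ["example-protocol"],
}
manifest = load_manifest(example)
```

Failing fast on missing or unknown keys is one way to keep a malformed manifest from ever reaching a deployment step, which is the practical payoff of keeping the definition in the repo.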

The reference pipeline handles promotion across three environments: Dev, Test, and Production. It ships parallel implementations for GitHub Actions and Azure DevOps, with credentials referenced through secret stores and variable groups, so no secrets are hardcoded in tracked pipeline files.
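
The promotion logic itself is simple enough to sketch in a few lines of Python. This is an illustration of the idea only, not the reference pipeline's code: the function names and the FOUNDRY_<ENV>_CREDENTIAL variable naming are hypothetical, and the credential is assumed to be injected into the job's environment by the CI system's secret store.

```python
import os

# Promotion order described in the article.
ENVIRONMENTS = ("dev", "test", "prod")


def next_environment(current: str):
    """Return the environment after `current`, or None if already at the end."""
    idx = ENVIRONMENTS.index(current)
    return ENVIRONMENTS[idx + 1] if idx + 1 < len(ENVIRONMENTS) else None


def resolve_credential(environment: str, env=os.environ) -> str:
    """Read a credential injected by the CI secret store; nothing is hardcoded."""
    key = f"FOUNDRY_{environment.upper()}_CREDENTIAL"  # hypothetical variable name
    value = env.get(key)
    if not value:
        raise RuntimeError(f"secret {key} is not set for this pipeline run")
    return value


def promote(current: str, gate_passed: bool) -> str:
    """Return the environment the agent version belongs in after this stage."""
    target = next_environment(current)
    if target is None or not gate_passed:
        return current  # stay put: end of the chain, or the gate failed
    return target
```

With this shape, promote("dev", True) moves a version to Test, while a failed gate leaves it where it is, and a missing secret stops the run before any deployment call is made.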

The quality gate is the key structural difference from standard software CI/CD. Agents don't fail linting; they fail evaluations. Microsoft Foundry provides offline evaluation tooling that runs inside CI/CD pipelines, so agents are assessed against quality standards before any release reaches production. That evaluation step is what makes the pipeline an actual gate rather than a deployment script with extra steps.
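
A gate of this kind reduces to comparing evaluation scores against thresholds and failing the job when any metric falls short. The sketch below assumes scores arrive as a plain mapping from metric name to a number; the threshold values and the evaluate_gate name are hypothetical, and how Foundry's evaluation tooling actually reports scores is not covered here.

```python
def evaluate_gate(scores: dict, thresholds: dict) -> list:
    """Return a list of failure messages; an empty list means the gate passes."""
    failures = []
    for metric, minimum in thresholds.items():
        score = scores.get(metric)
        if score is None:
            # A metric that was never computed should block, not silently pass.
            failures.append(f"{metric}: no score reported")
        elif score < minimum:
            failures.append(f"{metric}: {score:.2f} is below threshold {minimum:.2f}")
    return failures


# Hypothetical thresholds and scores for illustration.
thresholds = {"coherence": 4.0, "relevance": 4.0, "groundedness": 4.5}
scores = {"coherence": 4.6, "relevance": 3.2}

failures = evaluate_gate(scores, thresholds)
gate_passed = not failures
```

In a pipeline, a non-empty failure list would translate into a non-zero exit code, which is what stops the promotion step from running.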

On the observability side, the core capabilities of Foundry Control Plane are now generally available, including end-to-end tracing built on OpenTelemetry, built-in evaluators covering coherence, relevance, groundedness, and safety, and continuous monitoring of production traffic through Azure Monitor. Custom evaluators, both code-based and LLM-as-a-judge, are available in preview for teams with domain-specific quality requirements.
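
For a sense of what a code-based custom evaluator can look like, here is a minimal sketch: a plain function that scores a response by how many domain-required terms it mentions. The function name, its signature, and the returned dictionary shape are assumptions for illustration; the interface Foundry expects from custom evaluators is not described in this article.

```python
def keyword_coverage_evaluator(response: str, required_terms: list[str]) -> dict:
    """Score a response by the fraction of required terms it mentions."""
    text = response.lower()
    hits = [term for term in required_terms if term.lower() in text]
    missing = [term for term in required_terms if term not in hits]
    # With no required terms there is nothing to miss, so treat it as a pass.
    score = len(hits) / len(required_terms) if required_terms else 1.0
    return {"score": score, "missing": missing}


result = keyword_coverage_evaluator(
    "Your refund was issued and should arrive within 14 days.",
    ["refund", "14 days", "warranty"],
)
```

A deterministic check like this is cheap to run on every change; an LLM-as-a-judge evaluator would replace the string matching with a model call but could return the same kind of score.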

The hosted agent runtime itself has been rebuilt around isolation. Each agent session runs in its own dedicated secure sandbox, with no shared state between sessions and no cross-tenant data leakage. Startup takes under 100 ms, and because agents are suspended between conversation turns, idle time costs nothing.

What Teams Are Actually Using It For

The most direct use case is enterprise agent deployment with governance requirements. Foundry Agent Service is a flexible, pro-code solution with extensive developer tooling and CI/CD integration designed for complex enterprise scenarios — including multi-agent orchestration, advanced security, compliance features, flexible model support, and connectivity options suited to large-scale, regulated environments. That's the positioning Microsoft is going after: teams where "it works on my machine" is not a ship criterion.

The AI Red Teaming Agent is now generally available alongside the CI/CD stack. It gives teams automated adversarial testing that integrates with CI/CD, so red teaming runs can serve as gates in the deployment pipeline itself. Findings are logged and tracked over time in Foundry, so teams can follow their risk posture as the agent evolves.

For teams already using Microsoft Agent Framework, the v1.0 release is now stable across Python and .NET, unifying the enterprise-grade foundations of Semantic Kernel with the multi-agent orchestration from AutoGen. It ships with native MCP, A2A, and OpenAPI support out of the box.

Why This Is a Bigger Deal Than It Looks

The framing here matters. Microsoft isn't shipping a deployment tool — it's shipping an opinion about how agentic software should be developed. The opinion is that agents should be managed exactly like application software: versioned, evaluated, promoted through environments, and governed at the tenant level.

Every agent created in Foundry Agent Service is automatically visible in Microsoft Agent 365, giving IT admins a single unified control plane to observe, secure, and govern all agents across the organization, regardless of where they were built. That's not a developer feature. That's an enterprise procurement argument.

The second implication is framework-level. The Toolbox in Foundry — which exposes web search, file search, code interpreter, and Azure AI Search through a single unified endpoint — works regardless of which agent framework you're using: Microsoft Agent Framework, LangGraph, or others, without custom glue code. That interoperability is deliberate. Microsoft is betting on Foundry as the deployment and governance layer even if teams pick their own orchestration stack.

Availability and Access

The reference architecture includes the GitHub Actions workflow, the Azure DevOps pipeline YAML, and the architecture diagram. The foundry-cicd repository on GitHub has the full implementation. Foundry Toolkit for VS Code is generally available. Hosted agents, memory, and Toolbox are in public preview. Memory billing begins June 1, 2026. During preview, hosted agent compute is priced at $0.0994 per vCPU-hour and memory at $0.0118 per GiB-hour, and you pay only for active execution.
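
As a rough worked example of the preview pricing, the sketch below estimates the cost of a hosted agent from its active hours. It assumes the per-GiB-hour rate applies to the memory allocated to the agent's compute and that only active execution time is billed; the function name and the workload figures are hypothetical.

```python
# Preview rates quoted in the article.
VCPU_HOUR_USD = 0.0994
GIB_HOUR_USD = 0.0118


def active_cost_usd(vcpus: float, memory_gib: float, active_hours: float) -> float:
    """Estimate cost for active execution time only (idle time is not billed)."""
    hourly_rate = vcpus * VCPU_HOUR_USD + memory_gib * GIB_HOUR_USD
    return active_hours * hourly_rate


# Hypothetical workload: 1 vCPU and 2 GiB, active for 10 hours in total.
# Hourly rate: 0.0994 + 2 * 0.0118 = 0.1230 USD, so about 1.23 USD overall.
estimate = active_cost_usd(vcpus=1, memory_gib=2, active_hours=10)
```

The point of the arithmetic is that cost scales with active hours, not wall-clock hours, so an agent that sits suspended between turns for most of the day accrues very little.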

The bet Microsoft is making is that the hard part of agentic AI isn't building agents; it's shipping them with the same operational rigour that existing software demands. Whether that framing lands depends on whether enterprise teams are actually blocked on deployment infrastructure, or on something harder to automate.


Top comments (2)

James O'Connor • May 25

CI/CD for agents is the right framing. The part that took us longest to get right was the eval gate itself, not the promotion mechanics. We treat the agent like any service: an input contract, an output contract, and a regression suite that runs on every PR. Promptfoo runs 320 frozen prompts on every change. Anything that flips a tool-call decision compared to the main branch blocks the merge. Tool-call drift turned out to be 4x more common than output-text drift, and CI/CD pipelines that only check final output never catch it. The Foundry shape looks fine, the question is whether the eval gate has tool-call lineage as a first-class artifact or only outputs.
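
The tool-call check this comment describes could be sketched roughly as follows. It is not Promptfoo's API: the function name and the input shape, a mapping from prompt ID to the ordered list of tools the agent called, are assumptions for illustration.

```python
def tool_call_drift(baseline: dict, candidate: dict) -> dict:
    """Return prompts whose tool-call sequence differs from the baseline run."""
    drift = {}
    for prompt_id, expected in baseline.items():
        actual = candidate.get(prompt_id)  # None if the prompt was not run
        if actual != expected:
            drift[prompt_id] = {"baseline": expected, "candidate": actual}
    return drift


# Hypothetical recorded runs: main branch versus the PR branch.
main_run = {
    "p1": ["search_orders"],
    "p2": ["search_orders", "issue_refund"],
    "p3": [],
}
pr_run = {
    "p1": ["search_orders"],
    "p2": ["issue_refund"],  # dropped the lookup step
    "p3": [],
}

drifted = tool_call_drift(main_run, pr_run)
block_merge = bool(drifted)
```

Comparing the recorded tool sequences rather than the final text is what lets a check like this catch a changed decision even when the visible output still looks the same.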

Om Shree • May 25

Thanks Sir
Loved your Insights!!!