DEV Community: Alex Pechenizkiy

Microsoft Fabric Apps Are a Distribution Channel, Not a Marketplace: Who Should Build One

Alex Pechenizkiy — Tue, 21 Jul 2026 15:29:15 +0000

Most ISV product teams I talk to have filed Microsoft Fabric apps under "marketplace strategy," somewhere between the AppSource listing and the Teams app nobody maintains. That is the wrong mental model, and it will cost some of them their category.

A workload built with the Fabric Workload Dev Kit is not an add-in bolted to the side of Fabric. It is a native surface that renders beside Lakehouse, Power BI, and Data Factory, inside the tool where the buyer's data team already spends its day. That makes this a distribution decision, and distribution decisions with a lock-in risk attached deserve a real evaluation, not a wait-and-see posture. Here is the framework to make that call in 90 days.

What a Fabric app actually is (and is not)

The Fabric Workload Development Kit, which went generally available at Ignite 2024, lets you register your own item types in a Fabric tenant. Your workload gets a real presence: items that live in workspaces, a UI that renders inside the Fabric shell, read and write access to OneLake, and authentication that flows through Microsoft Entra ID using on-behalf-of tokens, so your app acts as the signed-in user, not a service account with god rights.

Be precise about what is automatic and what is not. Discovery through the Workload Hub and rendering in the Fabric shell come with the platform. Entra identity flow, item lifecycle handling, and governance behavior for your custom items are things you implement against the dev kit's contracts. And your backend runs on your own infrastructure. Fabric brokers requests to you; it does not host your compute.

This is a different animal from everything ISVs have shipped into the Microsoft ecosystem before. A Power BI custom visual is a rendering component. A Teams app is a collaboration surface. An Azure Marketplace SaaS listing is a commerce contract with zero data-plane coupling. A Fabric workload sits in the data plane itself, next to the customer's tables.

One clarification the "app store" framing blurs: the Workload Hub handles in-product discovery and deployment. Procurement and monetization still run through your own commercial motion or an Azure Marketplace transaction. Fabric solves the "get in front of the buyer inside their tool" problem, not the "collect the check" problem.

The docs undersell the engineering

The Workload Dev Kit documentation makes registration look like a configuration exercise. In practice, the frontend/backend contract, item CRUD lifecycle, and job scheduling integration take real engineering time, and the sample workload only covers the happy path. Budget for a proper build, not a hackathon. This is judgment from reading the contracts closely, not a measured estimate; your team's number will vary.

The takeaway: this is platform-native software. Evaluate it like a platform bet, with platform-bet diligence, not like another listing to keep fresh.

The distribution math behind Microsoft Fabric apps

Enterprise SaaS deals die in two places: procurement and security review. Every data vendor selling into the enterprise knows the ritual. Vendor security questionnaires, data residency reviews, a new DPA, a network architecture diagram for where the customer's data will live.

Now walk through what a Fabric-native workload changes structurally. The customer's data does not leave their tenant; your workload operates on OneLake data under their existing workspace roles and sensitivity labels. Identity is their Entra ID. The platform relationship already exists under their Microsoft agreement. I will not attach a percentage to how much friction that removes, because nobody has published one and I refuse to invent it. But the logic is hard to argue with: every control the customer already trusts is one review you do not restart from zero. Some reviews remain, because your backend is still your backend, but the surface area shrinks.

Then there is the crowding question. Microsoft said at Build 2024 that more than 11,000 organizations use Fabric, and the number has grown since. Meanwhile the Workload Hub is thin. Microsoft's own Ignite 2024 announcement named a handful of launch partners, including Esri, SAS, Informatica, and Teradata. That is a short list for a platform with that customer base. Compare it to AppSource on day one versus AppSource today.

I will take the strong position: for data-adjacent ISVs, this is the cheapest enterprise distribution channel Microsoft has offered since AppSource launched. Early entrants get category positioning while the hub has a dozen tiles instead of a thousand. That window closes the way it always does, quietly and then all at once.

The OneLake trade: interoperability story, capacity-model reality

Here is where the cheerleading stops, because the trade has real teeth.

The good half first. Building Fabric-native means your data model lives in Delta Lake tables on OneLake, open Parquet under the hood. That is a genuinely strong OneLake ISV integration story: the customer keeps one copy of their data, in an open format, readable by every engine in their estate and by yours. No proprietary storage tier, no export pipeline, no "your data is in our cloud now" conversation. For customers burned by data silos, this is the single best sales argument a Fabric workload has. I have written before about how OneLake shortcuts change data architecture, and the same zero-copy logic works in the ISV's favor.

Now the cage. Your cost of goods becomes entangled with Microsoft's capacity unit model. Fabric compute is billed in CUs across F SKUs; published list pricing puts an F64 at roughly $11.52 per hour pay-as-you-go, region dependent, per the Azure Fabric pricing page, with reservations meaningfully cheaper. Those are Microsoft's prices to change, and Microsoft's smoothing and throttling behavior to define. If your margin model assumed compute costs you control on infrastructure you tune, you just gave away a lever.

The second risk is sharper: Microsoft ships first-party workloads aggressively. If your feature is adjacent to something on the Fabric roadmap, you can be commoditized in a single release wave. Ingestion, transformation, and basic BI are already spoken for. Your IP survives only if it lives above the storage substrate: domain logic, vertical data models, proprietary ML, regulated workflows Microsoft will not build.

Prototype CU behavior before you price anything

CU consumption under real customer load is hard to predict from documentation alone. Background jobs, smoothing windows, and interactive spikes behave differently than the pricing table implies. Run a representative workload against a trial capacity and measure before you commit to a pricing model. Any modeling you do before that is illustrative; calibrate against actuals.

The takeaway: name the cage risk in your board deck, in the same slide as the distribution upside. Not after you have signed customers.

Who should build Microsoft Fabric custom workloads

Fit is not about whether your team can technically build against the dev kit. Almost any competent platform team can. Fit is about where your product's gravity lives: if your value compounds by sitting next to the customer's OneLake data, the channel works for you; if it does not, you are renting expensive real estate in someone else's mall.

Signal	Strong fit: build	Weak fit: skip
Product gravity	Value increases next to customer data: observability, data quality, MDM, industry analytics	Value is independent of where the data sits
Buyer base	Data teams already living in the Microsoft stack	Multi-cloud-neutral buyers, or a promise of cloud neutrality you cannot break
Compute economics	Analytics-shaped workloads that map to CU consumption	GPU-heavy, custom silicon, or margin models that need infrastructure control
Latency profile	Batch and interactive analytics	Sub-second transactional paths where Delta and Parquet fight you
IP position	Domain logic and models above the storage layer	Features adjacent to ingestion, transformation, or basic BI
Roadmap exposure	Vertical or regulated capability Microsoft is unlikely to build	Horizontal feature one Fabric release away from first-party

Strong-fit categories are the ones the launch partner list already hints at: observability over the data estate, data quality and master data management, geospatial and industry analytics. Anything where the pitch is "we do more because we can see your actual tables, governed, in place."

Weak fit is just as clear. Transactional cores, sub-second serving, GPU-heavy inference where Fabric's compute model does not match your economics, and any product whose buyers are not Microsoft-centric.

Here is my opinionated heuristic, explicitly a rule of thumb and not a measured threshold: if fewer than roughly a third of your sales pipeline runs on the Microsoft data stack, skip Fabric-native for now and revisit in a year. Below that line, the workload becomes a side bet your roadmap subsidizes, and side bets on platform channels rot fast.

Where your product's gravity lives decides this. Not what your engineers can build.

The AI angle Microsoft is underselling

Everything above treats the buyer as a human opening a workload in a browser. That framing is already aging.

Because Fabric workloads operate on governed Delta tables in OneLake, they are addressable by the AI experiences Microsoft is building over the tenant's estate. Copilot in Fabric and Fabric data agents reason over exactly the data layer a native workload plugs into, no custom connectors, no separate semantic export. Verify the preview versus GA status of specific agent features before you build against them; they move quarterly.

Here is the forward-looking argument, and I label it as such because nobody can cite the future: the long-term prize is being the workload an agent calls, not the dashboard a human opens. App-store dynamics repeat with a new user. When an agent orchestrating over a tenant's data estate needs a data quality score, an entity resolution, or a vertical risk model, it will call whatever capability is registered, governed, and adjacent to the data. Workloads outside the estate do not get the call.

Do not build against specific agent APIs today; treat them as unstable. Build the durable position instead: your data in open Delta format, your operations exposed as governed items, your value legible to whatever orchestrates over the estate next.

Position your workload as agent-ready data plus actions, and you are early to the next distribution shift instead of reacting to it.

The 90-day evaluation framework

One engineer-quarter buys you an evidence-based decision. Being late to a platform channel costs a market position. That is the whole cost-benefit.

Define the kill criteria before day one, in writing: the bet dies if CU economics break your margin model, or if your buyers are demonstrably not in Fabric. Everything else is noise.

Days 1 to 30: prototype against the dev kit Build a thin vertical slice with the Fabric Workload Dev Kit: one item type, real Entra on-behalf-of auth, real OneLake reads and writes against a sample workload. The goal is not a demo. The goal is to surface the frontend/backend contract work and lifecycle handling the docs gloss over, so your effort estimate is grounded in your codebase, not Microsoft's sample.
Days 31 to 60: model CU economics against your current spend Run representative load on a trial capacity and measure CU consumption, then model customer-scale costs using published Fabric list pricing. Treat every number as illustrative until you have actuals; industry-standard inputs are a starting point, and your workload shape will move the answer. Compare the result against your current cloud COGS and your pricing floor.
Days 61 to 90: pressure-test with three design partners Take the prototype to three customers or prospects who run Fabric today. Ask two questions: would they deploy your workload inside their tenant instead of your SaaS, and does inherited governance actually shorten their review process for a vendor like you. Their answers tell you whether Fabric-native is a channel or a cage for your product specifically.

For the cost-modeling leg, I have covered modeling Fabric capacity costs in more depth, including where smoothing and background jobs distort naive per-hour math.

A 90-day timebox is cheap insurance. Skipping the timebox is the expensive option, whichever way the decision goes.

Decide on Microsoft Fabric apps this quarter

Build if your product's gravity is in the customer's data estate, where governed adjacency to OneLake makes your product better and the Workload Hub is still thin enough to own a category tile. Skip loudly if it is not, and put the revisit date in the calendar rather than letting the default decide for you.

Microsoft Fabric apps are a channel or a cage, and for your product they are exactly one of the two. You can find out which one in 90 days. The vendors who treat that as optional will find out from their competitors' launch posts instead.

This article was originally published at az365.ai. I'm Alex Pechenizkiy, an Azure and Power Platform solutions architect writing honest, vendor-neutral analysis of the Microsoft AI stack. More at az365.ai.

Foundry Hosted vs In-Process vs Copilot Studio Agents (2026 Decision)

Alex Pechenizkiy — Mon, 20 Jul 2026 15:46:15 +0000

A team lead asks the question in a planning meeting and the room splits three ways: do we build this agent in Copilot Studio, write the orchestration ourselves and host it, or hand our container to Foundry and let it run our code? All three are official Microsoft build paths in 2026, all three end up in the same tenant-wide agent inventory, and the wrong pick costs you a rebuild once the project outgrows it.

The answer is not "the most powerful one." It is the one whose service model matches who is building the agent, who owns the runtime, and how much pro-code control over orchestration and protocols you actually need. This article is the decision framework for that choice, grounded in Microsoft Learn and current as of mid-2026.

Two of these three paths are public preview, so this is a guide to architectural fit and direction, not a production-reliability scorecard.

TL;DR

Three build paths, picked by service model, not power.

Copilot Studio: low-code managed SaaS for makers. GA.

Foundry Hosted agents: managed PaaS runtime for your own container. Public preview.

Microsoft 365 Agents SDK: pro-code, self-hosted, widest channel reach. Agent Framework orchestrator in public preview.

Monday move: before picking a platform, write down four things for this agent - who builds it (maker or pro-dev), who must own the compute, what channels it has to reach, and whether you need custom protocols or background/async behavior. Those four answers pick the path more reliably than a feature checklist.

The three paths in one paragraph each

Microsoft's own Cloud Adoption Framework frames the build options as three service tiers, which is the cleanest mental model to start from. The CAF positions them as Copilot Studio (SaaS, no/low-code), Microsoft Foundry (PaaS, pro-code or low-code), and GPUs and Containers (IaaS, code-first frameworks for maximum flexibility). The first two are managed by Microsoft. The third is where the self-hosted SDK path lives when you own the compute end to end.

Copilot Studio is a graphical, low-code tool for building agents and agent flows, part of the Power Platform set. Microsoft describes it as a fully managed SaaS platform where makers focus on agent experiences without worrying about infrastructure, hosting, or governance. It connects to data via prebuilt or custom connectors, can run standalone, and can extend Microsoft 365 Copilot with enterprise data.

Foundry Hosted agents are the middle path. Foundry Agent Service is a managed platform for building, deploying, and scaling agents, and it offers two agent types: Prompt agents (fully managed, no code or compute to maintain) and Hosted agents (your own containerized code, run by Foundry with a managed endpoint, scaling, identity, and observability). Hosted agents are the option that interests pro-dev teams who want to keep their orchestration code but not run the servers.

The Microsoft 365 Agents SDK is the self-hosted, pro-code path. It lets developers build custom engine agents using the AI stack of their choice and deploy them to Microsoft 365 Copilot. It is model- and orchestrator-agnostic and can surface agents across Microsoft 365 Copilot, Teams, third-party platforms, custom applications, and websites. The trade is that you host the orchestration process yourself.

A note before the comparison, because the brief flagged it as an open question: Foundry also offers Prompt agents, the fully-managed no-code-runtime option inside the same Agent Service. This article treats the three build paths the title names. Prompt agents are a separate managed option within Foundry Agent Service, and if you do not need custom code at all, they sit alongside Copilot Studio as a low-effort choice rather than competing with the Hosted path.

What "in-process" actually means here

"In-process" is the part of this comparison that trips people up, so it is worth pinning down. In-process here means the Microsoft 365 Agents SDK self-hosted path. It refers to running the agent's orchestration inside a process you own and host, rather than handing a container to Foundry or building declaratively in Copilot Studio. Two pieces matter.

The first is the host. With the Microsoft 365 Agents SDK, your Program.cs manages hosting (for example an ASP.NET host in C#), registers storage (the samples use Memory Storage, switchable to Blob or Cosmos DB for production), and routes the api/messages endpoint. You provision and pay for that compute, typically on Azure, separately from the agent itself.

The second is the orchestrator that runs inside that host. Both Semantic Kernel and the Microsoft Agent Framework can be the orchestrator inside an Agents SDK agent. Semantic Kernel implements orchestration patterns directly in the SDK (Concurrent, Sequential, Handoff, Group Chat, Magentic) and ships an in-process runtime via the Microsoft.SemanticKernel.Agents.Runtime.InProcess package. That literal "InProcess" runtime is where the term comes from.

The useful detail for planning: the Microsoft Agent Framework is the direct successor to Semantic Kernel and AutoGen, built by the same teams. It combines AutoGen's agent abstractions with Semantic Kernel's enterprise features and adds graph-based workflows for explicit multi-agent orchestration, in C# and Python public preview. The same Agent Framework code can run inside your own process today and be packaged as a Foundry Hosted agent later. The hosting integration lets you take any Agent or workflow and expose it through the Foundry Responses or Invocations protocol, deploying it as a containerized Hosted agent; for C# you install the Microsoft.Agents.AI.Foundry.Hosting and Azure.AI.Projects NuGet packages.

That last point reframes the whole decision. In-process self-hosted and Foundry Hosted are not opposite ends of a religious war. They can be the same orchestration code with a different runtime owner. That makes "start in-process, graduate to Hosted" a real migration path rather than a rewrite, and it lowers the stakes of getting the first call slightly wrong.

How Foundry Hosted agents actually run

If you have not deployed one, the Hosted agent runtime model is worth understanding because it drives several decisions below. Foundry Hosted agents are containerized agentic applications: you package your agent as a container image, push it to Azure Container Registry, and at deploy time Agent Service pulls the image, provisions compute, assigns a dedicated Microsoft Entra ID agent identity, and exposes a dedicated endpoint. The platform handles scaling, session state persistence, observability, and lifecycle management.

The runtime is per-session and sandboxed. Hosted agents run in per-session VM-isolated sandboxes with a persistent filesystem ($HOME and /files), which enables scale-to-zero with stateful resume. The design intent is short-lived idle sessions that suspend cheaply, with a hard session lifetime ceiling.

Hosted runtime specs at a glance (preview defaults, re-verify before sizing). From the Hosted agents concepts doc:

Sandbox sizes: 0.5 vCPU / 1 GiB, 1 vCPU / 2 GiB, or 2 vCPU / 4 GiB
Session idle timeout: 15 minutes
Session lifetime: permanently deleted after 30 days of inactivity
Concurrency: default 50 maximum active concurrent sessions per subscription per region (adjustable via Microsoft Support)

Bring-your-own framework is the headline. Hosted agents let you bring your own code in any framework - Microsoft Agent Framework, LangGraph, Semantic Kernel, OpenAI Agents SDK, Anthropic Agent SDK, GitHub Copilot SDK, or custom code (the last three via the Invocations protocol bridge). Language support is Python and C#, and the protocol libraries are framework-agnostic. If your team has standardized on LangGraph, you do not have to abandon it to get a managed Microsoft runtime.

Tools are opt-in, not automatic. Hosted agents reach Foundry-managed tools (Code Interpreter, Web Search, Azure AI Search, OpenAPI, custom MCP connections, A2A) through a Toolbox MCP endpoint provisioned in the project. The agent code connects to this endpoint using standard MCP client libraries, and the platform does not inject tools automatically. That is a deliberate design choice: you wire the tools you want rather than inheriting a fixed set, which is closer to how a code-first developer expects to work.

A few constraints to put in the architecture doc. Hosted agents are in public preview, billed on consumption of CPU and memory during active sessions plus per-call inference and tool usage. They support deployment within network-isolated Foundry resources and can use a customer-provided Azure Virtual Network for outbound traffic, but the Azure Container Registry holding the image must currently remain reachable over its public endpoint. If a fully air-gapped registry is a hard requirement today, that is a gap to confirm against current docs before committing.

The side-by-side comparison

This is the load-bearing table. Read it by dimension, not by column. The cross-links referenced in these rows live in the body and Read Next sections, not in the cells.

Dimension	Copilot Studio	Foundry Hosted agents	M365 Agents SDK (in-process / self-hosted)
Service model (CAF)	SaaS, no/low-code	PaaS, managed runtime (preview)	Self-hosted; you own and provision the compute (App Service, Container Apps, or your own host)
Who builds it	Makers, low-code, no developers required	Pro-dev (Python or C#)	Pro-dev (C#, JavaScript, or Python)
Tooling	Copilot Studio web app, drag-and-drop and natural language	Container image to ACR, azd provision/deploy	VS / VS Code with Microsoft 365 Agents Toolkit
Who owns the runtime	Microsoft (fully managed)	Microsoft runs your container	You host the orchestration process
Orchestration control	Fixed managed orchestrator	Bring your own orchestrator, full code control	Bring your own orchestrator, full code control
Framework / model choice	Managed models plus optional Azure AI integration	Any framework; Foundry model catalog	Model- and orchestrator-agnostic, any model
Runtime specifics	No infrastructure to manage	Per-session VM-isolated sandboxes, scale-to-zero, 15-min idle, 30-day session lifetime	ASP.NET host, storage you choose (Memory/Blob/Cosmos DB)
Identity	Automatic Entra Agent ID, cannot bring your own	Dedicated Entra agent identity per agent, automatic at deploy	Activity Protocol, Entra plus Agent 365 agentic identity
Governance surface	Power Platform admin center, DLP, Purview, ALM	Azure RBAC, content safety/XPIA, App Insights, BYO VNet	Your own Azure controls plus Agent 365
Distribution / channels	Web, mobile, Teams, Bot Service channels, M365 Copilot	Custom endpoints plus M365 Copilot and Teams natively	M365 Copilot, Teams, partner, mobile, web (widest reach of the three)
Extensibility / tools	Prebuilt + custom connectors, agent flows, MCP via connectors	Toolbox MCP endpoint, Foundry tool catalog, A2A	Bring your own tools; Teams SDK v2 adds A2A and MCP
Maturity	GA	Public preview	SDK; Agent Framework orchestrator in public preview
Best for	Fast departmental/enterprise agents without dev resources	Custom code and orchestration with Microsoft-run infra	Fine-grained model/orchestrator control, widest channel reach

A few cells deserve the prose that does not fit in a table cell.

On governance, all three paths converge in one place. Foundry, Copilot Studio, and other Microsoft agents surface in a single tenant-wide inventory: administrators view and govern them under Microsoft Entra admin center, Entra ID, Agent ID, All agent identities, where they can apply Conditional Access, Identity Protection, network access controls, and lifecycle governance. All three land in one shared Entra identity inventory you can see in one place, even though each platform keeps its own governance control plane. Shared visibility is not a unified control plane.

On channels, the practical difference is reach versus native simplicity. In Microsoft's custom-engine-agent comparison, the Agents SDK reaches Microsoft 365 Copilot, Teams, partner apps, mobile apps, and custom websites, while Foundry reaches Microsoft 365 Copilot and Teams natively, with other channels requiring custom integration. Copilot Studio agents can run across websites, mobile apps, Facebook, Microsoft Teams, or any channel supported by the Azure Bot Service, and can be published to Microsoft 365 Copilot. If your distribution requirement is "everywhere our customers are," the SDK and Copilot Studio reach further than Foundry's native channels.

How do you choose between Foundry Hosted, in-process, and Copilot Studio?

Run four gates in order and take the first hard answer: Gate 1 is who builds it (no coder means Copilot Studio), Gate 2 is who owns the compute (must own it means self-hosted SDK), Gate 3 is your code on managed infra (yes means Foundry Hosted), and Gate 4 resolves channels and protocols. Skip the feature checklist on the first pass.

Gate 1 - Who is building this, and do they write code? If the builder is a business maker or analyst with no developers attached, the answer is Copilot Studio. It is explicitly for makers building departmental, organizational, or external-customer agents with multi-step workflows, approvals, branching, autonomous capabilities, and Azure AI integration, all without code. If you do not have pro-dev capacity and the use case fits connectors and flows, stop here. You do not need Foundry or the SDK.

Gate 2 - Must you own the compute? If a hard requirement (a data-residency mandate, an existing hosting estate, a need to run inside a network boundary that Foundry's container registry constraint cannot meet) forces you to own the runtime, the Microsoft 365 Agents SDK self-hosted path is the answer. You bring your own orchestrator, host it yourself, and accept that you are responsible for compliance, security, and responsible AI of the hosted runtime. This is the most control and the most operational burden.

Gate 3 - Do you want your own code but not your own servers? If you have orchestration logic to write but no appetite to run the infrastructure, Foundry Hosted agents are the fit. You containerize the code, push to ACR, and Foundry handles scaling, identity, observability, and lifecycle. This is the path for a team that has outgrown Copilot Studio's fixed orchestrator but does not want to operate a 24/7 host.

Gate 4 - What is your channel and protocol requirement? If you need to reach the most surfaces (partner apps, mobile, arbitrary websites, the widest reach of the three) the Agents SDK reaches furthest. If you need custom protocols (webhooks, voice, AG-UI), background or async behavior, or agent-to-agent delegation, Foundry Hosted or self-hosted SDK beat Copilot Studio, which is strongest at connector-driven, conversational, in-Microsoft-365 scenarios. If your target is Microsoft 365 Copilot and Teams specifically, all three reach there, so this gate does not decide it; fall back to Gates 1 through 3.

For an enterprise platform plan, two trade-offs sit underneath these gates. The first is effort versus control: lowest effort and least pro-code control with Copilot Studio, managed code on Microsoft-run infrastructure with Foundry Hosted, and highest control with full ownership of compute on the self-hosted SDK. The second is maturity. Copilot Studio is GA. Foundry Hosted agents are public preview, and the Microsoft Agent Framework is public preview. For a workload you are putting into production this quarter, weigh "GA today" against "the right long-term runtime that is still in preview" honestly, and design the migration path either way.

Identity and governance: where the paths quietly differ

Identity is the dimension most teams underweight and then regret. All three paths get an Entra agent identity, but how much you control it differs.

Copilot Studio takes the automatic route with no override. Agents created in Copilot Studio can be configured to automatically receive a Microsoft Entra agent identity, an Entra service principal with an "Agent" subtype, when enabled at the Power Platform environment level. The constraint to note: Copilot Studio requires automatic management and does not allow bringing your own Agent ID or app registration. If your security model depends on pre-provisioning specific app registrations, that is a real limitation.

Foundry Hosted agents also automate identity but expose more of it to your code. Every Hosted agent gets its own dedicated Microsoft Entra agent identity and dedicated endpoint, both created automatically at deploy time. When integrated via Microsoft 365 channels it can use OAuth 2.0 On-Behalf-Of for user-invoked scenarios or its own agent identity for autonomous or background scenarios. Foundry agent identities support both attended (OBO, delegated) and unattended (client-credentials, application-only) authentication, with an automatic multi-step OAuth token exchange between Agent Service, Entra ID, and the downstream resource. The tools that currently support agent-identity authentication are MCP and Agent-to-Agent. That attended/unattended split is the architecturally important part: it is what lets one Hosted agent act as a user in a chat and as itself in a background job.

On the governance surface, the platforms diverge by design. Copilot Studio agents are governed through the Power Platform admin center with environment-level data loss prevention, role-based access, auditing, connector governance, ALM across dev/test/prod, and compliance via Microsoft Purview; publishing to an organization's app catalog requires admin approval. Foundry's enterprise capabilities, by contrast, include dedicated Entra agent identity, private networking, Azure RBAC, content safety guardrails that mitigate prompt injection including cross-prompt injection (XPIA), end-to-end tracing with Application Insights, and publishing to Microsoft 365 Copilot, Teams, and the Entra Agent Registry. Same destination inventory, different control planes.

Decide governance first

A Power Platform admin governs the Copilot Studio fleet; an Azure platform team governs the Foundry fleet. Decide who owns governance before you decide the platform, because the platform choice assigns the governance team.

One more useful convergence point: Copilot Studio custom engine agents integrate automatically with Agent 365 - an Agent ID is created automatically, the agent appears in the agent registry automatically, agents are pre-approved via Power Platform admin policies, and telemetry flows to the Agent 365 observability backend automatically. If you want the lowest-friction path to a governed, registered, observable agent, the low-code option does that wiring for you. The pro-code paths give you more control and ask you to wire more yourself.

Extensibility: tools and MCP across the three

Tooling is where the platforms feel most different day to day.

Copilot Studio extends agents through Power Platform connectors (hundreds of prebuilt plus custom/REST connectors), agent flows, and Model Context Protocol servers. Each connected MCP server appears as a tool offering its MCP tools and resources. Two details matter for governance: using MCP requires turning on generative orchestration, and MCP connectivity rides on Power Platform connectors, so the same DLP data policies that regulate connectors also regulate MCP access. That is a feature, not a bug, for a regulated org: your existing connector governance automatically covers your agent's MCP tools.

Foundry Hosted agents take the opposite stance: nothing is automatic, everything is explicit. Tools come through the Toolbox MCP endpoint and you consume them with standard MCP client libraries. That is more setup and more control. It also means your tool layer is portable, since it speaks plain MCP rather than a platform-specific connector model.

The Agents SDK leaves tools entirely to you. You bring your own, and the Teams SDK v2 adds A2A and MCP while the Agent Framework adds graph-based multi-agent workflows. If your agent's value is a bespoke set of internal tools that no connector covers, the SDK and Foundry Hosted both let you build them; Copilot Studio asks you to wrap them as connectors first.

This maps onto a pattern worth internalizing if you are building agents that should survive a vendor rotation: keeping tools and orchestration portable is exactly the discipline behind LLM-agnostic agent design, and the MCP-everywhere direction across all three platforms makes that easier than it was a year ago.

Where it breaks: caveats and the honest limits

Strong opinions, with the practitioner caveats attached.

Preview is preview. Foundry Hosted agents are public preview and the Microsoft Agent Framework is public preview. Limits and behaviors change. The runtime numbers in the specs list above (concurrency, idle timeout, session lifetime) are preview defaults as of mid-2026; re-check them before you size capacity, and assume the public-endpoint requirement on Azure Container Registry could shift. Just as important, no independent operational reliability signal yet exists for Hosted agents in production, so in my read you should weight this path as architectural direction, not proven reliability. Do not write a production runbook against preview numbers without a "verify current docs" line in it.

The "graduate later" path is real but not free. Yes, the same Agent Framework code can run in-process and later be repackaged as a Foundry Hosted container. That lowers the cost of starting in-process. It does not make the move a no-op: you still take on container packaging, ACR, the Toolbox MCP wiring for tools, and a different identity and networking model. Plan it as a migration with a test pass, not a checkbox.

Copilot Studio's ceiling is real. It is the fastest path and the right one for a large class of agents. But it is weaker at deep pro-code control over API calls, orchestration, and response formatting, its pro-dev tooling (source control, CI/CD, native Git) is less integrated than the project-based Agents Toolkit, and it supports Adaptive Cards schema 1.6 and earlier. The actual "outgrows Copilot Studio" failure mode, in my experience, is not a missing feature but orchestration reliability: as the instructions and the tool catalog grow, the managed orchestrator's behavior gets harder to keep predictable, and that ceiling shows up well before you hit any documented limit. If you can see the agent needing custom orchestration or bespoke response shaping within a year, starting low-code can become a rebuild. That is the most common expensive mistake in this decision.

Foundry's networking has a sharp edge today. Hosted agents support BYO VNet for outbound traffic, but the container registry must currently remain reachable over its public endpoint. For a strict zero-public-endpoint mandate, that is a blocker until it changes. Confirm against current docs rather than assuming it has been lifted.

Maturity claims need dates. As of mid-2026, Copilot Studio is GA and the other two paths have preview components. Microsoft's pace here is fast, so treat the GA/preview status as a snapshot, not a permanent fact, and re-verify before a go/no-go.

Pricing is not in scope here. Foundry Hosted agents bill on container compute during active sessions plus inference and tool usage, but the exact rates live on the Foundry pricing page rather than in the Learn docs cited here, and they were not retrieved for this article. Do not build a cost model from the qualitative description above; pull current pricing before you commit budget.

What you can reason about without rates is the shape of each cost. Copilot Studio bills on capacity or per-message, so cost tracks usage volume. Foundry Hosted is consumption on the active session and is cheap when idle because it scales to zero. The self-hosted SDK path is fixed 24/7 compute that you pay for whether the agent is busy or not, which is the operational weight behind the Gate 3 line about a team that "does not want to operate a 24/7 host." This is the author's framing, not a quoted price; confirm the actual numbers on the current pricing pages.

Which agent build path should you pick?

The decision is less about which platform is most capable and more about matching the service model to your team and your constraints. A maker team with a connector-shaped problem should not be writing containers. A pro-dev team with bespoke orchestration and a hard residency mandate should not be boxed into a fixed low-code orchestrator. And a team that wants its own code without running servers now has a genuine middle path that did not exist cleanly a year ago.

Run the four gates, document the answer as an architecture decision record, and note the migration path you are keeping open, because the same Agent Framework code spanning in-process and Foundry Hosted means you can defer the highest-stakes part of this call without locking yourself in. The platforms all feed one Entra agent inventory, so whichever you pick, the governance team can see it. Pick for who builds it and who runs it, not for the longest feature list.

Azure AI Foundry New vs Classic: 2026 Migration Map

Alex Pechenizkiy — Sun, 19 Jul 2026 14:48:26 +0000

There are now two Microsoft Foundry portals, and the toggle between them is a single banner switch labeled "New Foundry." That switch hides a real architecture decision: the new portal shows only Foundry projects, while a whole class of resources (Azure OpenAI, hub-based projects, managed-compute model hosting) lives exclusively on the classic side. The question every architect has to answer in 2026 is not "which portal looks nicer." It is "which of my resources are stranded on classic, and what work it takes to move them."

This article is the migration map. It separates what reached the new portal from what is still classic-only, lists exactly what transfers during a hub-to-Foundry migration and what you have to rebuild by hand, and flags the two hard dates that actually force your hand. It is current as of mid-2026 and grounded in Microsoft Learn. Where the docs hedge, I hedge too, because some of the most important questions (when does the classic portal retire?) do not have an answer in the current documentation.

A note on naming first, because Microsoft has changed it twice. The brand evolved from Azure AI Studio to Azure AI Foundry to Microsoft Foundry, and the AI services portfolio went from Azure Cognitive Services to Azure AI Services to Foundry Tools. Through all of it the underlying Azure resource type stayed Microsoft.CognitiveServices/accounts, as Microsoft documents in the navigate-from-classic guide. This article uses "new portal" and "current portal" interchangeably for the New-Foundry-on experience, and "classic portal" for New-Foundry-off.

TL;DR

There are two portals toggled by the "New Foundry" banner switch. The new portal shows only Foundry projects; the classic portal is required for Azure OpenAI resources, hub-based projects, and anything needing prompt flow or managed-compute (open-source) model hosting. The new resource model collapses the old Hub plus Azure OpenAI plus Azure AI Services sprawl into one Foundry resource with child projects on a single project endpoint. Two component-level dates are firm: the azure-ai-inference package retires May 30, 2026, and the Assistants API sunsets August 26, 2026. There is no announced retirement date for the classic portal itself, for hub-based projects, or for the old brand names. Most teams should default to a Foundry project; keep a hub-based project only when you specifically need prompt flow or open-source model deployments.

Monday move: inventory your resources by what is stranded on classic. List every standalone Azure OpenAI resource and every hub-based project, then check two things for each: do you have an azure-ai-inference or Assistants API dependency with a 2026 deadline, and do you need prompt flow or managed compute (the two capabilities that have no new-portal equivalent yet). That single table tells you what must move, what can stay, and what is on a clock. One more pre-flight check: verify Responses API and Foundry Agent Service region support before you migrate, because an unsupported region blocks agents in the new portal entirely.

What "New" Actually Means: A Resource Model, Not a Skin

The portal toggle is the visible part. The change underneath is the resource model.

In the classic model, a typical deployment managed multiple Azure resources at once: a Hub, an Azure OpenAI account, and Azure AI Services, spread across five or more endpoints. The new model collapses that into a single Foundry resource with child projects, accessed through one project endpoint, under one Azure resource provider namespace with unified RBAC, networking, and policies. Microsoft describes this consolidation in the what-is-Foundry overview. The resource type did not change. The number of things you have to wire together did.

This is why the new portal "shows only Foundry projects." It is not hiding your other resources to be annoying. The new portal is the surface for the new single-resource model, and everything that predates it (Azure OpenAI standalone resources, hub-based projects, the Hub resource type itself) is reached through the classic portal, per the classic what-is-Foundry page.

So before you compare features, get the vocabulary straight, because three terms do most of the confusing work.

Two Project Types, and the One Microsoft Tells You to Pick

The most consequential choice in this whole migration is not portal-level. It is project-level. There are two project types, and they are not interchangeable.

A Foundry project is managed directly under a Microsoft Foundry resource and requires no extra Azure resources. A hub-based project is hosted by a Foundry/AI hub, which in turn requires dependent Azure Storage and Key Vault resources. Microsoft's own guidance, stated plainly in the classic what-is-Foundry page, is: "In most cases, you want to use a Foundry project."

That recommendation is not marketing. It reflects where the feature investment is going. Microsoft notes that in June 2025 it began moving most of the Azure AI Hub's capabilities under the Foundry resource type, with new features primarily landing on Foundry, while select use cases such as open-source model deployments still require a hub resource, per the resource-types concept page. New generative-AI and model-centric features (the Foundry API and the Foundry Agent Service, both in general availability) are available only through the Foundry resource and its Foundry projects, as the migrate-project guide states.

But "most cases" is not "all cases," and the exceptions are sharp. Here is the capability split between the two project types.

Capability	Foundry project	Hub-based project
Agents	GA	Preview only
Foundry SDK and API	Full	Limited
OpenAI SDK and API	Native	Via connections only
Foundry Models (Azure OpenAI, DeepSeek, xAI)	Native	Via connections only
Evaluations	Preview	GA
Prompt flow	Not available	Available (hub-based only)
Managed compute / open-source model hosting (e.g. HuggingFace)	Not available	Available (hub-based only)
Extra Azure resources required	None	Azure Storage + Key Vault

Read that table the way the docs ask you to: as a snapshot, not a contract. Microsoft says feature parity between the two project types is explicitly not achieved yet ("Foundry projects feature set aren't yet on full parity") and points to a live support matrix rather than a parity-completion date. There is no stated date for when the gaps close. Treat the per-tool GA/Preview labels in the catalog as the source of truth and verify at the tool level, because some entries (agents being one) are described differently in different places in the documentation.

The two rows that should drive your decision are the bottom two. Prompt flow and managed compute (open-source model hosting) are exclusive to hub-based projects. If your workload depends on either, you are on classic for that workload, full stop, until the docs say otherwise. Everything else points toward a Foundry project.

The Portal Feature Map: New, Classic, or Both

Project type is one axis. The other is where a given capability is reachable. Microsoft's navigate-from-classic guide groups features into three buckets, and that grouping is the cleanest way to plan a migration.

New portal only, generally available: the Responses API, Agents v2 (which is the Responses API), a large and growing tool catalog (each entry carrying its own GA or Preview label, which you check per tool), and agent publishing to Microsoft 365 and Teams.

New portal only, Preview: multi-agent workflows, agent memory, Foundry IQ, hosted agents, the A2A protocol, and the Foundry Control Plane. Every item in this list is Preview. None of it should sit unguarded in a production path.

Both portals: Foundry projects themselves, chat completions, fine-tuning, evaluations (enhanced in the current portal), and the model catalog (expanded in the current portal).

Classic only, requiring migration: standalone Azure OpenAI resources (which you upgrade to a Foundry resource) and hub-based projects (which are not visible in the current portal at all, so you either switch to the classic portal or migrate them to Foundry projects).

The shape here is worth sitting with. The genuinely new generative-AI surface (agents v2, multi-agent, memory, Foundry IQ, A2A, Control Plane) is new-portal-only by design. The boring, load-bearing primitives (chat completions, fine-tuning, evaluations, the model catalog) work in both. And the two things that strand you on classic are an older resource type (standalone Azure OpenAI) and an older project type (hub-based). That is a healthy pattern: the consolidation is additive on the new side and migration-gated on the old side, not a forced rip-and-replace.

For the deeper "should I even be on Foundry versus standalone Azure OpenAI" question, that is a separate decision with its own tree. I worked through it in Azure AI Foundry vs Azure OpenAI: The 2026 Decision, and this article assumes you have already decided Foundry is your platform.

The API and SDK Rename You Cannot Ignore

If the resource model is the structural change, the API and SDK renames are the change that breaks your code. They are mechanical, well-documented, and unavoidable if you move to the new portal.

The terminology shifted across the board, per the navigate-from-classic guide:

Classic	New
Assistants API (Agents v0.5 / v1)	Responses API (Agents v2)
Monthly `api-version` params	`v1` stable routes (`/openai/v1/`)
Threads	Conversations
Messages	Items
Runs	Responses
Assistants / Agents	Agent Versions
`create_agent()`	`create_version()`

On the SDK side, the classic packages and the Azure-specific client collapse into the unified azure-ai-projects 2.x project client together with the standard openai OpenAI() client pointed at one project endpoint, per the same navigate-from-classic guide. The mapping:

Classic	New
`azure-ai-inference` (model inference)	`openai` package
`AzureOpenAI()` client	`OpenAI()` client with `base_url`
`azure-ai-generative`	`azure-ai-projects` 2.x project client
`azure-ai-ml` (hub-to-project scenarios)	`azure-ai-projects` 2.x project client
`azure-ai-projects` 1.x (classic portal)	`azure-ai-projects` 2.x (new portal)

The most common rewrite is the smallest one: drop the Azure-specific AzureOpenAI client and its azure_endpoint plus api_version for the standard OpenAI() client pointed at the single /openai/v1 project endpoint, authenticated with DefaultAzureCredential rather than an API key. This pattern follows the before/after example in the navigate-from-classic guide.

# Classic: azure-ai-projects 1.x era; Azure-specific client + monthly api-version
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://my-resource.openai.azure.com",
    api_key="my-key",
    api_version="2024-12-01-preview",
)

# New: pin azure-ai-projects 2.x; standard OpenAI client on the one project endpoint
from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential

project = AIProjectClient(
    endpoint="https://<resource>.services.ai.azure.com/api/projects/<project>",
    credential=DefaultAzureCredential(),
)
client = project.get_openai_client()  # OpenAI() bound to the ".../openai/v1/" base URL

The version split is the part that bites: azure-ai-projects 1.x targets the classic portal and 2.x targets the new portal, and mixing versions across portals causes errors. In my own experience this is the single most common stumbling block, more than any strategy question, and it is easy to miss because the symptom (a ModuleNotFoundError or an unexpected API response) reads like a code bug rather than a portal mismatch. Microsoft lists exactly that symptom and its "SDK version doesn't match your portal target" cause in the troubleshooting table of the navigate-from-classic guide. If you have ever debugged a "works on my machine" SDK mismatch, this is the same trap wearing a Foundry badge.

RBAC roles changed too. The old Cognitive Services OpenAI User and similar roles are replaced by Foundry User, Foundry Project Manager, Foundry Owner, and Foundry Account Owner, with control-plane and data-plane separation. One reassurance from the docs: these were recently renamed from Azure AI User/Owner/Account Owner/Project Manager, and the rename left role IDs and core permissions unchanged. So if you wrote Bicep or Terraform against the role IDs rather than the display names, the rename did not break you. If you hardcoded display names in scripts or documentation, update them.

What Are the Azure AI Foundry Migration Deadlines in 2026?

Two component-level deadlines are firm. The azure-ai-inference package retires May 30, 2026, and the Assistants API sunsets August 26, 2026. Everything else in the transition is soft: you can toggle portals freely, hub-based projects remain accessible on classic, and the docs announce no shutdown date for the classic portal itself.

Hard dates

The azure-ai-inference package retires May 30, 2026 - migrate to the openai package. The Assistants API sunsets August 26, 2026 - move workloads to the generally available Microsoft Foundry Agents service / Responses API. Both dates are stated in Microsoft's navigate-from-classic guide. Everything else in the classic-to-new transition is opt-in. These two are not. Both dates are component-level: they retire one SDK package and one API. Neither date retires the classic portal, hub-based projects, or the AI Hub resource type.

Read these carefully, because they are narrower than the rumors. Neither date retires the classic portal. Neither date retires hub-based projects or the AI Hub resource type. They retire two specific components: an SDK package and an API. If your codebase imports azure-ai-inference, you have a hard deadline. If you run the Assistants API in production, you have a hard deadline. If you do neither, these dates do not touch you, regardless of which portal you use day to day.

This distinction is, in my own analysis, where planning most often goes wrong. Teams read "classic" and "retiring" in the same paragraph and conclude the portal is sunsetting. It is not, at least not on any date Microsoft has published. The honest statement is: the only firm dates are component-level, and you should plan to them specifically rather than to a vague "classic is going away" anxiety.

What Transfers in a Hub-to-Foundry Migration (and What Does Not)

If you decide to move a hub-based project to a Foundry project, the mechanics are documented and reasonably quick, but the "what does not transfer" list is the part that will hurt if you skip it.

From a hub-based project, you create a new Foundry project on the existing Foundry resource. This requires the Owner role and takes roughly 5 to 10 minutes, per the migrate-project guide.

What transfers: model deployments, data files, fine-tuned models, assistants, and vector stores.

What does not transfer: preview agent state (messages, threads, and files), open-source and managed-compute model deployments (which are unsupported on Foundry projects), and hub-project access itself.

Item	Migrates to Foundry project?	Plan
Model deployments	Yes	Verify after migration; do not assume
Data files	Yes	Confirm presence in the new project
Fine-tuned models	Yes	Re-test inference paths
Assistants	Yes	But Assistants API sunsets Aug 26, 2026 - move to Responses API anyway
Vector stores	Yes	Re-validate retrieval after move
Preview agent state (messages, threads, files)	No	Recreate by hand; do not migrate mid-conversation
Open-source / managed-compute deployments	No	Unsupported on Foundry projects; keep on a hub
Hub-project access	No	Hub project is not reachable post-migration

Two of those "no" rows deserve a flag. First, preview agent state does not move. If you have agents mid-flight with live conversations, threads, and uploaded files, migrating loses that state. Plan the cutover for a quiet window and recreate, rather than treating it as a hot-path operation.

Second, open-source and managed-compute deployments are unsupported on Foundry projects. This is the same exclusion as the prompt-flow and managed-compute rows in the capability matrix earlier. If your hub-based project exists specifically to host an open-source model on managed compute, migration to a Foundry project is not a move, it is a deletion of the one thing that workload does. Keep it on a hub.

One more constraint that lives in the documentation and is easy to miss. The hub-to-Foundry migration article, along with the other hub-focused articles (hub RBAC, hub quickstart, hub resources overview), is labeled "Applies only to: Foundry (classic) portal" and explicitly states the steps do not work for Foundry projects in the new portal. So when you follow migration guidance, confirm which portal the doc is written for, because hub-focused and Foundry-project-focused docs are not interchangeable even when they describe the same conceptual task.

A Decision Framework You Can Defend in a Review

Here is how I would structure the classic-to-new decision for an enterprise platform plan. It is built from the decision dimensions Microsoft surfaces, ordered by how often each one is the actual deciding factor.

1. Do you need prompt flow or managed-compute (open-source) model hosting? If yes, you stay on a hub-based project in the classic portal for that workload. These two capabilities have no Foundry-project equivalent in the current docs. This is the first question because it is the only true blocker.

2. Do you have an azure-ai-inference or Assistants API dependency? If yes, you have a hard deadline (May 30, 2026 for the package, August 26, 2026 for the API). Migrate the dependency regardless of your broader portal strategy. These dates do not wait for your platform roadmap.

3. Are the new-portal features you want GA or Preview? Responses API, Agents v2, the tool catalog, and M365/Teams publishing are GA. Multi-agent workflows, agent memory, Foundry IQ, hosted agents, A2A, and the Control Plane are Preview. Gate every Preview capability out of production paths and design a fallback. Building production agent orchestration on Preview multi-agent workflows is a bet on a date Microsoft has not given you. Preview features are also typically excluded from SLAs and may fall outside Online Services Terms protections (verify the supplemental Preview terms against your current agreement), so gating Preview out of production is a contractual control, not just a stability precaution.

4. Is your target region supported? The Responses API and Foundry Agent Service are not available in every Azure region. Microsoft warns that if a Foundry resource sits in an unsupported region, agents and other Responses API features will not work in the current portal, and you must create a new Foundry resource in a supported region. Region-strand is not a theoretical edge case: Microsoft lists "Foundry resource is in a region that doesn't support the Responses API" as one of the documented common migration issues in the navigate-from-classic guide, which points at a dedicated feature availability across cloud regions reference (verify the exact supported-region list against current docs, since it moves). Verify region support before you migrate, not after.

5. Do your SDK versions match your portal? azure-ai-projects 1.x for classic, 2.x for new. Mismatched versions error out. This is a code-hygiene check, not a strategy question, but in my experience it strands more migrations than the strategy questions do, and Microsoft documents the mismatch symptom and fix in its troubleshooting table.

6. Do you still depend on standalone Azure OpenAI resources or hub-based projects? These are classic-only and require an explicit upgrade (Azure OpenAI resource to Foundry resource) or migration (hub-based project to Foundry project). They are not blockers, but they are work you have to schedule.

If the answers point you toward a Foundry project (and for most workloads they will, because Microsoft says so and the feature investment confirms it), the migration itself is the 5-to-10-minute operation described above. The framework exists so that the 5-to-10-minute operation is the last step, not the first.

Where It Breaks: Caveats and Open Questions

I would be misrepresenting the documentation if I gave you cleaner answers than the docs actually contain. Several important questions do not have published answers as of mid-2026, and you should plan around the uncertainty rather than around a guess.

There is no announced retirement date for the classic portal. The docs say new investment focuses on the new portal and that you can toggle freely, but they give no classic-portal shutdown date. Anyone telling you the classic portal dies on a specific date is reading something the current documentation does not say. Plan your migration on the value of consolidation and on the two real component dates, not on a phantom portal-retirement deadline.

There is no announced retirement date for hub-based projects or the AI Hub resource type. They remain accessible in the classic portal with no announced sunset. Given that prompt flow and managed-compute hosting are hub-only and have no new-portal equivalent, this is consistent: Microsoft cannot retire the hub while the hub is the only home for those capabilities.

The old brand names have no stated naming-retirement date. "Azure AI Studio" and "Azure AI Foundry" are described as superseded brands, but the docs give no date on which the names stop being used. This is cosmetic, but it matters for documentation and onboarding material that references the old names.

Feature parity between project types is not complete, and no completion date is stated. Microsoft explicitly says Foundry projects "aren't yet on full parity" and points to a live support matrix. Do not assume a capability you saw on a hub-based project is available on a Foundry project. Check the matrix.

Individual tool GA/Preview states should be verified at the tool level. The classic capability table lists agents as "Preview only" on hub-based projects, while agents are described as GA on Foundry projects elsewhere. The catalog carries per-tool labels, and the docs tell you to trust those over any summary table, including the ones in this article. When the stakes are production, read the live label.

Some new-portal availability is hedged in the docs themselves. For example, whether the Content Understanding GA API (2025-11-01) is yet available in the new portal is described as "will soon support," with no exact date. When the official docs hedge, do not firm it up in your architecture decision record. Carry the hedge forward.

API key authentication does not cover every surface in the new portal. Per Microsoft's new-portal GA overview, Foundry supports API key authentication for most areas, but agents, evaluations, the datasets tab, Content Understanding, and workflows require Microsoft Entra ID authentication (point-in-time as of the GA overview; verify against current docs). A team that scripted against API keys on classic will hit auth failures on exactly these surfaces. Plan an Entra ID with managed-identity path for them before you cut over, not after the first 401.

The pattern across all of these: the consolidation is real and directional, but the calendar is mostly empty except for two component dates. Build your plan on the firm dates and the GA/Preview gates, treat everything else as directional, and re-check the live matrix before you commit production load. The governance wrapper around all of this (who can create what, which Preview features are allowed where) is the same discipline I described in the AI governance framework for the Microsoft stack, and it applies cleanly to the new resource model because RBAC, networking, and policy are now unified under one resource.

Foundry New vs Classic: The 2026 Bottom Line

Three takeaways for the architect writing the 2026 platform plan.

First, default to a Foundry project, and document the exceptions. Microsoft says "in most cases, you want to use a Foundry project," the feature investment backs it up, and the new single-resource model is genuinely simpler than the old Hub-plus-OpenAI-plus-AI-Services sprawl. The only durable reasons to stay on a hub-based project are prompt flow and managed-compute model hosting. If neither applies, your default is a Foundry project, and the exceptions should be written down as exceptions.

Second, plan to the two real dates and ignore the phantom one. May 30, 2026 (the azure-ai-inference package) and August 26, 2026 (the Assistants API) are firm and component-level. The classic portal retirement is not a date, because there is no date. Most migration anxiety I see conflates the two. Separate them in your plan.

Third, the new-portal Preview list is long, and that is the real production gate. Multi-agent workflows, agent memory, Foundry IQ, hosted agents, A2A, and the Control Plane are the capabilities people get excited about, and every one of them is Preview right now. The GA surface (Responses API, Agents v2, tool catalog, M365/Teams publishing) is what you can build on today. Design for the GA line and treat the Preview features as roadmap, not foundation. If you are mapping how those agent patterns fit a broader architecture, the Microsoft agentic patterns playbook is the companion read.

The migration is happening either way. The question is whether you move proactively on the workloads that benefit and the dependencies that have deadlines, while leaving the genuinely classic-only workloads where they belong. Pick the project type per workload, document the choice as an ADR, verify region support before you cut over, and re-read the live capability matrix the week you migrate.

Ship a Real Website with Claude Code, GitHub, and Cloudflare (Cheap and Reliable)

Alex Pechenizkiy — Sat, 18 Jul 2026 14:45:03 +0000

The cheapest, most durable way to put a website on the internet has not changed much: publish static files to a CDN, version them in Git, and let a build run on every push. What has changed is who does the wiring. You can now describe the site to an AI coding agent and let it run the whole sequence, while you click the two things only a human can click. This is the stack I use to ship the sites behind this publication, written so you can hand it to your agent and get a live site on your own domain in an afternoon. The interesting part is not the tools, it is the division of labor: the agent runs the whole sequence, and you own only the two steps it genuinely cannot do.

TL;DR

The stack: Astro (static output) for the site, GitHub for source, Cloudflare Pages for hosting, pnpm for builds, and a tiny Playwright smoke test as the quality gate. The cost: about $0 per month plus the domain (roughly $8 to $15 per year), because static files on a CDN have no server to run or scale. The reliability: there is no origin to crash and no database to corrupt, so the common failure is a typo, which the smoke test catches. Monday move: install the four tools (Node, pnpm, git, the GitHub CLI), then paste the "Give this to your AI" prompt below to Claude Code and answer its questions. The only steps it cannot do for you are the two clicks inside the Cloudflare dashboard.

Why this is cheap and reliable

The reliability comes from the shape of the system, not from spending more. A static site is a folder of HTML, CSS, and a little JavaScript. Served from a CDN, it has no server process to patch, no runtime to crash, and no database to corrupt. Your repository is the source of truth, GitHub holds it, and Cloudflare rebuilds and ships the site on every push. The most common way to break it is a typo in your own content, and the test gate below catches the usual cases of that before they go live.

Layer	Choice	Why
Framework	Astro (static output)	Ships plain HTML and CSS with almost no JavaScript by default, so pages are fast and there is no server runtime. Content is Markdown or MDX in your repo.
Source control	GitHub (private repo)	Free, durable, and the thing Cloudflare watches. A bad change is one `git revert` away.
Hosting	Cloudflare Pages	Free tier includes a global CDN, automatic HTTPS, and Git-connected auto-deploy. Static hosting on a CDN is about as reliable as the web gets.
Package manager	pnpm	Fast, disk-efficient installs. npm or yarn work too.
Quality gate	Build check plus a Playwright smoke test	Catches broken builds, missing images, and dead links before they reach visitors.

Alternatives that also work: Cloudflare Workers if you need server-side logic, Vite with React, Svelte, or Vue if you want a single-page app, and Netlify or GitHub Pages instead of Cloudflare Pages. The principles carry over. This guide picks one good default path so your AI does not have to deliberate.

Prerequisites (one-time, about 15 minutes)

Three free accounts: a GitHub account, a Cloudflare account, and a domain name if you want a custom address (Cloudflare Registrar sells them at cost, which is the cheapest honest option). You can skip the domain and use the free *.pages.dev subdomain.

Four tools on your machine: Node.js LTS (version 20 or newer), pnpm (npm install -g pnpm), Git, and the GitHub CLI (gh, then gh auth login once, so your agent can create the repo without you clicking through the web UI). Plus your AI coding agent of choice.

Verify them:

node --version    # v20 or higher
pnpm --version
git --version
gh auth status    # should say "Logged in"

Give this to your AI

Once the prerequisites are in, paste the block below to your agent. Replace the bracketed values first.

I want to create and deploy a website. Use this stack and follow it exactly:

- Framework: Astro (static output)
- Source: a NEW PRIVATE GitHub repo named [my-site]
- Hosting: Cloudflare Pages, auto-deploy on push to main
- Package manager: pnpm
- Custom domain (optional): [yourname.com]

Site description: [one or two sentences about what the site is and who it is for].

Do this:
1. Scaffold a minimal Astro site with pnpm. Run it locally and confirm it builds.
2. Initialize git, create the private GitHub repo with the gh CLI, and push.
3. Add a publish gate: a "pnpm build" check and a minimal Playwright smoke test
   that loads the home page, checks the title, and verifies every <img> loads
   without a 404. Wire it so I can run it before every push.
4. Give me the exact Cloudflare Pages settings to enter (build command, output
   directory, framework preset) and the DNS steps for the custom domain. I will
   click those in the Cloudflare dashboard; you cannot do that part for me.
5. Write a short README with the local dev command, the deploy flow, and the
   gotchas list.

Keep it minimal and working. Do not add features I did not ask for. After each
step, tell me what you did and what I need to do next.

The one thing your agent cannot do is click inside the Cloudflare dashboard to connect the repo and the domain. Everything else it can run. The steps below are what it follows, and what you click.

The steps

1. Scaffold and confirm it builds

pnpm create astro@latest my-site
# Choose a minimal or blog starter, TypeScript "Strict", install dependencies
cd my-site
pnpm dev          # open the local URL; confirm the page loads
pnpm build        # produces the static site in ./dist
pnpm preview      # serves ./dist exactly as it will deploy

If pnpm build fails, fix it before going further. A green build locally is the contract Cloudflare relies on.

2. Put it on GitHub (private)

git init
git add -A
git commit -m "Initial site scaffold"
gh repo create my-site --private --source=. --remote=origin --push

That last command creates the private repo and pushes in one shot.

3. Connect Cloudflare Pages (you click this part)

In the Cloudflare dashboard, open Workers and Pages, click Create, then Pages, then Connect to Git. Authorize GitHub, select your repo, and set the build configuration:

Framework preset: Astro
Build command: pnpm build
Build output directory: dist
Node version: add an environment variable NODE_VERSION set to 20 (or newer) so the build matches your machine.

Click Save and Deploy. In a minute or two you have a live site at my-site.pages.dev. From then on, every push to main triggers a new deploy automatically.

4. Add your custom domain (you click this part)

In the Pages project, open Custom domains, click Set up a domain, and enter your domain. If its DNS is already on Cloudflare, the records are added for you. If not, Cloudflare gives you the exact CNAME or A records to add at your registrar, or you can move the domain's nameservers to Cloudflare (free) and let it manage DNS. HTTPS is automatic.

5. The reliability gate

A site that deploys is not the same as a site that works. Add a small gate so a broken build or a missing image never reaches visitors.

pnpm add -D @playwright/test
pnpm exec playwright install chromium

Create tests/smoke.spec.ts:

import { test, expect } from "@playwright/test";

const BASE = process.env.SMOKE_URL || "http://localhost:4321";

test("home page loads with a title", async ({ page }) => {
  await page.goto(BASE);
  await expect(page).toHaveTitle(/.+/);
});

test("no broken images", async ({ page }) => {
  await page.goto(BASE);
  const srcs = await page.locator("img").evaluateAll(
    (imgs) => imgs.map((i) => (i as HTMLImageElement).src).filter((s) => s.startsWith("http")),
  );
  for (const src of srcs) {
    const res = await page.request.get(src);
    expect(res.status(), `broken image: ${src}`).toBeLessThan(400);
  }
});

Run it against your local preview before every push. The habit is the point: build green, smoke green, then push. I learned the "no broken images" check the hard way, so it is not optional in my own setup. A broken <img> does not fail the build; the page just renders with a gap, and the smoke test is the only thing that catches it cheaply.

Cost, honestly

Hosting is $0 on Cloudflare Pages' free tier, which serves unlimited static requests and bandwidth, with a cap on the number of builds per month. Source control is $0 on GitHub, where private repos are free. The domain is the only guaranteed cost, roughly $8 to $15 per year depending on the extension. Everything else is free at this scale. If the site later needs a server, Cloudflare Workers plus a small database is the next step, and it stays inexpensive.

Gotchas (the things that actually bite)

These are real failure modes from running this stack. Tell your agent to watch for them.

Wrong build output directory. Astro outputs to dist. If Cloudflare is set to build or public, you deploy an empty or wrong site. Confirm dist.
Node version mismatch. Pin NODE_VERSION in the Pages environment variables so CI matches your machine.
Images that 404 silently. A broken <img> renders as a gap, not a build error. The smoke test's image check is what catches it.
Trailing-slash redirects. Static hosts often redirect /page to /page/ with a 308. This is normal. If you script checks against your own URLs, follow redirects (curl -L).
Secrets in the repo. Never commit API keys. Put them in .env, add .env to .gitignore, and set the real values as encrypted environment variables in the Cloudflare dashboard.
Committing node_modules or dist. Both belong in .gitignore. The scaffold sets this up; confirm it.
Pushing a red build. The whole reliability story depends on never pushing when pnpm build is failing locally. Build first, push second.

The mental model to keep

You are not running a server. You are publishing files. Your repo is the source of truth, GitHub holds it, Cloudflare turns it into a fast global site on every push, and a small test gate keeps obviously broken changes from going out. That is the entire system, and it is cheap and reliable precisely because it is so small. Hand the prompt above to your agent, answer its questions, click the two Cloudflare steps when it tells you to, and you have a real site on your own domain.

Microsoft Agent vs Flow: What Foundry's June 2026 Release Really Decides for You

Alex Pechenizkiy — Fri, 17 Jul 2026 15:07:55 +0000

The June 2026 Foundry release makes agents dramatically cheaper to ship. That is exactly the problem. The Microsoft agent vs flow question was already the most consequential architecture call a platform team makes, and this release raises the stakes by making the wrong answer easier to ship than ever.

Here is my position: most teams over-build agents for jobs a deterministic flow does better. The release lowers the cost of building an agent. It does not lower the cost of operating one, auditing one, or explaining one to a governance board. The release lowers the cost of shipping. That raises the cost of choosing wrong.

What actually shipped, and why Claude is not the headline

The Foundry June 2026 release covers a lot of ground: Claude reaching "general availability" in Foundry, agents publishing directly to Teams and Microsoft 365 Copilot, expanded Toolboxes and Routines, Memory updates, and Agent Optimizer in what Microsoft calls "private preview."

Before anything else, know which product you are actually comparing. Foundry Agent Service is the pro-code agent platform on Azure. Copilot Studio agents live in the Power Platform governance boundary. Microsoft 365 Copilot agents are the distribution surface inside Teams and Office. Power Automate flows are deterministic workflow automation. These four things get conflated in every planning meeting, and the conflation is where bad architecture starts.

Most coverage led with Claude's GA. I think that misses the point. Shipping a top competitor's flagship model at GA signals that Foundry is not betting on any single model winning. My read, and this is my stance rather than anything Microsoft claims: Foundry is positioning itself as the substrate agents run on, where distribution and governance are the moat and models are interchangeable tenants. Plan as if that is true, because the feature list only makes sense through that lens.

The headline is not Claude. The headline is that Microsoft just made model choice boring and distribution decisive.

Microsoft agent vs flow: the honest decision rule

Microsoft's positioning treats agents as the more capable option and flows as the simpler one. In production the relationship inverts. A Power Automate flow you can read line by line, with per-run history an auditor can replay, beats an agent nobody can fully explain. A Foundry agent earns its complexity only when the input is unstructured and the next step genuinely depends on reasoning.

Determinism is the first test, but not the only one. Ask about human approval gates, transactionality, failure tolerance, volume, connector coverage, and data residency. And ask about money. Power Automate bills on per-user or per-flow licensing, so cost is largely fixed against volume. Foundry agents bill on consumption, so cost moves with every run. Illustrative math with industry-standard inputs, calibrate against your own tenancy and rates, actuals vary: ten thousand invoice-routing runs on a per-flow license cost the same in January as in June. The same ten thousand runs through an agent, at a few thousand tokens per run, is tens of millions of tokens a month, priced against your model rates and swinging with prompt length, retries, and model choice. Neither number matters as much as the shape: one is flat, one is a curve nobody forecast.

Criterion	Power Automate flow	Foundry agent	Custom Azure build
Determinism	Fixed path, every run	Reasons over unstructured input	Whatever you code
Auditability	Line-by-line run history	Trace and explain	You build the logging
Latency	Milliseconds to seconds	Model round-trips	As fast as you engineer
Governance surface	Power Platform DLP	Foundry plus M365 admin	You own everything
Maintained by	Citizen dev or fusion team	AI team	Platform engineering
Cost model	Per-user or per-flow license	Consumption billing	Full run cost plus headcount
Failure mode	Visible error, run stops	Plausible wrong answer	Depends on your discipline

Read the failure-mode row twice. A flow that breaks throws an error and stops. An agent that breaks produces a confident, plausible, wrong answer and keeps going. That single difference should drive more architecture decisions than every capability slide from Build combined.

Three illustrative examples, not client work. Invoice approval routing: fixed path, mandatory audit trail, defined approvers. Flow wins, and putting an agent here adds probabilistic risk for zero benefit. Triaging unstructured support email into categorized tickets: free text in, judgment call out. Agent wins, wrapped in human review while you build confidence in its accuracy. A latency-sensitive fraud-scoring step inside an existing event pipeline: milliseconds matter and the data plane is bespoke. Custom Azure build wins, and no managed agent platform changes that.

Default to a flow. Escalate to an agent only when the flow's branching logic becomes unmaintainable, which is the honest signal that you need reasoning rather than more switch cases.

AI copilots vs custom Azure build: what Teams publishing changes

For years, the strongest argument in the AI copilots vs custom Azure build debate was distribution. Getting an assistant in front of users inside Teams meant bot registrations, manifest packaging, and admin negotiations that pushed teams toward custom front ends instead. Agents publishing directly to Teams and Microsoft 365 Copilot removes much of that work, though admin approval gates and tenant catalog policies still stand between "published" and "in front of users."

So when does a custom build still win? Three cases hold: bespoke data-plane requirements the platform will not accommodate, latency-sensitive paths where a model round-trip through managed infrastructure is unacceptable, and orchestration patterns the platform does not expose. If your custom-build rationale was distribution, it is now weaker. If it was control, it holds completely.

Published is not governed

An agent appearing in Teams does not mean it is governed in Teams. Before you ship, settle who owns Power Platform DLP policy, who monitors agent data handling through Microsoft Purview, and who manages agent lifecycle in the M365 admin center. Agents surface what existing permissions already allow. An oversharing problem you tolerated with search becomes an incident when an agent presents the same file conversationally.

I wrote about the guardrail layer this requires in Power Platform governance guardrails, and everything there applies double once agents enter the tenant. Distribution just got easy. Governance did not, and the gap between the two is now your problem, not Microsoft's.

Toolboxes, Routines, and Memory: the governance story hiding in the features

A sourcing caveat first: public documentation on Toolboxes and Routines is still thin as of this writing. Treat what follows as directional, and verify specifics against current Microsoft Learn docs before you commit an architecture to them.

Directionally, Toolboxes and Routines read like productivity features and are not. A Toolbox is a curated, scoped collection of capabilities an agent can reach, which means policy can live at the tool-grant level instead of being re-litigated per agent. A Routine is a declarative, reviewable definition of how an agent executes a task, which moves agent behavior toward something a change advisory board can actually reason about. Together they are Microsoft's answer to the single biggest objection enterprise IT raises against agents: unpredictability. What remains unconfirmed publicly is the RBAC model, versioning guarantees, and whether approval workflows are built in. Ask those questions before you standardize.

Adopt Routines before you scale agents, not after. Retrofitting predictability onto fifty deployed agents is a rewrite. Building it into your first three is a template.

Memory is the unglamorous feature that matters most, and it cuts both ways. Persistent context across sessions is what separates a chatbot from an assistant, and users notice immediately. But persistent memory is persistent data retention, and it creates four specific liabilities. First, retention itself: confirm where memory is stored, what the default retention period is, and whether expiry is configurable, against Microsoft Learn's data and privacy documentation rather than the release blog. Second, subject rights: a GDPR erasure request that touches agent memory needs a programmatic, auditable purge path, and you should prove that path exists before go-live. Third, reproducibility: explaining a past agent decision requires knowing what memory state existed at inference time, so confirm memory reads and writes land in your traces. Fourth, scope isolation: memory scoped wrong bleeds context between users, and that is a test case, not an assumption.

Treat everything an agent remembers as PII until proven otherwise. Marketing copy does not survive a Purview audit. Documentation does.

Agent Optimizer: watch list, not roadmap

Everyone ships agent builders. Almost nobody ships agent evaluators, which is why so many agents die between demo and production. Agent Optimizer, in private preview, looks like Microsoft acknowledging that gap: a harness for testing and tuning agent behavior rather than eyeballing it.

Private preview means unconfirmed. Do not plan Q3 around capabilities that can change or vanish before GA. For compliance evidence today, anchor on the GA evaluation tooling and tracing you can already run, which is exactly the discipline I argue for in evaluating agents before production. Put Optimizer on your watch list and revisit at GA.

The decision gate to run before you build

Do not end your next planning meeting with "let's build an agent." End it with this gate, in order, and stop at the first exit:

Is the execution path fixed and known? Build a flow. Stop here.
Does the input genuinely require reasoning over unstructured content? If no, it is a flow with more branches. If the branches have become unmaintainable, continue.
Can you tolerate a plausible wrong answer, or wrap the agent in human review until you cannot find one? If neither, it is not an agent candidate yet.
Do latency, data-plane, or orchestration requirements exceed what the platform exposes? If yes, it is a custom Azure build and you own everything that implies.
Before any agent ships: DLP owner named, lifecycle owner named, memory retention verified against Learn docs, erasure path proven, evaluation harness running on GA tooling.

The best production pattern I know is a deterministic wrapper around a probabilistic core: the flow owns the trigger, the approvals, and the audit trail, and the agent owns only the judgment call in the middle. Microsoft just handed you a faster way to ship agents. Run the gate, and what you choose not to build will matter more than what you do.

From Zero to Autonomous: An Agentic Development Workflow with Claude Code and Azure ML

Alex Pechenizkiy — Thu, 16 Jul 2026 15:28:07 +0000

Most Claude Code demos fall apart the moment you put them inside an enterprise. Not because the model is weak. Because the demo has no state, no guardrails, and no audit trail, and those three gaps are exactly what an enterprise cannot forgive.

The teams treating agentic development with Claude Code as a prompt-engineering problem are solving the wrong problem. Prompts are the easy part. The hard part is proving what an autonomous system did, why it did it, and how to undo it. That is an MLOps problem, and MLOps teams have been solving it for years.

An agent that writes code but cannot prove what it changed is not a productivity gain. It is a liability with a nice UI.

Here is the position this whole piece defends: if you cannot audit and reverse an agent's action, you do not have a pipeline. You have a risk waiting to be discovered in an incident review.

The blueprint: Claude as a first-class Azure ML compute step

Start with the constraint everyone tries to patch around instead of designing for. Claude Code is stateless per invocation. The Claude Code documentation frames each run as a fresh session, and people immediately reach for durable volumes to persist the CLI's internal session files.

Do not do that. Mounting the agent's internal session state couples your pipeline to undocumented CLI internals that can change without notice.

Make the git repo, the Azure ML artifacts, and explicit JSON hand-offs your source of truth instead. State lives in version control and pipeline I/O, not in the agent's memory. This is the single most important architectural decision in azure ml agentic development, and it is the one most tutorials skip.

Once you accept that, the wiring is clean. Claude becomes a named Azure ML pipeline component with declared inputs and outputs, not a side script someone runs from a laptop. It reads the repo, generates code, and registers artifacts (code bundles, candidate changes) in the model and component registry like any other run.

Every invocation gets a run ID, tracked inputs, and tracked outputs. The agent is just another reproducible step in a graph. That is the entire point.

Checkout repo The pipeline pulls a pinned git commit into the component workspace. The commit hash is the context anchor for the entire run.
Agent generates Claude Code runs as a scoped component step, reading the repo map and prior JSON hand-offs from a mounted datastore, then producing a code bundle as a tracked output.
Register artifact The generated bundle is registered in the component/model registry with the run ID, so the change is reproducible and attributable later.
Test gate The unit and integration suite runs against the candidate. Failures return as structured context for the next agent turn, not raw logs.
Promote A passing candidate advances to a staging scope and waits at a human approval boundary before it reaches production artifacts.

On the access path, treat direct API versus Bedrock as a trust-boundary decision, not a preference. The direct Anthropic API is one external dependency and a simpler compliance story. Claude on Amazon Bedrock means two clouds, added egress and data-residency complexity, and it only earns its place if your org already governs Claude centrally through Bedrock.

One thing worth checking before you architect around either path: Claude may or may not appear in your Azure AI Foundry model catalog depending on region and subscription. Verify it for your tenant rather than assuming.

The takeaway is blunt. Model the agent as a pipeline component with tracked I/O, or you lose reproducibility on day one and never get it back.

Tests as the control plane

Give the agent a reward signal it cannot argue with. In a claude azure ml pipeline, your unit and integration tests are that signal. Claude iterates against the Azure ML test gate until the suite goes green, instead of shipping confident first-pass output that nobody validated.

The tool-use and iteration patterns in Anthropic's docs make this loop mechanically simple. The discipline is in the boundaries you put around it.

Set an explicit iteration cap and a per-run token or cost ceiling so the loop always terminates. Feed failing tests back as structured context (which assertion failed, which file, which expected value), not raw log dumps. The agent fixes faster when the feedback is shaped.

Now the caveat that separates this from vendor cheerleading. A green suite proves the agent satisfied the tests. It does not prove the tests were right.

Goodhart's Law applies to your test gate

When a measure becomes a target, it stops being a good measure. An agent optimizing against a test gate will satisfy the tests, not necessarily your intent. Weak tests produce confidently wrong code that passes every check. Your suite is now the specification, so treat gaps in coverage as gaps in the spec.

This reframes what test coverage means. It is no longer just a quality metric. It is your autonomy budget.

Thin tests mean a short leash, because everything outside the tests is a place the agent can go wrong without anyone noticing until production. Rich tests mean you can safely let the loop run further before a human looks.

The takeaway: your test coverage is the length of the leash. Decide it deliberately, not by accident.

Governance that makes it shippable

This is where auditability stops being a slogan and becomes configuration. Give the agent a scoped principal, never a shared secret or someone's personal token. Managed identities and Azure RBAC let the agent run as an identity you can scope, log, and revoke.

Apply least privilege to registries and datastores. The agent writes to a staging scope. It does not write straight to production artifacts, ever. If a run goes wrong, the blast radius is a staging namespace, not your release channel.

Lineage is the other half. Azure ML tracks runs, and with MLflow tracking every agent action becomes logged, attributable, and reversible. That is the reversibility the opening promised.

But here is the gap you have to own, because the platform will not close it for you. There is no out-of-box mechanism that cryptographically distinguishes "the agent wrote this" from "the compute identity wrote this." There is no native versioning of the prompt or session that produced a given code change.

You build that. Custom tagging conventions on runs, the source commit and JSON hand-off captured as artifacts, and git-based review of the diff. This is an architecture responsibility, not a checkbox you tick.

Demo vs enterprise pipeline

A laptop demo skips authentication (it runs as you), skips lineage (nobody can prove what changed), and skips rollback (there is no staging boundary to reverse). An enterprise pipeline treats all three as non-negotiable. The demo optimizes for a fast first result. The pipeline optimizes for defending that result in an audit six months later.

Attribution and reversibility are the line between a demo and an ai agent ci/cd azure workflow you can defend when someone asks who changed the auth logic and why. Cross that line before you scale, not after.

Deployment with a human checkpoint that scales

Full end-to-end autonomy to production is the wrong goal today. Say it plainly, because the vendor marketing will not.

The right goal is autonomy up to a reviewable, batched checkpoint. The agent runs freely through generation, testing, and staging. A human signs off at the boundary that actually matters: the merge to production.

Azure ML supports this natively. Managed online endpoints with safe rollout and controlled promotion give you a real gate, not a vibe. Autonomy runs up to it. It does not run through it.

Batch the review so humans approve diffs and lineage together, rather than babysitting every step. A reviewer looking at a coherent change with its test results and lineage attached makes a better decision than one clicking approve forty times an hour.

Which is the caveat, and it has a name. Approval fatigue is not "reviewers get tired." It is automation complacency: as agent output volume rises and most of it is correct, reviewers calibrate toward trust and miss the rare consequential failure precisely because it looks like all the safe ones.

The fix is not "review everything." That guarantees complacency. The fix is risk-tiered gating.

Risk tier	Example change	Gate
Low	Docstrings, test additions, isolated refactors with full coverage	Batched auto-approve with post-hoc audit
Medium	New feature code within an established, well-tested module	Batched human review of diff plus lineage
High	Auth logic, schema migrations, security-critical paths	Mandatory individual sign-off, no batching

Concentrate human attention on blast radius. Put the human at the one boundary autonomy should never cross, and automate everything before it. That is the shape of a system that scales without quietly rubber-stamping its way into an incident.

Where this breaks and the honest limits

Every failure mode here traces back to the two design constraints already covered. Name them up front and gate for them, because they will happen.

Cost runaway comes from unbounded iteration on large tasks. This is the tests-as-control-plane point turned against you: without a hard iteration cap and cost ceiling, an agent chasing a green suite on a big change can burn through budget re-attempting the same fix. The cap is not optional.

Context-window drift comes from the statelessness constraint on sprawling monorepos. When the repo map exceeds what the agent can hold, it loses the thread and re-solves problems it already solved. The mitigation is the same JSON hand-off and datastore discipline from the blueprint. Scope the context to the change, do not feed the whole monorepo.

Then there are the tasks the test-gate model breaks on by design, not by accident.

Do not hand these to an agent yet

Untestable correctness (security posture, performance characteristics, architectural fit), irreversible changes (schema migrations, auth logic), and underspecified requirements are not edge cases. Tests encode known properties. These problems live in the unknown ones, so a green suite tells you nothing about whether the agent got them right.

For illustrative planning only: teams often model an iteration cap of three to five attempts per task and a per-run token ceiling before a human is pulled in. Treat those as industry-standard inputs; calibrate against your own data, actuals vary. The number that matters is the one your own test coverage and cost tolerance justify.

The failure modes are predictable, which means they are gateable. An unpredictable failure is a research problem. A predictable one is an engineering decision you either made or forgot to make.

If you want the upstream discipline this depends on, our write-up on building reproducible Azure ML pipelines covers the tracking foundation, and the piece on guardrails for AI agents in a CI/CD flow goes deeper on the iteration caps and risk-tiered gating referenced above.

The teams winning with agentic development claude code are not the ones with the cleverest prompts. They are the ones treating the agent as a governed pipeline component, with tests as its control plane and a human at the one gate that counts. Build that, and autonomy becomes an asset you can defend. Skip it, and you have automated the production of unexplained changes.

Rayfin Decoded: Microsoft's Bet on Prompt-to-Production Backends

Alex Pechenizkiy — Wed, 15 Jul 2026 15:20:57 +0000

A coding agent can write an app in minutes. Getting that app into production is still the slow part: a database to stand up, APIs to wire, authentication to configure, access policies to enforce, and infrastructure to own. That gap, between prompt-to-code and prompt-to-production, is where most agentic development quietly stalls.

At Build 2026, Microsoft put a bet on the table for closing it. Rayfin is an open-source SDK and CLI that lets developers and coding agents define a complete application backend in code, then deploy it directly into Microsoft Fabric with one command. No manual database setup, no API plumbing, no infrastructure work. The backend lands as a managed Fabric artifact, and its data lands in OneLake.

This is worth an architect's attention not because it is finished (it is a public preview), but because of which layer it is aiming at. If you read the developer-economics argument that coding agents compress execution but leave the deliver layer alone, Rayfin is the counter-move: Microsoft trying to compress deliver too.

TL;DR

What it is. An open-source SDK and CLI that defines a backend in code (data models, APIs, access policies, business logic) and deploys it straight into Microsoft Fabric.

The one-command pitch. A single deploy provisions the database, authentication, access policies, and APIs, with no manual setup or infrastructure work.

Where the data goes. App data lands in OneLake by default, immediately usable by Power BI, notebooks, and data agents with no pipelines in between.

The launch partner. Replit is the exclusive launch partner, so the build environment is an AI-first vibe-coding surface that deploys into a governed Fabric tenant.

The architect's catch. It compresses the plumbing, not the judgment. It also couples your app and its data to Fabric. Both of those are decisions, not defaults.

Confirmed vs commentary

Confirmed by Microsoft. Rayfin as an open-source SDK and CLI, the define-deploy-run workflow, the one-command provisioning of database, authentication, access policies, and APIs, data landing in OneLake, apps running as managed Fabric artifacts with inherited governance, and Replit as launch partner are all from Microsoft's Fabric announcement, linked inline. Rayfin was announced at Build 2026 on June 2, 2026, and is an early release (reported as a public preview).

Commentary (mine). The deliver-layer framing, the coupling and lock-in tradeoffs, the maturity caution, and the build-versus-wait guidance are my read of the documented mechanics, not Microsoft positions.

What Rayfin actually does

Per Microsoft's Fabric announcement, Rayfin runs a three-step loop, and the interesting part is what each step removes from your plate.

Step	What you (or the agent) do	What you no longer hand-build
Define	Specify data models, APIs, access policies, business logic, and connections to existing data sources in code via the SDK	Schema scripts, API scaffolding, auth wiring, policy config
Deploy	Run the CLI once	Database provisioning, authentication setup, API hosting, infrastructure
Run	Use the app	Pipelines to move data into analytics, and a separate governance story

The deploy step is the headline: one command, no manual setup or infrastructure work. The payoff is what happens next to the data. Whatever the app writes lands directly in OneLake, where Power BI, notebooks, and data agents can read it immediately, with no copies or pipelines in between. The application runs as a managed Fabric artifact, so it inherits Fabric's governance rather than carrying its own.

The launch surface matters too. Replit is the exclusive launch partner, which means the intended path is: build in an AI-first environment where an agent does the vibe coding, then deploy that app into a securely managed Fabric tenant. The agent writes the backend definition. Rayfin makes it real inside the data platform.

Why this is aimed at the layer that did not compress

The developer-economics piece on this site argued, following Brad DeLong, that agents compress execution while the decide and deliver layers resist automation. Deliver, in that framing, is everything between a working artifact and a governed thing running in production: provisioning, security, data plumbing, the operational surface.

Rayfin is a direct attempt to compress exactly that. It does not try to make the agent decide better. It tries to make the agent's output deployable without a human spending a week on glue. If it works at scale, it moves a real chunk of the deliver layer from bespoke human work to one command, which is genuinely new.

The bet in one line: move the deliver layer from bespoke human glue to a single command.

That is the optimistic read, and it is a fair one. It is also incomplete, because compressing the mechanics of deliver is not the same as compressing the judgment in it.

What it genuinely solves

Three things here are real and worth naming plainly.

Glue-code elimination. The unglamorous work of standing up a database, wiring auth, and hosting APIs is the part of agentic development that does not parallelize and does not delight anyone. Collapsing it into a deploy command is a real productivity gain, not a demo trick.

Inherited governance. Because the app runs as a managed Fabric artifact, it sits inside the tenant's existing governance rather than reinventing it. For a regulated Microsoft shop, an app that is governed by default is worth more than an app that is fast by default.

Analytics-ready data. App data landing in OneLake with no pipeline means the gap between operational data and analytical data closes to zero. The thing your app wrote a second ago is already queryable by Power BI and by Fabric data agents. For teams whose whole reason to exist is turning operational data into decisions, that is the feature.

What it couples you to

Now the architect's other eye. Every one of those benefits is also a coupling decision.

The benefit	The coupling it implies
Deploys straight into Fabric	Your backend now lives inside Fabric, not a portable runtime you control
Data lands in OneLake by default	Your application's system-of-record data sits in the analytics platform, with that platform's residency, access, and cost model
Inherits Fabric governance	Your app's governance is Fabric's governance, for better and for worse
One-command provisioning	The provisioning decisions are made for you, which is convenient until the day you need them to be different

None of these are reasons to avoid Rayfin. They are the questions to answer before you adopt it. An app whose data belongs in OneLake and whose lifecycle belongs to Fabric is a great fit. An app that needs to be portable across clouds, or whose system-of-record data should not live in the analytics estate, is not, and no amount of deploy-command convenience changes that.

There is also the plain maturity point. This is a public preview. The mechanics are documented and the direction is clear, but day-one preview reliability and the long-run cost shape of running production apps as Fabric artifacts have no track record yet. Treat the architecture as real and the operational maturity as unproven.

Deliver compresses, accountability does not

Here is the line an architect should hold. Rayfin compresses the mechanics of deliver. It does not compress the judgment in deliver.

The deploy command can	It cannot decide
Provision the database	Whether this app should own that data, or whether OneLake is its right home
Wire authentication	Who should have access, or who owns the incident when the policy is wrong
Stand up the APIs in one step	Whether the thing the agent built is the thing the business needed

So the same discipline the rest of this stack demands still applies. The agent drafts the backend definition. A human still decides whether it should exist, verifies that it behaves, and owns it in production. Rayfin removes the week of plumbing. It does not remove the architect.

Cheap deployment raises the value of judgment, it does not lower it. The cost of shipping the wrong thing just dropped, so the volume of things shipped is about to rise.

When to reach for it

A short, honest decision sketch.

Reach for it when you are building data-centric apps on Microsoft, your data belongs in OneLake anyway, you want governance inherited rather than rebuilt, and you can live with preview-grade maturity for now.
Wait when you need portability across clouds or runtimes, when the app's system-of-record data should not sit in the analytics platform, or when the workload cannot absorb preview-stage risk.
Either way, keep the human gates. Cheap deployment makes verification and accountability more important, not less, because more things will ship.

Rayfin is one of the more interesting answers anyone has shipped to the prompt-to-production gap, precisely because it does not pretend the gap is only about code. It treats deployment, data, and governance as the real work, which they are. The judgment about what to point it at stays exactly where it was.

Coding Agents and Developer Economics on the Microsoft Stack (2026)

Alex Pechenizkiy — Tue, 14 Jul 2026 15:18:48 +0000

Every few years a tool arrives that is supposed to end programming as a paid profession. FORTRAN was going to let scientists write their own code. COBOL was going to let managers read it. SQL was going to let business users query the database without a developer in the room. Each prediction was reasonable. Each was wrong in the same direction: demand for skilled builders went up, not down.

Brad DeLong's essay Coding agents as a continuation of normal software tool evolution puts AI coding agents in that lineage. His argument is not that agents are unimpressive. It is that they are normal. They compress the part of software work that was already getting compressed - the execution - while leaving the parts that resisted automation for seventy years exactly where they were: deciding what to build, verifying that it works, and holding the contextual understanding that makes both possible.

That framing matters more in the Microsoft business-apps world than almost anywhere else, because this is the stack where "anyone can build it" has been the marketing promise the longest. Copilot Studio, low-code Power Platform, makers shipping agents without writing a line of C#: the execute phase here is already close to free. So the question DeLong forces is the useful one. If building is no longer the constraint, what is the scarce skill an enterprise actually pays for in 2026?

TL;DR

DeLong's read on coding agents, translated to the Microsoft stack.

Agents heavily compress execution and assist with decide and deliver, but the accountable judgment at those ends does not transfer.

Programmer headcount grew by orders of magnitude from 1935 to 2025 as tools got more productive, not less (DeLong's figures: about 2,000 to over 2.5 million).

Cheaper agents mean more agents, which means more architecture decisions and more governance to own.

The durable, paid skill on this stack is decision quality, verification discipline, and accountability ownership - not build speed.

Monday move: for one agent you are about to ship, write down who decides what it does, who verifies its output, and whose pager rings when it is wrong in production. If those three names are blank, the agent is not ready, and no model will fill them in.

The headcount data is the whole argument

DeLong's strongest move is to put numbers on a thing people assert from intuition. Software tooling has gotten dramatically more productive across ninety years, and the population of people who do software work did not shrink. It multiplied.

He traces the count across ninety years:

Year	Role	Approximate headcount
1935	Calculator-tabulator machine wirers	about 2,000
1965	Coders	about 80,000
1995	Programmers	about 500,000
2025	Software developers plus programmers	about 2.25 million developers plus 250,000 programmers

The population that does software work did not shrink as the tools got more powerful. It grew by orders of magnitude. Even the conservative recent-window read, the thirty years from 1995 to 2025, is on its own roughly a fivefold rise. The assembler did not end the coder. The compiler did not end the assembler programmer. The high-level language did not end the compiler programmer. Each tool moved the work up a level and the number of people doing it grew.

This is the Jevons-paradox shape, and it is the part of the essay most worth internalizing if your career is on this stack. When a productive input gets cheaper, you do not automatically consume less of it. Often you consume far more, because uses that were previously too expensive to justify suddenly clear the bar. Cheaper software production did not mean less software. It meant software went into places no one would have funded a custom build for in 1995.

The Microsoft business-apps stack is living through exactly this right now. When a department can stand up a Copilot Studio agent in an afternoon, the answer is not fewer agents. It is more agents, in more corners of the business, owned by more people, touching more data. Every one of those agents is a decision someone made and an outcome someone is accountable for. That work does not compress. It multiplies along with the agents.

What actually compresses, and what does not

DeLong's clean line is that writing code was never the bottleneck. The constraints were always upstream and downstream of the keyboard: deciding what to build, verifying that the deliverable is correct, and maintaining the deep contextual understanding that lets you do either one well. Agents are very good at the middle. They are not good at the ends, and there is no strong reason to expect that to change soon.

It helps to name the three phases plainly and ask which one an AI agent genuinely takes off your plate.

Phase	What it is	Who owns it after agents arrive
Decide	Choosing what to build, for whom, against which constraint. Picking the agent, the data it sees, the boundary it must not cross.	Human-led. The agent can surface options, but it has no stake and no accountability for the choice.
Execute	Producing the artifact: the flow, the plugin, the prompt, the C# orchestration, the connector wiring.	Increasingly the agent, with a human in review. This is the phase that compresses.
Deliver	Verifying the result is correct, shipping it into production, and owning it on-call when it misbehaves.	Human-led. The agent can draft tests and checks, but verification and accountability do not transfer to a tool that cannot be held responsible.

On the Microsoft stack the three phases are concrete, not abstract. Decide is the architect choosing whether a problem wants a Copilot Studio agent, a Foundry-hosted container, or no agent at all. Execute is the build itself, and this is where Copilot, agents, and low-code tooling have made enormous progress. Deliver is the part that keeps people honest: who signs off that a Copilot Studio agent gives correct answers about pricing or eligibility, and who gets paged when it gives a wrong one to a customer at 2am.

That last question is the one the technology cannot answer. An agent can draft the plugin. It cannot be the name in the incident channel. Accountability is not a capability you can add to a model. It is a property of a person inside an organization, and it is precisely what an enterprise is buying when it hires a senior architect rather than renting more compute.

The job changes shape, exactly as it always has

DeLong's other useful observation is that the profession does not vanish under a new tool. It transforms. He contrasts the 1995 programmer, who translated specifications into code while managing memory and databases by hand, with the 2025 developer, who orchestrates tools and services across distributed systems and owns the result from design through deployment to on-call pager duty. Same profession, different centre of gravity. The hand-management of low-level mechanics fell away. The scope of ownership expanded.

The Microsoft business-apps practitioner has lived a smaller version of this transition more than once. The person who hand-wrote plugin registration steps and FetchXML by memory in 2012 is, if they kept current, the person designing solution-aware ALM, environment strategy, and governance guardrails in 2026. The low-level mechanics got abstracted. The ownership got bigger. AI agents are the next turn of that same wheel, not a different machine.

Then	Now
Hand-wrote the plugin, the flow, the query.	Specs the behavior, then reviews and verifies what the agent drafts.
Owned a feature inside an application.	Owns an agent's outcomes from intake through retirement, including its on-call.
Bottleneck was typing the code.	Bottleneck is deciding what is worth building and proving it is correct.
Scarce skill: knowing the API surface cold.	Scarce skill: judgment, verification discipline, and accountability.

None of this is a downgrade for the practitioner. It is the opposite. The work that compresses is the work that was always the least differentiated. The work that remains is the work that was always the reason a skilled person was in the room.

What should you hire a Microsoft AI architect for in 2026?

Hire for the work that does not compress: decision quality (judging which agents should exist and what they must not touch), verification discipline (proving an agent's output before trusting it), and accountability ownership (being the name on it in production). Build speed is the one axis a tool already wins, so it is the wrong axis to hire on.

This is the part that matters for anyone making or seeking a senior hire on this stack. If execution compresses and judgment plus accountability stay scarce, then hiring on build speed is hiring on the wrong axis. The candidate who can produce a working flow fastest is competing with a tool that is getting faster every quarter. The candidate who can decide which agent should exist, prove it behaves, and own it when it does not is competing with no tool at all.

Three things are worth paying for, in order.

Decision quality. The ability to look at a business problem and correctly judge whether it wants an agent, what that agent should and should not touch, and what the failure modes are before a line is built. This is the cheapest decision to get wrong and the most expensive to discover late. It does not show up in a coding test.

Verification discipline. The instinct to treat an agent's output as a draft to be proven, not a result to be trusted. On this stack that means eval datasets, deterministic gates, governance checks, and the refusal to ship a Copilot Studio answer to customers because it looked right in three manual tries. The more the build compresses, the more this matters, because the volume of things to verify goes up while the cost of producing them goes down.

Illustratively, that discipline is concrete on this stack. A Copilot Studio agent that answers pricing-eligibility questions gets an eval set and a gate before it ever faces a customer:

# Eval set for a Copilot Studio pricing-eligibility agent (excerpt, illustrative)
Q: Are Claude models on Azure covered by Founders Hub credits?    -> Expected: No
Q: Does a sponsored subscription with a card on file get charged? -> Expected: Yes
Q: Which regions deploy Claude on Foundry today?                  -> Expected: East US 2, Sweden Central
# Gate: a Power Automate test flow runs this set on every solution import;
#       a pass-rate below 100% fails the pipeline and the solution does not promote.
# Owner of record: a named architect. Audit: Managed Environment, DLP and log retention on.

The agent can draft that eval set. It cannot decide the threshold, hold the gate, or be the name on the incident. Those are the parts that do not compress.

Accountability ownership. The willingness and the standing to be the name on the agent in production. To define its success metrics, hold its release gate, and lead the response when it misfires. This is the scarcest of the three because it cannot be automated, outsourced, or faked, and because most organisations have not yet built the role that holds it.

A useful interview reframe falls out of this directly. Stop asking a senior candidate to build the thing faster. Ask them to tell you what should not be built, how they would prove the thing is correct once it exists, and who they think should own it on-call. The answers separate someone who can drive a tool from someone you can hand a production estate to.

The optimistic reading

DeLong's data points to a genuinely good outcome for skilled practitioners. More agents mean more decisions about which agents should exist, more architecture about how they fit together, more governance over what they touch, and more people needed who can judge all of it. Cheaper execution does not retire the architect. It raises the demand for the judgment that execution was never the point of.

Where the analogy has edges

The continuity argument is strong, and it is worth holding it honestly rather than as a comfort blanket. DeLong's own framing - agents as cranes or photolithography, automating heavy execution while humans keep supervisory control - is a claim about supervised execution, not autonomous judgment. It holds while a human stays in the loop on the decide and deliver ends. The open question for the Microsoft stack is how disciplined enterprises will be about keeping that human there as the agents multiply and the temptation to let them self-approve grows.

That is not a refutation of the thesis. It is the condition the thesis runs on. The headcount grew across ninety years because skilled judgment and accountability stayed essential at every tool transition. They stay essential through this one only if organizations choose to keep them in the loop. The architects who make that case, and who can fill the decide and deliver roles themselves, are the ones the data says will be in more demand, not less.

DeLong's bottom line, translated for this stack: the agent is the crane. Someone still has to decide where the building goes, sign off that it is safe to occupy, and answer the phone when something cracks. That someone is the hire. It always was.

What this does not mean

The continuity read is optimistic, not a guarantee. Hold the edges honestly.

Aggregate growth is consistent with individuals being squeezed out of the execute-only tier. The category grows; a given job is not safe by default.
Agents do draft evals, tests, and options. The human owns the gate, not the absence of agent involvement.
Cheaper execution is not free quality. Review burden rises with agent volume, it does not fall.
The historical pattern is correlational, not a law. Ninety years of growth does not guarantee the next ten.
The headcount-and-continuity read is DeLong's. The hire-for-decide-and-deliver prescription is mine, an extension of his data rather than his claim.

AB-100 Decoded (2026): What the Agentic AI Architect Exam Tests

Alex Pechenizkiy — Mon, 13 Jul 2026 16:24:11 +0000

On June 30, 2026, Microsoft retired four certifications at once: MB-335 and MB-700 on the Dynamics side, PL-500 and PL-600 on the Power Platform side. The Supply Chain functional consultant, the F&O solution architect, the RPA developer, and the Power Platform solution architect all reached end of life on the same day. And Microsoft's own Partner Center announcement names the successor in plain language: those certifications "have collectively been replaced by Agentic AI Business Solutions Architect (AB-100)."

To be precise about what that means: those four certs covered different job scopes, and AB-100 is not a one-for-one successor to any of them. It is the credential Microsoft now offers where the expert and specialist end of the business-apps track used to sit, and the retired tracks point to it rather than to four new specialist exams. That consolidation is the statement. If you held PL-600 as your credential of record, the upgrade path Microsoft provides runs through agents whether your current projects do or not.

I am preparing for this exam now, and the study guide turned out to be a more honest document about what this job is becoming than most of the keynote content on the subject. One calibration before the decode: the blueprint certifies knowledge of tooling that is itself young, and several of the capabilities it tests are weeks past general availability or still in preview, so treat it as a statement of direction as much as of settled practice. What follows is what is actually on the exam, what the weighting signals, the sharp edges I would want to know before test day, and how I would prepare with a working architect's calendar.

TL;DR

AB-100 replaces MB-335, MB-700, PL-500, and PL-600 as the credential where the expert tier of the business-apps track used to sit (successor by position, not a one-for-one scope match), and it is prerequisite gated: you must earn one of 14 associate certs, or passing the exam grants you nothing.

Deploy is the heaviest domain at 40 to 45 percent. The exam weights operating agents (monitoring, evaluation, ALM, security) above designing them, which matches what I keep seeing production teams learn the expensive way.

The center of gravity is Copilot Studio, not Foundry. Credit economics, orchestration modes, governance, and testing detail all live in the low-code tier. Foundry is the escalation path, not the default.

The most repeated lesson is when not to build an agent. For a certification with agentic in the name, a striking share of the blueprint is elimination logic: use code, a flow, or plain retrieval first.

What Is AB-100? The New Architect Exam and Its Prerequisite Gate

AB-100 is Microsoft's architect-level certification for agentic AI business solutions, spanning Microsoft 365 Copilot, Copilot Studio, Microsoft Foundry, Power Platform, Dataverse, and Dynamics 365. It sits at the top of the new AB series, requires an existing associate certification before it grants anything, and weights production operations above design.

It is also the exam Microsoft points retired PL-600 and MB-700 holders toward.

The mechanics, from the certification page: 100 minutes, proctored through Pearson Vue, may include interactive components, offered in English only, passing score 700, renewal by a free online assessment every 12 months. Pricing varies by country. There is a free practice assessment to calibrate against, and the skills outline receives a minor update on July 22, 2026, so work from the current study guide rather than a cached course.

The AB-100 prerequisite gate: 14 qualifying certs, and retired PL-600 or MB-700 do not count

AB-100 is prerequisite locked: you must earn at least one of 14 qualifying certifications, or passing the exam leaves you with a score report instead of a credential. The list, grouped (from the certification page's prerequisite section):

Dynamics 365 associates (8): Business Central Developer, Business Central Functional Consultant, Customer Experience Analyst, Customer Service Functional Consultant, Field Service Functional Consultant, Finance Functional Consultant, Supply Chain Management Functional Consultant, and Finance and Operations Apps Developer.
Power Platform associates (3): Power Platform Developer, Power Platform Functional Consultant, and Power Automate RPA Developer.
AI associates (3): Azure AI Engineer, Azure AI Apps and Agents Developer (currently marked beta), and AI Agent Builder (AB-620).

Read that list carefully, because it contains the catch that will surprise exactly the people this exam is aimed at: every qualifying cert is associate level. The retired expert-tier credentials themselves are not on the list, so holding PL-600 or MB-700 alone does not open the gate; what counts is a qualifying associate. Whether an expired associate still counts is exactly the ambiguity in the callout below. The one retired exam whose certification does appear is PL-500: the Power Automate RPA Developer Associate cert is on today's list even though its exam retired in June.

Before you book

The certification page says only that you must "earn" a qualifying cert. It says nothing about how long an earned prerequisite keeps counting or whether it must be unexpired. Check your qualifying cert's status on your Microsoft Learn profile first, and re-check the prerequisite list after the July 22, 2026 outline update, because the list itself could change that day.

Just as important is who should not take AB-100. Microsoft built the AB series with distinct rungs, and climbing the wrong one wastes months. Our Microsoft AI certifications hub maps the whole track; the short version:

Exam	Certification	Right person
AB-900	Copilot and Agent Administration Fundamentals	Admins and IT generalists running Copilot and agents in a tenant
AB-730	AI Business Professional	Business users applying AI at work
AB-731	AI Transformation Leader	Executives owning AI strategy
AB-620	AI Agent Builder Associate	Makers and developers building Copilot Studio agent solutions
AB-100	Agentic AI Business Solutions Architect	Architects designing and, above all, operating agentic solutions end to end

There is also a commercial forcing function. Per the same Partner Center announcement, four things change in July 2026:

The specialization is renamed to the Microsoft 365 Copilot specialization
The MS-102 certification requirement is removed
The old Applied Skills requirements retire at the end of June 2026
AB-100 plus AB-620 are added as new certification requirements

Partners who want that specialization have to staff these credentials, which puts a floor under demand that has nothing to do with individual career choices.

Independent cert-watchers reached the same reading of the wave: Vlad Catrinescu's 2026 retirements guide frames AB-100 as the recommended path forward for PL-600 and MB-700 holders and flags the partner-designation risk for firms that do not transition their certified staff.

AB-100 Exam Weighting: Why Deploy Beats Design

The exam has three domains. Plan sits at 25 to 30 percent, Design at 25 to 30 percent, and Deploy at 40 to 45 percent: the domain about running agents in production outweighs the one about designing them by a wide margin, and four of its subsections are monitoring, testing, ALM, and responsible AI with security and governance.

Domain weighting from the current AB-100 study guide; a minor outline update is scheduled for July 22, 2026.

I find that weighting quietly remarkable. The industry demo culture celebrates the design moment: the clever orchestration, the multi-agent diagram, the tool-calling loop. Microsoft's own architect exam says the job is mostly what happens after: whether you can tell a healthy agent from a degrading one, whether you have an evaluation practice instead of a vibe check, whether your deployment survives moving from dev to prod, and whether security holds when a malicious document lands in the agent's context.

Four Deploy-domain details, pulled from the official study material, illustrate the level it tests at:

Evaluation targets 80 to 90 percent, not 100. Microsoft's agent-evaluation checklist sets a realistic pass-rate band for probabilistic systems and tells you to run test sets multiple times. Copilot Studio ships seven built-in evaluation methods, and the details discriminate: the general-quality method needs no expected answer while every match and similarity method requires one, and evaluation results are retained for only 89 days.
A near-perfect score is a defect signal, at least in AI Builder. AI Builder's model grading treats grade D as double-sided: a prediction model can fail by being worse than random or by scoring 99 percent plus, which usually means an overfit model or a leaked column. The exam expects an architect who reads "99 percent accurate" on that report as a warning, not a win.
Monitoring is persona-matched, not one dashboard. The governance guidance routes makers to Copilot Studio analytics, developers to Application Insights, admins to the Power Platform admin center, and security teams to Sentinel. The retention traps are exam bait and production bait alike: reactions and comments keep 28 days, transcript downloads 29 days, analytics views 360.
ALM is a trap inventory. Solutions move metadata, never data. Copilot Studio keeps a documented list of settings that are not solution-aware and must be redone per environment, from Application Insights wiring to channel security. And fine-tuned model deployments bill hourly even at zero traffic and are deleted after 15 idle days while the model itself is retained. The exam wants the architect who knows where deployments actually break.

If you have been reading this site, that emphasis will sound familiar. The case for a standing evaluation practice and a per-run spend guard is not editorial preference anymore; it is what Microsoft's own architect exam tests.

What Microsoft Thinks an Agentic AI Architect Is

Studying the full blueprint, five signals stand out about the role Microsoft is actually certifying.

Copilot Studio is the center of gravity

The credit economics, the orchestration modes, and the evaluation and governance detail overwhelmingly live in Copilot Studio. The Cloud Adoption Framework strategy guidance the exam draws from is explicit about the build ladder: buy a SaaS agent if one meets the functional requirements, extend with low-code before building, and treat pro-code as the escalation rather than the default. Where Foundry does appear, the exam wants the boundary: which agent workloads belong in hosted versus in-process versus Copilot Studio. The architect this exam certifies is buy-then-extend-then-build, in that order.

The architect owns the money

Microsoft's own ROI guidance is exam material at the level of actual numbers. The agent business-value framework computes agent-assisted value as assisted hours times an hourly rate, defaulting to 72 dollars per hour based on U.S. Bureau of Labor Statistics compensation data, and it requires the baseline to come from telemetry rather than surveys. Treat the default as a starting point that flatters the case; substitute your own loaded rate before you show the math to a CFO. Consumption is priced in Copilot Credits, and the blueprint expects you to know the rate card, not just that one exists:

Metered thing	Credits
Prepaid capacity pack (monthly)	25,000 credits
Classic answer	1
Generative answer	2
Agent action	5
Tenant graph grounding query	10
Agent flow actions (per 100)	13

Enforcement behavior differs by feature (the sharp-edges table below carries the thresholds), and pay-as-you-go never enforces a stop, which is the same class of problem as the spend caps that do not actually cap. We keep a deeper teardown of the credits model in the Copilot Credits billing decode, and the ROI framing pairs with our Copilot ROI calculator piece. This is FinOps material, and the blueprint treats it as core architect knowledge rather than an afterthought for procurement.

The architect owns agent identity

The identity story splits by tier, and the split is the point.

Low code, per-agent identity. Every new Copilot Studio agent gets an auto-provisioned Microsoft Entra Agent ID, mandatory for new agents since July 2026. The identity doc's distinction is worth memorizing: the identity's scopes describe what an agent is configured to do, while access-control and DLP policies decide what it is allowed to do at the moment of execution, revalidated at every connector call. Conditional Access enforcement on agent identities currently applies only in the Teams channel, a scoping detail worth knowing before you promise it everywhere.
Pro code, per-project identity. In Foundry, all agents within a single project share the same managed identity, so least privilege forces a separate project per distinct access pattern. That is a wider blast radius by default than the low-code tier, and exactly the kind of detail that separates a working deployment from an audit finding.

On the attack side, the blueprint covers defense against indirect prompt injection in layers:

Prompt shields screening documents and other untrusted content
Spotlighting to separate instructions from data
Guardrails at the tool-call boundary
A human in the loop for consequential actions

Treat the stack as risk reduction, not prevention: assume an injection succeeds sometimes and design the blast radius accordingly. Status calibration belongs here too: the Agent ID platform went GA in April 2026, while several surrounding governance features are still in preview. My read of the balance: cost governance and agent identity are the blueprint's two obsessions, while data science barely appears.

Microsoft-governed does not mean Microsoft-only

The blueprint normalizes Salesforce, ServiceNow, and Zendesk as systems the agents serve, and Anthropic's Claude models appear inside the Azure model router and computer-use scenarios. Status matters here: the router itself is GA, while its Claude-model support is in preview, and Claude models must be deployed separately before the router can use them. The exam expects you to know multi-vendor mechanics, not just tolerate them.

The most repeated lesson is restraint

The planning guidance opens with elimination logic: structured, rule-bound work goes to code or non-generative automation; static question answering over a fixed corpus goes to classic retrieval; an agent is justified only for multi-step, adaptive, many-tool work. Agent flows are deterministic on purpose despite the name. Multi-agent designs are gated behind real conditions like security boundaries and separate owning teams. For a certification named agentic AI, the single strongest through-line is knowing when not to build one, which is the same lowest-rung-that-works elimination ladder we have argued for here, now with an exam number attached.

A Worked Example: An Invoice-Dispute Agent, the Way AB-100 Thinks

Here is the blueprint's instinct applied to a request every business-apps architect has heard some version of: "Finance wants an agent to handle vendor invoice disputes." Walk every slice down the elimination ladder before you accept the word "agent."

Three quarters of the request was never an agent.

The lookup slice is not an agent. "What is our late-payment policy for tier-2 vendors" is static question answering over a fixed policy corpus (in this stack, grounded knowledge sources in Copilot Studio or an Azure AI Search index). That is classic retrieval, grounded and cited, and building an agent for it buys non-determinism with no payoff.
The triage slice is not an agent either. Classifying an incoming dispute email and routing it to the right queue is one model judgment inside fixed steps. That is an agent flow: deterministic by design despite the name, billed per action through Copilot Studio consumption, no orchestration loop anywhere.
One slice is genuinely agent-shaped. Investigating a disputed invoice across the ERP, the vendor's email thread, and the contract terms, where the next query depends on what the last one returned, cannot be drawn as branches in advance. That slice earns an agent, with event triggers, least-privilege access, a human approval on any credit action, and audit logging, which is precisely how the blueprint scopes autonomous agents.
Multi-agent is gated, not default. The blueprint's conditions are concrete: a security boundary to respect, separate owning teams, planned growth. A dispute investigator and a payment approver with different data access qualify. Splitting one workload into five agents because the diagram looks impressive does not.

Then the Deploy domain takes over, because the exam's real question is not "can you design this" but "can you run it":

A labeled evaluation set with an 80 to 90 percent pass band, re-run on every prompt or tool change
Monitoring routed to the right persona per surface
Credit consumption modeled before the pilot, not after the invoice
Dev-to-prod through solutions, with the not-solution-aware settings on a per-environment checklist

That is the exam in miniature. Three quarters of the request was never an agent, and the quarter that was carries an operations bill the demo never shows.

AB-100 Sharp Edges Worth Knowing Before Test Day

A handful of product facts from the study material are both likely exam discriminators and genuinely useful at work. Each traces to current Microsoft Learn documentation, and GA-versus-preview status is tagged where it matters. Two rows deserve their sources up front. Computer use is GA, billed per step, and its hosted browser is documented as not for production use; the credit enforcement thresholds come from the same consumption doc as the rate card above.

Area	The detail that separates a pass from a guess
Model router	The model subset you configure doubles as the failover set. Configure a single model and you silently have no failover. Claude support is preview, and Claude models must be deployed separately first.
Evaluation (GA)	Seven built-in test methods in Copilot Studio; only the general-quality method works without an expected answer. Results retain 89 days.
Analytics retention	Reactions and comments 28 days, transcript downloads 29 days, analytics views 360 days. Long retention means exporting to Dataverse, not hoping.
AI Builder grading	Grade D flags both worse-than-random and 99-percent-plus models. Near-perfect accuracy usually means a leaked column.
Fine-tuned models	Storage is free, but a deployment bills hourly at zero traffic and is deleted after 15 idle days. The model survives; the endpoint does not.
ALM	Solutions move metadata, never data. App Insights settings, manual authentication, channel security, and sharing are not solution-aware and must be redone per environment.
Credits enforcement	Whole-agent disablement triggers at 125 percent of prepaid capacity, but agent flows block at 100 percent, and pay-as-you-go never enforces a stop.
Computer use (GA)	Billed per step, 5 credits standard and 15 premium; the hosted browser is explicitly not for production use.

None of these are trick knowledge. Every row is a production incident waiting for a calendar date, and I have watched close cousins of two of them: an analytics window that quietly expired before the retrospective that needed it, and an idle fine-tuned endpoint billing for weeks at zero traffic because nobody knew hourly billing survived zero usage.

How to Prepare for AB-100: A Three-Week Plan With a Day Job

The exam's own material points the way, and the weighting tells you where the hours go. Here is the plan I am running, sized for a working architect at roughly eight to ten hours a week over three weeks.

Week	Focus	Hours	Done means
1	Deploy domain: monitoring personas, the seven evaluation methods, ALM traps, responsible AI and agent identity	~10	One trial-tenant agent taken through evaluation runs, solution export, and import into a second environment
2	Design domain: orchestration modes and what each unlocks, first-party Dynamics 365 agents, extensibility boundaries	~8	You can say from memory which capabilities require generative orchestration and which first-party agent maps to which scenario
3	Plan domain plus calibration: CAF AI strategy, ROI and credit math, then the free practice assessment	~6	Practice assessment comfortably above the passing threshold before you book

Three notes on the plan:

Clear the gate first. Check the 14-cert list above against your Learn profile and confirm your qualifying associate cert is current. If you hold none, AB-620 or AI-103 are the natural on-ramps depending on whether you live in Copilot Studio or in code.
Study Deploy first, not last. It is the heaviest domain and the least demo-friendly, and hands-on beats video for it: a trial tenant with Copilot Studio, one agent taken through evaluation runs, solution export, and a second environment teaches the testing and ALM subsections better than any course. Work from the official study guide outline and read the linked product docs, because the discriminating details live there, not in summaries.
Mind the outline date. The skills outline gets a minor update on July 22, 2026, so pull the current study guide rather than a cached course. As of this writing, third-party prep courses for AB-100 barely exist, which cuts both ways: no shortcut courses, but also no stale ones. The primary docs are the actual source of truth.

What AB-100 Does Not Certify

Honesty about the boundary keeps the credential useful:

Not the retired certs' depth. It does not replace the domain depth of MB-335 supply-chain work, MB-700 F&O architecture, or PL-500 RPA engineering; those bodies of knowledge no longer have a dedicated exam.
Knowledge of operations, not operating experience. It certifies that you know the controls, evaluation methods, and traps exist, not that your evaluation sets, data hygiene, and spend guards actually hold in your tenant.
Not Azure platform architecture. Landing zones, networking, and workload architecture remain AZ-305 territory, which lives outside the business-apps track and is unaffected by this wave.
Not a project filter. Passing it does not make agents the right answer for your project mix; the exam itself spends its planning domain teaching the opposite.

The Verdict: Is AB-100 Worth Taking?

For architects who built careers on PL-600-era credentials, yes, with eyes open: AB-100 is the continuation of the role's credential line rather than an optional specialty. It is the only architect-tier credential Microsoft now offers where those four used to sit within the business-apps track. Whether employers and clients treat it as required is a market outcome that will take a year to observe; what is already fixed is the partner-specialization requirement and the absence of an alternative in this track. The content itself is a fair, occasionally blunt statement of what the role now requires: agent restraint at planning time, product-boundary fluency at design time, and operations, evaluation, cost control, and security discipline above everything else.

One caveat belongs next to the credential: the exam tests that you know the operational controls exist, not that yours work. Pair it with a standing evaluation set, honest baselines, and a spend guard, or it certifies a vocabulary.

That said, the emphasis is the encouraging bit. The exam does not certify enthusiasm. It certifies the unglamorous operational competence that separates the agent that survives contact with production from the one that becomes next quarter's writedown. If your instinct on agentic AI has been to ask what could go wrong before what could be automated, this exam was written for how you already work.

Exam facts verified against Microsoft Learn on 2026-07-11. AB-100's skills outline receives a minor update on 2026-07-22; check the study guide for the current version before booking.

How to Align Claude Code With Your Codebase: 6 Techniques (2026)

Alex Pechenizkiy — Sun, 12 Jul 2026 14:49:19 +0000

A coding agent does not invent a house style. It absorbs yours. Point Claude Code at a repo where every feature scatters its own LLM calls, mixes data access into the UI, and skips tests, and the agent will produce more of exactly that. It is not being lazy. It is doing what it was asked to do: extend the codebase in the codebase's own idiom.

That is one of the most useful things to understand about working with coding agents. As the team at Towards Data Science put it, the coding agent will just follow the natural pattern in your codebase. Your existing habits, good and bad, become the agent's defaults. Alignment is the work of closing the gap between what you want, what the agent builds, and what the project actually needs.

TL;DR

Coding agents imitate the patterns already in your repo, so bad structure perpetuates itself. To align Claude Code with your intent: refactor toward the pattern you want copied, use plan mode before any implementation, load the full context including constraints like cost, encode recurring corrections as project memory, mechanize hard rules as hooks so they cannot be skipped, and run independent reviewer agents to catch what same-session review misses. This works the same whether the agent writes Power Platform solutions, Dataverse plug-ins, Azure infrastructure-as-code, or content.

What does it mean to align a coding agent with your codebase?

Alignment is closing the gap between three things that rarely match on their own: what you envision, what the agent builds from your prompt and the repo's patterns, and what the project actually needs. You close it with context, encoded rules, and gates the agent cannot skip, not with a smarter model.

Every agent-assisted task involves three mental models that rarely match:

What you envision. The design in your head, including the constraints you never said out loud.
What the agent implements. Its best reading of your prompt plus the patterns it found in the repo.
What the project actually needs. The correct answer, which sometimes differs from both of the above.

Misalignment is the distance between these three. The dangerous case is not when the agent produces something broken. Broken code announces itself. The dangerous case is when the agent produces something technically correct but contextually wrong: code that compiles, passes the obvious tests, and solves a problem you did not actually have. That output looks finished, so it slips through review, and you discover the mismatch only after it is wired into three other modules.

Incomplete specifications are the usual cause. If you ask for "a service that calls the model and returns a summary" without mentioning that the workload runs on a per-call budget, the agent will reach for the most capable model and the richest prompt it can justify. Correct against the words. Wrong against the wallet. The fix is rarely a smarter agent. It is more context, supplied earlier.

Model	Source of truth	Failure mode
What you envision	Your intent, often unspoken	Constraints stay in your head
What the agent builds	Your prompt plus repo patterns	Imitates bad structure faithfully
What the project needs	The actual correct design	Nobody states it explicitly

Six Techniques You Can Apply This Week

These are the moves that close the gap. None of them require a new model or a plugin. They are discipline, encoded so the discipline does not depend on you remembering it.

1. Refactor toward the pattern you want imitated

If you want the agent to write clean data access, give it one clean example first. The agent generalizes from what it reads far more reliably than from what you describe. Telling it "keep concerns separated" is weaker than showing it one module where the model call lives behind a single typed interface and the rest of the code depends on that interface, not on the SDK.

This is leverage, not housekeeping. Spend an hour refactoring the one file the agent will treat as the reference, and every subsequent generation inherits the better shape. In a Microsoft shop the same rule holds: if your first Dataverse plug-in puts business logic, retrieval, and tracing all in one Execute method, the next ten the agent writes will look the same. Establish the seam you want copied, then let the agent copy it.

2. Use plan mode before any implementation

Claude Code's plan mode lets the agent investigate the repo and propose an approach without writing code. Use it as a contradiction detector. Describe the goal, ask the agent to read the relevant files, and have it tell you where your idea collides with what already exists.

This is where unspoken assumptions surface cheaply. The agent will say things like "the existing flow assumes synchronous calls, but this design needs a queue" or "there is no place to inject this config without touching the shared client." You resolve those in conversation, for free, before a single line is committed. Skipping plan mode does not remove the contradictions. It just defers them to the diff, where they cost an implementation pass to unwind.

3. Load the full context, including the constraints

Agents implement against the context you give them, so withholding context is the same as misleading them. The constraints you forget to state are the ones that cause rework.

Bring the whole picture into the session: the relevant meeting notes, the chat thread where the real requirement was argued out, the architecture doc, and the limits. Cost is the constraint people most often omit, and it is the one that quietly produces expensive solutions. Latency budgets, compliance boundaries, the model tier you are allowed to call, the data that must never leave a region: state them up front. An illustrative example, kept generic on purpose: a team asks an agent to add document summarization and gets back a design that calls a premium model on every page load. Correct to the prompt, wrong to the budget, and entirely avoidable by saying "this runs on a tight per-request cost ceiling" in the first message.

4. Encode recurring corrections as project memory

The second time you correct the agent on the same thing, stop correcting and start recording. Claude Code reads a CLAUDE.md file at the start of every session, and that file is your standing instruction set. House rules belong there: naming conventions, the libraries you have standardized on, the patterns you have banned, the commands the agent should run before declaring work done.

For durable behavioral rules that accumulate over time, this content platform keeps a synced agent-memory directory alongside CLAUDE.md, with an index file and individual rule files that the agent loads on every session. Each rule was a correction once: a phrasing the brand does not use, a verification step that must run before publishing, an asset that must never be deleted during cleanup. Captured as memory, a correction becomes a default. You can edit this memory directly or through the /memory command. The point is that a rule written down once stops costing you attention forever.

Alignment is a loop: each correction either gets encoded as memory or mechanized as a gate, so it never has to be made twice.

5. Mechanize hard rules as hooks and gates

Memory shapes behavior, but it does not enforce it. An instruction in CLAUDE.md is a strong suggestion, and a strong suggestion can be missed under a long context or an ambiguous prompt. For the rules you cannot afford to have skipped, move from suggestion to mechanism.

Claude Code supports hooks, configured in settings.json, that run on events such as before a tool call (PreToolUse). A hook is ordinary code, so it can inspect the action and block it. On this platform there is a publish gate built exactly this way: a PreToolUse hook intercepts any attempt to push a site repo and refuses unless the pre-flight check has passed. That check is deterministic. It:

verifies the frontmatter is complete,
confirms the hero image and its WebP companion both exist,
checks that every internal link resolves,
and scans for forbidden characters.

The agent cannot talk its way past it, because the gate is not part of the conversation. It is part of the harness. The hook is a few lines wired into settings.json:

{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          { "type": "command", "command": "node framework/hooks/git-push-gate.mjs" }
        ]
      }
    ]
  }
}

The referenced script reads the proposed command on stdin and, when it is a git push from a site repo with no fresh pre-flight pass, returns a deny decision. The push simply does not happen.

This is the difference between hoping a rule holds and knowing it does. Pair it with default-deny permissions: settings.json lets you allowlist exactly the commands the agent may run without prompting, so anything outside the list stops and asks.

{
  "permissions": {
    "allow": ["Bash(pnpm build)", "Bash(git status)"],
    "deny": ["Read(./.env)", "Read(./.env.*)"]
  }
}

The combination, a permission policy that defaults to deny plus hooks that mechanically block unsafe actions, turns your most important rules from advice into guarantees. The same approach maps directly onto a Microsoft pipeline: a PreToolUse gate that runs pac solution check and refuses to deploy a Power Platform solution while the checker still reports errors, or one that blocks an az deployment apply until the plan has been reviewed.

6. Use independent reviewer agents

Same-session review is weak review. The agent that wrote the code is primed by everything it just did, so it tends to confirm its own work. To catch what that misses, fan the review out to separate agents that arrive without the writing context.

On this platform a saved review workflow runs several critic subagents in parallel, each with a single narrow lens - one checks that every claim traces to a source, one reads the draft as the target reader, one looks only for passages that need a diagram - and a synthesis step merges their findings into one ranked list. Subagents are first-class in Claude Code: each gets its own context window and its own instructions, so a fresh reviewer evaluates the artifact, not the conversation that produced it. The general pattern for code is the same. A security-focused reviewer, a test-coverage reviewer, and a "does this match the spec" reviewer, each independent, will surface issues that a single self-review never raises.

Why Coding-Agent Alignment Works the Same for Power Platform and Azure

This platform is a content system, but nothing above is specific to writing. The alignment discipline is identical whether the agent is producing a blog post, a Dataverse plug-in, an Azure Bicep module, or a Power Automate flow.

The agent inherits your repo's habits in every one of those cases. Scattered model calls in a TypeScript service and business logic crammed into a plug-in's Execute method are the same failure: a structure the agent will faithfully reproduce until you refactor the example it learns from. The unspoken cost constraint that produces an over-engineered summarizer is the same unspoken constraint that produces an over-provisioned landing zone. And the rule you keep repeating - run the checker, never push without the gate, never invent a documentation slug - is the same kind of rule whether it guards prose or infrastructure.

So the durable lesson is mechanical, not motivational. Do not rely on the agent to remember your intent, and do not rely on yourself to restate it every session. Codify intent where the agent reads it, and codify the rules you cannot compromise on as checks the agent cannot skip. Memory clarifies the contract. Gates enforce the part of it you cannot afford to leave to chance.

Dimension	Hope-based alignment	Gate-based alignment
Where intent lives	In your head, restated each session	In CLAUDE.md and project memory
Recurring corrections	Repeated by hand every time	Encoded once as a rule
Hard rules	A strong suggestion in a prompt	A PreToolUse hook that blocks the action
Permissions	Broad, agent runs most commands	Default-deny allowlist in settings.json
Review	Same session confirms itself	Independent reviewer subagents
Failure visibility	Found after it ships	Caught before the push

Start Small, Then Mechanize

You do not need the full harness on day one. The progression is natural and each step earns the next:

Start with plan mode and full context. They cost nothing and prevent the most expensive misalignments.
Refactor the one file you want imitated, so the agent learns the shape you actually want.
The next time you correct the agent on the same thing twice, write it into CLAUDE.md.
When a rule becomes genuinely non-negotiable (never push without passing checks, never run a destructive command unprompted), promote it from memory to a hook so it stops depending on anyone's attention.
When the work gets high-stakes, add independent reviewers.

Each step moves a piece of your intent out of your head and into the system, where it holds without you. That is what alignment is: not a smarter agent, but a clearer contract, written where the agent will actually read it and enforced where it actually runs.

Copilot Credits Went Live: What Work IQ and Cowork Actually Cost

Alex Pechenizkiy — Sat, 11 Jul 2026 14:45:35 +0000

On June 16, 2026, two Microsoft announcements landed on the same day, and most coverage treated them as separate stories. Work IQ reached general availability with consumption billing, and Copilot Cowork became generally available. They are not separate stories. They are the same event: Microsoft turned on a single metered currency, Copilot Credits, across the Copilot Studio, Work IQ, and Cowork surface, and the meter started running the moment the announcements went live.

If you run Copilot Studio agents or have Frontier participants who can switch on Cowork, you now have a consumption line item that did not exist on June 15. This article is the cost decode: what the meter actually charges, where the real money is (it is not where the headline number points), and the governance switches an architect should set before the grace period closes on July 1. One calibration up front: the billing mechanics below are documented and live, but the day-one reliability of the underlying grounding and the quality of Cowork's output have no production track record yet. Treat the architecture as confirmed and the maturity as unproven.

TL;DR

One currency. Copilot Credits now bill Work IQ, Cowork, and Copilot Studio from a single consumption pool. Pay-as-you-go list price is $0.01 per credit.

The headline rate is a decoy. Work IQ charges a static 0.1 Copilot Credits ($0.001) per API call for actions, but the variable grounding and reasoning charge is where the overwhelming majority of the cost lives. The static charge is under 1 percent of Microsoft's own scenario ranges of $0.20 to $1.50 per call.

Work IQ MCP needs a Microsoft 365 Copilot license. Consumption is billed on top of that license, not instead of it.

Cowork is off by default and carries a grace period through July 1, 2026 for organizations with Frontier participants. Turn on cost controls before you turn on the feature.

The governance surface is real. Admins can block servers tenant-wide, set spending limits per tenant, group, and user, and trace every tool call in Microsoft Defender. Set these first.

Confirmed vs reported vs commentary

This article separates three kinds of claim so you can apply your own evidence threshold.

Confirmed by Microsoft. The June 16 GA date, the Copilot Credits currency, the $0.01 per credit pay-as-you-go rate, the 0.1 credits per API call static charge, the Light/Medium/Heavy scenario ranges, the Microsoft 365 Copilot license requirement for Work IQ MCP, Cowork's off-by-default state, and the July 1 grace period are all sourced to Microsoft Learn, the Microsoft 365 blog, and the Microsoft licensing newsroom. Links are inline.

Arithmetic. The "200x to 1500x" gap between the static and variable charge is my calculation from Microsoft's own published numbers, shown inline so you can check it.

Commentary (mine). The "headline rate is a decoy" framing, the fan-out cost model, the pre-flight checklist, and the build-versus-wait guidance are my interpretation of the documented mechanics, not Microsoft positions. Read them as such.

One currency, three surfaces

Until this month, the agent-cost conversation was fragmented. Copilot Studio had message packs. Work IQ was preview, unbilled. Cowork was a Frontier experiment. As of June 16, Microsoft's licensing newsroom describes Copilot Credits as "a unified consumption currency that also covers Copilot Studio and other Microsoft AI services." The Work IQ MCP documentation uses nearly identical language: Copilot Credits are "the common currency across Copilot Studio capabilities and Work IQ protocols across your tenant."

That consolidation is the actual news. One pool, one pay-as-you-go rate of $0.01 per credit, one admin dashboard, drawn down by every agent capability you switch on. The two products that went GA this week are the first two taps on that pool.

Work IQ: the two-part meter

Work IQ is the intelligence layer that grounds Microsoft 365 Copilot and custom agents in organizational context. Per the Work IQ MCP overview, it is built on three layers - Data, Memory, and Inference - and exposed to agents in Copilot Studio as MCP servers that return real-time context from email, calendars, and chats. The servers available today:

Work IQ Mail
Work IQ Calendar
Work IQ Teams
Custom servers you publish

The billing has two components, and the difference between them is the whole story. From Microsoft's licensing page, Work IQ API charges consist of:

A static component: "0.1 Copilot Credits per API call" for actions and tools.
A variable component for queries: grounding, retrieval, and reasoning.

At $0.01 per credit, the static charge is $0.001 per call. That is the number that traveled fastest, and it makes the feature sound nearly free. It is also almost irrelevant. The variable component is unbounded by comparison, and Microsoft's own consumption scenarios show where the real money sits.

Scenario	Example prompt	Price per call
Light	Identify action items assigned to me by my manager and compile them into a checklist	$0.20 to $0.40
Medium	Review the latest customer interview emails, identify top themes and roadmap impact, recommend three prioritized actions	$0.30 to $0.75
Heavy	Produce Level 1 and Level 2 summaries from the latest roadmap executive review using recent meetings and documents	$0.50 to $1.50

Now do the arithmetic. The static charge is $0.001 per call. The Light scenario bills $0.20 to $0.40 per call. That is 200 to 400 times the static charge. The Heavy scenario, at up to $1.50, is 1500 times the static charge. The 0.1-credits-per-call headline describes a rounding error sitting on top of a grounding-and-reasoning bill that is two to three orders of magnitude larger.

The number that matters is the one Microsoft did not put in the headline

If you budget Work IQ at "0.1 credits per call," you will under-forecast by 200x to 1500x. The static tool charge is trivial. The variable grounding-and-reasoning charge is the line item. Model your spend on the scenario ranges, not the per-call rate.

This is the architectural reality the per-call rate hides: an MCP server rarely makes one call. A single agent turn that reads mail, checks a calendar, and reasons over both fans out into multiple Work IQ calls, each carrying its own variable charge. The meter does not tick once per user request. It ticks once per tool invocation, and agents are built to invoke tools liberally.

Cowork: a different product on the same meter

Copilot Cowork is the other June 16 GA: an agentic system that executes long-running, multi-tool tasks end to end and is positioned to return finished work rather than drafts, though that output still needs human verification. What it brings:

Models: Anthropic Opus 4.8 and Sonnet 4.6 today, with more coming.
Plugins: partner plugins from firms like Harvey, Moody's, and S&P Global.

For cost purposes, the facts that matter:

License: requires a Microsoft 365 Copilot User Subscription License.
Meter: billed through Copilot Credits at the same $0.01 per credit pay-as-you-go rate.
Cost drivers: model use, context retrieval, tool calls, and runtime.
Default state: off by default.
Grace period: Frontier-participant organizations are not billed until July 1, 2026.

So Cowork and Work IQ are different products with different jobs, drawing from the same credit pool, gated by the same license, and governed by the same admin dashboard. If you reason about them separately you will miss the combined burn rate on a shared budget.

How much does Work IQ actually cost per call?

Microsoft's published scenarios run $0.20 to $1.50 per call, not the $0.001 static rate: a Light task costs $0.20 to $0.40, a Medium task $0.30 to $0.75, and a Heavy task $0.50 to $1.50. The static 0.1-credit charge is a rounding error. The variable grounding-and-reasoning charge is the bill.

Here is a worked example using only Microsoft's published numbers. Using illustrative volume assumptions - a 200-person team where each person triggers ten Medium-scenario interactions per working day, across 21 working days - at the midpoint of the Medium range ($0.525 per call):

Per person per day: 10 calls at $0.525 = $5.25
Per person per month (21 working days): $110.25
Per 200-person team per month: $22,050

That is one team, one moderate workflow, before Cowork runtime, before any Heavy scenarios, before the multi-call fan-out that turns one user request into several billed calls. The pay-as-you-go meter has no ceiling of its own. The only ceiling is the spending limit you set.

Why this resembles the Azure Marketplace billing trap

Readers of the Claude on Azure marketplace billing trap will recognize the shape: a metered third-party-style cost that runs against a payment method by default, with the per-unit rate looking harmless until volume and fan-out compound it. The lesson is the same. Read the consumption model before you ship, and set the guardrails before the meter starts, not after the first invoice.

The governance surface is the actual product feature

The good news is that Microsoft shipped real controls alongside the meter, and they are where an architect should spend the first hour. Per the Work IQ MCP documentation and the Work IQ APIs announcement:

Spending limits can be set per tenant, per group, and per user across agents and services, from the Microsoft 365 admin center cost-management dashboard. This is the hard ceiling the pay-as-you-go model otherwise lacks.
Server-level allow and block is tenant-wide. If an admin blocks a Work IQ MCP server, it blocks access for every user and every agent. "Permissions always take precedence over configuration."
Pay-as-you-go versus prepurchase is an admin choice. Prepurchase plans exist for organizations that want a committed, capped pool rather than an open meter.
Observability runs through Microsoft Defender Advanced Hunting, where every tool call can be traced: which tool, which parameters, which outcome. That is your audit trail and your cost-attribution data in one place.

Set these before July 1

Decide pay-as-you-go versus prepurchase before anyone switches a feature on. Prepurchase caps the pool.

Set tenant, group, and user spending limits now, while spend is zero.

Block every Work IQ MCP server you have not explicitly approved. Default-deny, then allow.

Confirm Cowork stays off until you have a budget and an owner for it.

Wire Defender Advanced Hunting queries for tool-call volume so cost attribution exists from day one.

A meter on plumbing that is still settling

The billing is live and precise. The integrations it bills for are not always. Power Platform MVP Jukka Niiranen, testing Copilot Cowork in the Frontier preview, reported the agent stumbling on real systems:

Dataverse MCP server tools that had been deprecated without Cowork being told.
A Dynamics 365 Sales plugin that appeared in the configuration UI but was not actually available to the system.

His framing is that MCP is sold as the USB for AI but behaves more like USB-C, with so many variations that you cannot be sure a given combination will work until you try it. When it works, he notes, it is genuinely impressive.

That is the maturity caveat the billing announcement does not carry. As of GA you are metered per tool call on integrations that can silently deprecate or fail to resolve. The cost is deterministic from day one. The reliability is not. Budget for both, and do not assume that paying for a tool call means the tool was actually there.

Where this fits the larger Copilot Credits economy

This week is the opening of a longer arc. Copilot Studio's own agent metering moves onto the same credit currency, which means the questions buyers ask about Studio pricing, capacity packs, and autonomous-agent consumption are now the same questions you ask about Work IQ and Cowork. One budget, one set of guardrails, one cost-attribution model across the Copilot Studio, Work IQ, and Cowork surface.

The teams that will be calm in three months are the ones that treated June 16 as a governance event, not a feature launch. The meter is on. The license gate is real. The per-call rate is a decoy. Set the spending limits, default-deny the servers, and forecast on the scenario ranges, not the headline.

What this does not mean

Five bounds so the takeaways do not over-reach.

Copilot Credits do not replace the per-seat Microsoft 365 Copilot license. Work IQ MCP requires that license, and consumption bills on top of it.
This meter is not the whole Microsoft AI stack. Azure AI Foundry agents bill on Azure consumption, and GitHub Copilot is a separate license and meter. The Copilot Credits pool covers Copilot Studio, Work IQ, and Cowork.
A spending limit caps total spend and hard-stops agents when reached. It does not lower the per-call grounding-and-reasoning charge.
Default-denying servers controls access and cost. It does not make the agents more reliable.
Prepurchase caps the pool. It does not remove the variable grounding charge inside each call.

Microsoft's Agents Hub Decoded (2026): The Frameworks and the Gaps

Alex Pechenizkiy — Fri, 10 Jul 2026 16:18:45 +0000

Microsoft just gave agentic computing a front door. The new Agents hub on Microsoft Learn pulls the company's agent guidance into one place: principles for architecting agent solutions, an agent archetype framework, an adoption maturity model, and evaluation guidance. It is Microsoft's official map, and a good one, for building agents on its own stack (Copilot Studio, Foundry, and Azure), not vendor-neutral agent theory.

For an architect, the interesting question is not what is in it. It is what the map formalizes, and what it quietly leaves to you. Official guidance from a platform vendor is always two things at once: a genuinely useful synthesis of hard-won patterns, and a document that cannot, by its nature, tell you when not to build the thing it is teaching you to build. This is a read of both halves, so you can use the hub for what it is good at and bring your own discipline where it stops.

TL;DR

The maturity model is the most immediately useful piece. It is a Capability Maturity Model for agents. Use it to locate yourself honestly: in the engagements I see, most teams sit at Level 100 to 200, not 400.

The 3Cs archetype framework is a shared vocabulary, not new theory. Its value is killing the "every team reinvents agent design" tax, which is real on large programs.

The evaluation guidance is the part to read twice. Percentage-based evaluation over pass/fail, baselines before builds, and an explicit warning against averaging scores. That is mature thinking.

What it leaves to you: the cost-runaway failure mode, the call to not build an agent, and the real failure stories. The hub is maturity-forward and aspirational by design; the operational scar tissue is still your job.

How to use it: assess your maturity honestly, scope with the 3Cs, evaluate from day one, and bring your own spend guardrails. The official guidance will not tell you where the meter runs.

What is in the Microsoft Agents hub?

The hub is a Microsoft Learn front door to the company's agent guidance, and it organizes into four substantive pieces, better than a typical docs landing page.

Architecting agent solutions: principles and patterns, anchored on a "fit for purpose" idea that AI should deliver value at an appropriate level of complexity rather than maximal sophistication.
The agent archetype framework: the "3Cs" model, categories, capabilities, and components, a structured way to design and talk about agents.
The agentic AI adoption maturity model: a five-level, five-pillar assessment of where an organization sits on the agent journey.
Evaluation frameworks: how to design and operationalize agent evaluation as a continuous practice.

Three of these are worth an architect's real attention. One is worth reading twice.

The maturity model: a CMM for agents, and an honesty test

The adoption maturity model is explicitly based on the Capability Maturity Model, the same lineage as decades of software-process assessment. It runs five levels, from Level 100 (initial, experimental, individual-dependent) to Level 500 (an optimized, agent-first enterprise), across five capability pillars: AI strategy and experience, business strategy, AI governance and security, technology and data, and organization and culture.

The useful thing here is not the ladder. It is the honesty the ladder forces. Most organizations running agents today are somewhere between Level 100 and Level 200: a few pilots that worked, no repeatable practice, governance that lives in one person's head. The model makes that visible, and it makes visible that maturity is not a technology problem. Four of the five pillars are strategy, process, governance, and culture. Only one is technology and data.

Use it as a mirror, not a roadmap

The temptation with any maturity model is to read it as a to-do list and declare yourself Level 400 because you bought the tools. The honest use is the opposite. Score each pillar against what is actually repeatable in your organization, not what is possible in the product. The gap between "we can run an agent" and "we can run agents safely at scale" is exactly the four non-technology pillars, and that is where most programs stall.

The 3Cs archetype framework: a vocabulary tax, removed

The agent archetype framework is candid about what it is. In Microsoft's own words, it "doesn't introduce new concepts. It names and structures what experienced builders already do." It organizes agent design into three layers, the 3Cs: categories (the why), capabilities (the what), and components (the how). The top layer names seven categories of agent behavior:

Connect - gather and integrate information across the enterprise
Analyze - turn gathered data into insight
Create - produce and transform content
Act - take action on a user's behalf
Automate - orchestrate multistep workflows
Govern - embed compliance into agent behavior
Monitor - improve through telemetry and feedback

Do not undervalue a shared vocabulary because it is not novel. On any program with more than a couple of teams, the absence of one is a real and compounding cost: teams reinvent the same agent, share inconsistent guidance, and cannot build on each other's work. The 3Cs give a consulting practice or an internal CoE a common way to scope, describe, and reuse agent designs. That is the same reason consistent architecture patterns matter for cloud solutions, and it is worth more on a ten-team program than any single clever pattern.

The limitation is the flip side of the strength. A vocabulary tells you how to describe an agent. It does not tell you whether the agent should exist. That decision lives outside the framework.

The evaluation guidance: read this part twice

The evaluation guidance is the most mature thinking in the hub, and the part most teams will skip and most regret skipping. A few points from Microsoft's common evaluation approaches stand out as genuinely good architecture advice, not marketing.

Move from pass/fail to percentages. Because language models are nondeterministic, static pass/fail unit-test thinking does not fit. Evaluation has to be percentage-based, measured against a baseline.
Establish a baseline first, even a manual one. Microsoft's example is honest: existing ticket routing does not have a 100 percent success rate even with humans. You evaluate the agent against the real baseline, not against perfection.
Resist the averaged score. The guidance explicitly warns against collapsing evaluation into a single radar-plot average, and says to select agents for the one or two qualities the use case actually needs. That is a non-obvious, correct point that a lot of teams get wrong.

This section also concedes something important: agents disrupt traditional ROI and feasibility frameworks, because value spreads across multi-agent and tool ecosystems rather than a single process. That is true, and it is the seam where the official guidance stops and your own work begins.

What the hub leaves to you

Here is the architect's other eye. The hub is comprehensive on how to design, assess, and evaluate agents. It is thinner, by design, on three things that decide whether an agent program survives contact with production.

The hub is strong on	It leaves to you
Maturity assessment across strategy, governance, tech, culture	Where you actually are this quarter, scored honestly against repeatable practice
A shared vocabulary for designing and describing agents	Whether a given agent should be built at all, or solved without one
Evaluation as a continuous, percentage-based practice	The real cost-runaway failure mode and the real-time guardrails for it
An aspirational path toward an agent-first enterprise	The failure stories, and the discipline to stop before the failure

Cost governance is the conspicuous gap. The maturity model has a governance and security pillar, and that is right. But the failure mode that actually produces a shock invoice, an agent looping overnight with no real-time spend cap, is an operational concern that maturity language does not catch. Now that Copilot Credits and provider meters bill per tool call, an architecture review that does not include a per-agent spend guard is incomplete. That is the subject of the spend-caps governance piece, and the reason I wrote an open-source guardrail to sit in the gap the official guidance leaves.

The "do not build it" decision is yours. A framework that teaches you to design agents has a structural bias toward designing agents. The hub's own "fit for purpose" principle gestures at restraint, but the call to solve a problem without an agent, with a flow, a query, or nothing at all, is judgment the document cannot make for you. It is also the single most valuable thing a senior architect contributes.

The failure stories are not in the docs, and never will be. Official guidance is maturity-forward. It describes the climb to Level 500, not the specific way a Level 200 pilot blows its budget by April or ships a confidently wrong answer to a customer. Those are the receipts that make architecture decisions real, and they live in practice, not in a hub.

How to use the Microsoft Agents hub this week

A short, honest plan that takes the hub for its strengths and supplies the rest.

Score your maturity against repeatable practice, not product capability. Be willing to write Level 100 or 200 where it is true. The gap you find is your roadmap.
Adopt the 3Cs as your team's shared language if you run more than two agent teams. It pays for itself in reduced rework before it pays for itself in anything clever.
Stand up evaluation before you build, with a real baseline and percentage targets. This is the hub's best advice. Do it on the first agent, not the tenth.
Add a per-agent spend guard and a documented kill path to every architecture review. The official frameworks will not prompt you to; the meter does not care that you reached Level 400.
Keep the "should this be an agent" question on the table through design. The framework will happily help you build something you should not have built.

Microsoft drew a good map. Maps are most useful to people who already know which parts of the terrain are dangerous. Use the hub to standardize the design and the assessment, and bring your own discipline for the cost, the restraint, and the failure modes it cannot encode. That combination, the official frameworks plus the operational scar tissue, is what an architect is actually for.

DEV Community: Alex Pechenizkiy

Microsoft Fabric Apps Are a Distribution Channel, Not a Marketplace: Who Should Build One

What a Fabric app actually is (and is not)

The distribution math behind Microsoft Fabric apps

The OneLake trade: interoperability story, capacity-model reality

Who should build Microsoft Fabric custom workloads

The AI angle Microsoft is underselling

The 90-day evaluation framework

Decide on Microsoft Fabric apps this quarter

Foundry Hosted vs In-Process vs Copilot Studio Agents (2026 Decision)

The three paths in one paragraph each

What "in-process" actually means here

How Foundry Hosted agents actually run

The side-by-side comparison

How do you choose between Foundry Hosted, in-process, and Copilot Studio?

Identity and governance: where the paths quietly differ

Extensibility: tools and MCP across the three

Where it breaks: caveats and the honest limits

Which agent build path should you pick?

Read Next

Azure AI Foundry New vs Classic: 2026 Migration Map

What "New" Actually Means: A Resource Model, Not a Skin

Two Project Types, and the One Microsoft Tells You to Pick

The Portal Feature Map: New, Classic, or Both

The API and SDK Rename You Cannot Ignore

What Are the Azure AI Foundry Migration Deadlines in 2026?

What Transfers in a Hub-to-Foundry Migration (and What Does Not)

A Decision Framework You Can Defend in a Review

Where It Breaks: Caveats and Open Questions

Foundry New vs Classic: The 2026 Bottom Line

Read Next

Ship a Real Website with Claude Code, GitHub, and Cloudflare (Cheap and Reliable)

Why this is cheap and reliable

Prerequisites (one-time, about 15 minutes)

Give this to your AI

The steps

1. Scaffold and confirm it builds

2. Put it on GitHub (private)

3. Connect Cloudflare Pages (you click this part)

4. Add your custom domain (you click this part)

5. The reliability gate

Cost, honestly

Gotchas (the things that actually bite)

The mental model to keep

Read Next

Microsoft Agent vs Flow: What Foundry's June 2026 Release Really Decides for You

What actually shipped, and why Claude is not the headline

Microsoft agent vs flow: the honest decision rule

AI copilots vs custom Azure build: what Teams publishing changes

Toolboxes, Routines, and Memory: the governance story hiding in the features

Agent Optimizer: watch list, not roadmap

The decision gate to run before you build

From Zero to Autonomous: An Agentic Development Workflow with Claude Code and Azure ML

The blueprint: Claude as a first-class Azure ML compute step

Tests as the control plane

Governance that makes it shippable

Deployment with a human checkpoint that scales

Where this breaks and the honest limits

Rayfin Decoded: Microsoft's Bet on Prompt-to-Production Backends

What Rayfin actually does

Why this is aimed at the layer that did not compress

What it genuinely solves

What it couples you to

Deliver compresses, accountability does not

When to reach for it

Read next

Coding Agents and Developer Economics on the Microsoft Stack (2026)

The headcount data is the whole argument

What actually compresses, and what does not

The job changes shape, exactly as it always has

What should you hire a Microsoft AI architect for in 2026?

Where the analogy has edges

What this does not mean

Read Next

AB-100 Decoded (2026): What the Agentic AI Architect Exam Tests

What Is AB-100? The New Architect Exam and Its Prerequisite Gate

The AB-100 prerequisite gate: 14 qualifying certs, and retired PL-600 or MB-700 do not count

AB-100 Exam Weighting: Why Deploy Beats Design

What Microsoft Thinks an Agentic AI Architect Is

Copilot Studio is the center of gravity