Microsoft Build 2026 had many AI announcements, but the one I found most meaningful was not another model or another Copilot UI.
It was Agent Harness in Microsoft Agent Framework.
My reading is simple: Microsoft is trying to move agent development from "the agent can run" to "the agent can survive production."
That difference matters.
Most agent demos look impressive because the happy path is short. A user asks, the model reasons, a tool is called, and the result comes back. But real agent workloads are messier:
- the task runs for many steps
- context becomes too large
- the agent needs memory
- some tools require approval
- developers need traces
- multiple agents may need to collaborate
- generated code may need to execute safely
These are not side features. They are the execution layer of an agent system.
Agent Harness is interesting because it makes that execution layer part of the framework's defaults.
What Changed
At a high level, creating a harness agent looks like this:
```python
agent = create_harness_agent(
    client=client,
    name="MyAgent",
    max_context_window_tokens=128_000,
)
```
The API change is small. The assumption change is not.
The old assumption was: create an agent and let it call tools.
The new assumption is: this agent may run for a long time, use many tools, need human approval, overflow context, delegate work, and require observability.
Agent Harness includes built-in providers for things like:
- automatic context compaction
- file-based memory
- todo tracking
- plan vs. execute modes
- dynamic skill discovery
- background agents
- tool approval rules
- OpenTelemetry tracing
- web search and shell execution
This is why I do not see Agent Harness as just a feature list. I see it as a standardization move.
Every serious agent platform eventually needs these capabilities. The question is whether each team builds them manually, or whether the framework provides strong defaults.
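Context compaction is a good example of something teams otherwise build by hand. The following is a minimal sketch of the general idea, not the Agent Harness implementation. When a conversation exceeds a token budget (playing the role of `max_context_window_tokens`), older messages are folded into one summary message and the most recent ones are kept verbatim. `Message`, `count_tokens`, `naive_summary`, and `compact` are hypothetical names. A real system would use the model's tokenizer and an LLM-written summary.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Message:
    role: str
    content: str


def count_tokens(text: str) -> int:
    # Placeholder tokenizer: whitespace-separated words approximate tokens.
    return len(text.split())


def total_tokens(messages: List[Message]) -> int:
    return sum(count_tokens(m.content) for m in messages)


def naive_summary(messages: List[Message]) -> str:
    # Placeholder summarizer: keep the first three words of each old message.
    return " | ".join(" ".join(m.content.split()[:3]) for m in messages)


def compact(
    messages: List[Message],
    max_tokens: int,
    keep_recent: int = 2,
    summarize: Callable[[List[Message]], str] = naive_summary,
) -> List[Message]:
    """Fold older messages into one summary when the budget is exceeded."""
    if total_tokens(messages) <= max_tokens or len(messages) <= keep_recent:
        return messages
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = Message("system", "Summary of earlier turns: " + summarize(old))
    return [summary] + recent


history = [
    Message("user", "word " * 50),
    Message("assistant", "reply " * 50),
    Message("user", "next question please"),
    Message("assistant", "short answer"),
]
compacted = compact(history, max_tokens=40)
print(len(compacted), total_tokens(compacted))
```

Here the 105-token history shrinks to three messages. The sketch does a single pass and does not guarantee the result fits the budget; a production version would need to loop or summarize more aggressively.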
CodeAct: The Real Problem Is Not Tool Speed
The first feature worth paying attention to is CodeAct with Hyperlight.
Traditional tool calling usually works like this:
```
LLM -> choose tool
Tool -> return result
LLM -> choose next tool
Tool -> return result
LLM -> choose next tool
...
```
This is fine for simple tasks. But it becomes expensive when the work is procedural.
Imagine an agent needs to calculate total order value across users:
- list users
- get orders for each user
- get discount for each order
- get tax rate by region
- calculate the final total
In traditional tool calling, the model may need many turns just to walk through a loop.
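To make that cost concrete, here is a minimal sketch of the one-tool-per-turn loop. It assumes a hypothetical action format in which the model returns either a tool name with arguments or a final answer. `scripted_model` is a deterministic stand-in for an LLM, and the tools are in-memory stubs.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical action format: (tool_name, kwargs) asks for a tool call,
# (None, answer) ends the run with a final answer.
Action = Tuple[Optional[str], Any]


def run_tool_calling_loop(
    model: Callable[[List[Any]], Action],
    tools: Dict[str, Callable[..., Any]],
    max_turns: int = 50,
) -> Tuple[Any, int]:
    """One model turn per step: choose a tool, run it, feed the result back."""
    history: List[Any] = []
    for turn in range(1, max_turns + 1):
        name, payload = model(history)  # model turn
        if name is None:
            return payload, turn  # final answer and number of model turns
        result = tools[name](**payload)  # tool turn
        history.append((name, payload, result))
    raise RuntimeError("max_turns exceeded")


def scripted_model(history: List[Any]) -> Action:
    # Stand-in for an LLM that decides the next step from the history.
    if not history:
        return "list_items", {}
    items = history[0][2]
    priced = len(history) - 1  # prices fetched so far
    if priced < len(items):
        return "get_price", {"item": items[priced]}
    return None, sum(result for _, _, result in history[1:])


PRICES = {"a": 1, "b": 2, "c": 3}
tools = {
    "list_items": lambda: list(PRICES),
    "get_price": lambda item: PRICES[item],
}
answer, model_turns = run_tool_calling_loop(scripted_model, tools)
print(answer, model_turns)
```

With three items the loop needs five model turns (one to list, three to price, one to answer), and the count grows with the data.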
CodeAct changes this by letting the model generate code once:
```python
# Generated once by the model, then executed by the runtime.
users = call_tool("list_users")
total = 0
for user in users:
    orders = call_tool("get_orders", user_id=user.id)
    for order in orders:
        discount = call_tool("get_discount", order_id=order.id)
        tax = call_tool("get_tax_rate", region=order.region)
        total += (order.amount - discount) * (1 + tax)
print(total)
```
The important improvement is not that the tools execute faster.
The improvement is that the system reduces model-to-tool round trips.
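The generated script assumes the runtime supplies `call_tool` and that tool results expose attributes such as `user.id`. To run the same procedure locally, the sketch below replaces the real tools with hypothetical in-memory stubs and makes `call_tool` a plain dispatcher that counts calls. It runs on the host with no sandbox, so it only illustrates the control flow, not Hyperlight isolation.

```python
from types import SimpleNamespace as NS

# Hypothetical in-memory data standing in for real services.
USERS = [NS(id=1), NS(id=2)]
ORDERS = {
    1: [NS(id=10, amount=100.0, region="EU"), NS(id=11, amount=50.0, region="US")],
    2: [NS(id=20, amount=200.0, region="US")],
}
DISCOUNTS = {10: 10.0, 11: 0.0, 20: 20.0}
TAX_RATES = {"EU": 0.2, "US": 0.1}

TOOLS = {
    "list_users": lambda: USERS,
    "get_orders": lambda user_id: ORDERS[user_id],
    "get_discount": lambda order_id: DISCOUNTS[order_id],
    "get_tax_rate": lambda region: TAX_RATES[region],
}
tool_calls = 0


def call_tool(name, **kwargs):
    # Dispatch to a registered stub tool and count the call.
    global tool_calls
    tool_calls += 1
    return TOOLS[name](**kwargs)


# The generated script, unchanged.
users = call_tool("list_users")
total = 0
for user in users:
    orders = call_tool("get_orders", user_id=user.id)
    for order in orders:
        discount = call_tool("get_discount", order_id=order.id)
        tax = call_tool("get_tax_rate", region=order.region)
        total += (order.amount - discount) * (1 + tax)
print(round(total, 2), tool_calls)
```

The stubs are hit nine times, but all nine calls happen inside one script. In the loop-style approach, each of those calls would typically be preceded by its own model turn, which is where the round-trip savings come from.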
Microsoft's example showed more than 50% latency reduction and more than 60% token reduction for a multi-step workflow. The exact number will vary, but the pattern is very useful: when the task is procedural, let the model express the procedure as code and let the runtime execute it.
This is closer to how developers work. If I need to repeat an operation across many records, I do not want to manually reason through every iteration. I write a small script.
CodeAct gives agents that same execution pattern.
Hyperlight Does Not Remove Tool Governance
CodeAct uses Hyperlight to run generated code in an isolated micro-VM. That is important because model-generated code should not run directly in the host environment.
But I think the security boundary needs to be stated clearly:
CodeAct sandboxing protects the host from unsafe generated code. It does not automatically make your tools safe.
If your tool can send an email, delete a file, update a database, approve a refund, or trigger a deployment, the sandbox is not enough. You still need tool-level permissions, approval policies, and auditability.
In other words:
Sandbox protects code execution.
Approval protects business actions.
Confusing these two would be dangerous in production.
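The split between the two layers can be sketched in a few lines. Below is a hypothetical tool registry, not an Agent Framework API, that checks an approval policy and writes an audit entry before any guarded tool runs. Because the check lives in the tool layer, it applies no matter where the calling code executes, sandboxed or not. `GovernedTools`, `ApprovalDenied`, and the refund policy are illustrative assumptions.

```python
from typing import Any, Callable, Dict, List, Set, Tuple


class ApprovalDenied(Exception):
    """Raised when a guarded tool call is not approved."""


class GovernedTools:
    """Hypothetical tool registry that gates risky business actions."""

    def __init__(self, approver: Callable[[str, Dict[str, Any]], bool]) -> None:
        self._tools: Dict[str, Callable[..., Any]] = {}
        self._guarded: Set[str] = set()
        self._approver = approver
        self.audit_log: List[Tuple[str, Dict[str, Any], str]] = []

    def register(self, name: str, fn: Callable[..., Any], requires_approval: bool = False) -> None:
        self._tools[name] = fn
        if requires_approval:
            self._guarded.add(name)

    def call(self, name: str, **kwargs: Any) -> Any:
        # The policy check happens here, outside whatever code issued the call.
        if name in self._guarded and not self._approver(name, kwargs):
            self.audit_log.append((name, kwargs, "denied"))
            raise ApprovalDenied(f"{name} requires approval")
        result = self._tools[name](**kwargs)
        self.audit_log.append((name, kwargs, "allowed"))
        return result


def small_refunds_only(name: str, args: Dict[str, Any]) -> bool:
    # Hypothetical policy: auto-approve refunds below 100, deny anything else.
    return name == "issue_refund" and args.get("amount", 0) < 100


tools = GovernedTools(approver=small_refunds_only)
tools.register("get_order", lambda order_id: {"id": order_id, "amount": 250})
tools.register("issue_refund", lambda order_id, amount: f"refunded {amount}", requires_approval=True)

tools.call("get_order", order_id=7)                # read-only: no approval needed
tools.call("issue_refund", order_id=7, amount=50)  # approved by policy
try:
    tools.call("issue_refund", order_id=7, amount=250)  # denied by policy
except ApprovalDenied as exc:
    print("blocked:", exc)
```

In a real deployment the approver would more likely pause and ask a human, but the shape is the same: the sandbox never sees this decision, and the registry never needs to know where the code ran.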
Handoff: Multi-Agent Workflows Should Not Always Be Pipelines
The second feature I found important is Handoff.
Many multi-agent examples are built as a fixed pipeline:
Planner -> Implementer -> Reviewer
That works for some development workflows. But many real service workflows are not linear.
Think about customer support:
```
Coordinator
  -> Refund Agent
  -> Shipping Agent
  -> Technical Support Agent
  -> back to Coordinator if needed
```
The right next step depends on the conversation.
This is where Handoff is useful. Developers define the participants and topology, while agents can decide when to transfer control.
A simplified structure looks like this:
```python
workflow = (
    HandoffBuilder(name="customer_support")
    .participants([coordinator, refund, shipping, tech])
    .set_coordinator(coordinator)
    .with_interaction_mode("autonomous")
    .with_termination_condition(should_terminate)
    .build()
)
```
The point is not simply "multiple agents."
The point is runtime routing.
A coordinator can route to a specialist. A specialist can finish the task, ask for more information, or hand control back. The workflow can end early when the condition is met.
That is very different from forcing every request through the same fixed sequence.
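The routing idea itself does not depend on any particular builder API. Below is a plain-Python sketch of it, not the Agent Framework's `HandoffBuilder`. Each hypothetical agent is a function that either answers, asks for more information, or names another participant to hand off to, and an optional termination check can end the run early. The keyword matching stands in for the decisions an LLM-backed agent would make.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Turn:
    reply: str
    handoff_to: Optional[str] = None  # next participant, if control moves
    done: bool = False


@dataclass
class Conversation:
    request: str
    visited: List[str] = field(default_factory=list)
    transcript: List[str] = field(default_factory=list)


def coordinator(conv: Conversation) -> Turn:
    # Keyword routing stands in for an LLM's choice of specialist.
    text = conv.request.lower()
    if ("package" in text or "shipping" in text) and "shipping" not in conv.visited:
        return Turn("Routing to shipping.", handoff_to="shipping")
    if "refund" in text and "refund" not in conv.visited:
        return Turn("Routing to refunds.", handoff_to="refund")
    return Turn("Could you share more details?", done=True)  # ask for more info


def shipping(conv: Conversation) -> Turn:
    if "refund" in conv.request.lower():
        # The rest of the request is not a shipping problem: hand control back.
        return Turn("Delivery confirmed; handing back.", handoff_to="coordinator")
    return Turn("Your package is on its way.", done=True)


def refund(conv: Conversation) -> Turn:
    return Turn("Refund issued.", done=True)


AGENTS: Dict[str, Callable[[Conversation], Turn]] = {
    "coordinator": coordinator,
    "shipping": shipping,
    "refund": refund,
}


def run_handoff(
    conv: Conversation,
    should_terminate: Callable[[Conversation], bool] = lambda conv: False,
    start: str = "coordinator",
    max_hops: int = 10,
) -> Conversation:
    current = start
    for _ in range(max_hops):
        conv.visited.append(current)
        turn = AGENTS[current](conv)
        conv.transcript.append(f"{current}: {turn.reply}")
        if turn.done or turn.handoff_to is None or should_terminate(conv):
            return conv
        current = turn.handoff_to
    raise RuntimeError("too many handoffs")


result = run_handoff(Conversation("My package arrived damaged and I want a refund"))
print(result.visited)
```

A simple shipping question stops after the shipping agent, while the damaged-package request goes coordinator, shipping, back to the coordinator, then refund. The path is decided at runtime by the agents, not fixed in advance by a pipeline.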
My Takeaway
For me, the most important message from Agent Harness is this:
Production agents need an execution layer, not just a reasoning model.
That execution layer includes context management, memory, approvals, tracing, code execution, and multi-agent routing.
CodeAct improves single-agent efficiency by reducing unnecessary model turns.
Handoff improves multi-agent collaboration by allowing dynamic runtime routing.
Agent Harness brings these ideas into the Microsoft Agent Framework as default infrastructure.
This is why I think Agent Harness matters. It is not the most visually exciting Build 2026 announcement, but it may be one of the most practical ones for developers building real agent systems.
The next phase of agent development will not be defined only by smarter models. It will also be defined by better execution infrastructure.