Bruce Wong

Posted on Jun 11

Microsoft Build 2026: Agent Harness Is About Making Agents Production-Ready

#agents #ai #microsoft #news

Microsoft Build 2026 had many AI announcements, but the one I found most meaningful was not another model or another Copilot UI.

It was Agent Harness in Microsoft Agent Framework.

My reading is simple: Microsoft is trying to move agent development from "the agent can run" to "the agent can survive production."

That difference matters.

Most agent demos look impressive because the happy path is short. A user asks, the model reasons, a tool is called, and the result comes back. But real agent workloads are messier:

the task runs for many steps
context becomes too large
the agent needs memory
some tools require approval
developers need traces
multiple agents may need to collaborate
generated code may need to execute safely

These are not side features. They are the execution layer of an agent system.

Agent Harness is interesting because it makes that execution layer part of the framework's defaults.

What Changed

At a high level, creating a harness agent looks like this:

agent = create_harness_agent(
    client=client,
    name="MyAgent",
    max_context_window_tokens=128_000,
)

The API change is small. The assumption change is not.

The old assumption was: create an agent and let it call tools.

The new assumption is: this agent may run for a long time, use many tools, need human approval, overflow context, delegate work, and require observability.

Agent Harness includes built-in providers for things like:

automatic context compaction
file-based memory
todo tracking
plan vs. execute modes
dynamic skill discovery
background agents
tool approval rules
OpenTelemetry tracing
web search and shell execution
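To make the first item on that list concrete, here is a minimal sketch of what context compaction means in principle. It is not the Agent Harness implementation: `estimate_tokens`, `compact`, and the `keep_recent` parameter are hypothetical names, the token estimate is a crude word count, and a real system would ask a model to write the summary instead of inserting a placeholder.

```python
from typing import List


def estimate_tokens(text: str) -> int:
    # Crude stand-in for a tokenizer: one token per whitespace-separated word.
    return len(text.split())


def compact(messages: List[str], max_tokens: int, keep_recent: int = 2) -> List[str]:
    """Replace older messages with one summary line once the budget is exceeded."""
    total = sum(estimate_tokens(m) for m in messages)
    if total <= max_tokens or len(messages) <= keep_recent:
        # Still within budget (or nothing old enough to fold away): keep everything.
        return list(messages)
    older = messages[:-keep_recent]
    recent = messages[-keep_recent:]
    # Placeholder summary; a real harness would summarize `older` with a model.
    summary = f"[summary of {len(older)} earlier messages]"
    return [summary] + recent


history = ["a b c d", "e f g h", "i j", "k l"]
compacted = compact(history, max_tokens=8)
untouched = compact(history, max_tokens=20)
```

The design point is that compaction happens automatically when a budget is crossed, so the agent loop itself never has to decide when to trim history.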

This is why I do not see Agent Harness as just a feature list. I see it as a standardization move.

Every serious agent platform eventually needs these capabilities. The question is whether each team builds them manually, or whether the framework provides strong defaults.

CodeAct: The Real Problem Is Not Tool Speed

The first feature worth paying attention to is CodeAct with Hyperlight.

Traditional tool calling usually works like this:

LLM -> choose tool
Tool -> return result
LLM -> choose next tool
Tool -> return result
LLM -> choose next tool
...

This is fine for simple tasks. But it becomes expensive when the work is procedural, because every tool call costs another model turn.

Imagine an agent needs to calculate total order value across users:

list users
get orders for each user
get discount for each order
get tax rate by region
calculate the final total

In traditional tool calling, the model may need many turns just to walk through a loop.

CodeAct changes this by letting the model generate code once:

users = call_tool("list_users")
total = 0

for user in users:
    orders = call_tool("get_orders", user_id=user.id)
    for order in orders:
        discount = call_tool("get_discount", order_id=order.id)
        tax = call_tool("get_tax_rate", region=order.region)
        total += (order.amount - discount) * (1 + tax)

print(total)

The important improvement is not that the tools execute faster.

The improvement is that the system reduces model-to-tool round trips.
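To see the round-trip effect in numbers, the script above can be run against stubbed tools. This is a sketch under stated assumptions: the in-memory data, the `TOOLS` table, the `call_tool` dispatcher, and the call counter are all hypothetical stand-ins, not the CodeAct runtime. The tax rates are chosen only so the arithmetic stays exact.

```python
from dataclasses import dataclass


@dataclass
class User:
    id: int


@dataclass
class Order:
    id: int
    amount: float
    region: str


# Hypothetical in-memory data standing in for real backends.
USERS = [User(1), User(2)]
ORDERS = {
    1: [Order(10, 100.0, "EU"), Order(11, 50.0, "US")],
    2: [Order(12, 200.0, "US")],
}
DISCOUNTS = {10: 10.0, 11: 0.0, 12: 20.0}
TAX_RATES = {"EU": 0.25, "US": 0.125}

TOOLS = {
    "list_users": lambda: USERS,
    "get_orders": lambda user_id: ORDERS[user_id],
    "get_discount": lambda order_id: DISCOUNTS[order_id],
    "get_tax_rate": lambda region: TAX_RATES[region],
}

tool_calls = 0


def call_tool(name, **kwargs):
    """Dispatch to a stub tool and count the call."""
    global tool_calls
    tool_calls += 1
    return TOOLS[name](**kwargs)


def run_generated_script():
    # Same procedure as the script above, executed in a single pass.
    users = call_tool("list_users")
    total = 0.0
    for user in users:
        for order in call_tool("get_orders", user_id=user.id):
            discount = call_tool("get_discount", order_id=order.id)
            tax = call_tool("get_tax_rate", region=order.region)
            total += (order.amount - discount) * (1 + tax)
    return total


total = run_generated_script()
print(total, tool_calls)
```

With this sample data the script makes nine tool calls. In the turn-by-turn pattern, each of those calls would be preceded by its own model turn; here the model writes the loop once and the runtime does the rest.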

Microsoft's example showed a latency reduction of more than 50% and a token reduction of more than 60% for a multi-step workflow. The exact numbers will vary by workload, but the pattern is broadly useful: when the task is procedural, let the model express the procedure as code and let the runtime execute it.

This is closer to how developers work. If I need to repeat an operation across many records, I do not want to manually reason through every iteration. I write a small script.

CodeAct gives agents that same execution pattern.

Hyperlight Does Not Remove Tool Governance

CodeAct uses Hyperlight to run generated code in an isolated micro-VM. That is important because model-generated code should not run directly in the host environment.

But I think the security boundary needs to be stated clearly:

CodeAct sandboxing protects the host from unsafe generated code. It does not automatically make your tools safe.

If your tool can send an email, delete a file, update a database, approve a refund, or trigger a deployment, the sandbox is not enough. You still need tool-level permissions, approval policies, and auditability.

In other words:

Sandbox protects code execution.
Approval protects business actions.

Confusing these two would be dangerous in production.
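The separation can be sketched in plain Python. Everything here is hypothetical and independent of the Agent Framework API: `ToolRegistry`, `ApprovalDenied`, the `requires_approval` flag, and the approver callback are illustrative names. The idea is only that the approval check lives at the tool boundary, no matter where the calling code runs.

```python
from typing import Any, Callable, Dict, List, Tuple


class ApprovalDenied(Exception):
    """Raised when a gated tool call is not approved."""


class ToolRegistry:
    def __init__(self, approver: Callable[[str, Dict[str, Any]], bool]):
        self._tools: Dict[str, Tuple[Callable[..., Any], bool]] = {}
        self._approver = approver
        # Every attempt is recorded, whether it ran or was denied.
        self.audit_log: List[Tuple[str, Dict[str, Any], str]] = []

    def register(self, name: str, fn: Callable[..., Any], requires_approval: bool = False) -> None:
        self._tools[name] = (fn, requires_approval)

    def call(self, name: str, **kwargs: Any) -> Any:
        fn, requires_approval = self._tools[name]
        if requires_approval and not self._approver(name, kwargs):
            self.audit_log.append((name, kwargs, "denied"))
            raise ApprovalDenied(name)
        result = fn(**kwargs)
        self.audit_log.append((name, kwargs, "executed"))
        return result


def approve_small_refunds(name: str, kwargs: Dict[str, Any]) -> bool:
    # Stand-in policy: auto-approve refunds up to 100, deny the rest.
    return kwargs.get("amount", 0) <= 100


registry = ToolRegistry(approver=approve_small_refunds)
registry.register("get_order_status", lambda order_id: f"order {order_id}: shipped")
registry.register(
    "approve_refund",
    lambda order_id, amount: f"refunded {amount} for order {order_id}",
    requires_approval=True,
)

status = registry.call("get_order_status", order_id=7)
try:
    registry.call("approve_refund", order_id=7, amount=500.0)
    large_refund_blocked = False
except ApprovalDenied:
    large_refund_blocked = True
```

Even if the calling code runs inside a perfectly isolated sandbox, the large refund is stopped here, at the business-action layer, and the attempt is still recorded for audit.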

Handoff: Multi-Agent Workflows Should Not Always Be Pipelines

The second feature I found important is Handoff.

Many multi-agent examples are built as a fixed pipeline:

Planner -> Implementer -> Reviewer

That works for some development workflows. But many real service workflows are not linear.

Think about customer support:

Coordinator
  -> Refund Agent
  -> Shipping Agent
  -> Technical Support Agent
  -> back to Coordinator if needed

The right next step depends on the conversation.

This is where Handoff is useful. Developers define the participants and topology, while agents can decide when to transfer control.

A simplified structure looks like this:

workflow = (
    HandoffBuilder(name="customer_support")
    .participants([coordinator, refund, shipping, tech])
    .set_coordinator(coordinator)
    .with_interaction_mode("autonomous")
    .with_termination_condition(should_terminate)
    .build()
)

The point is not simply "multiple agents."

The point is runtime routing.

A coordinator can route to a specialist. A specialist can finish the task, ask for more information, or hand control back. The workflow can end early when the termination condition is met.

That is very different from forcing every request through the same fixed sequence.
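The routing idea can be illustrated without the framework at all. In the sketch below, agents are plain functions and the names `Turn`, `AGENTS`, `run`, and `max_hops` are hypothetical; this is not how `HandoffBuilder` is implemented, only a minimal model of "each agent decides who goes next."

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Turn:
    reply: str
    handoff_to: Optional[str] = None  # name of the next agent, if any
    done: bool = False                # True when the request is resolved


def coordinator(message: str) -> Turn:
    # Keyword matching stands in for a model's routing decision.
    text = message.lower()
    if "refund" in text:
        return Turn("Routing to refunds.", handoff_to="refund")
    if "package" in text or "shipping" in text:
        return Turn("Routing to shipping.", handoff_to="shipping")
    return Turn("Could you tell me more about the issue?", done=True)


def refund(message: str) -> Turn:
    return Turn("Refund request logged.", done=True)


def shipping(message: str) -> Turn:
    return Turn("Tracking details sent.", done=True)


AGENTS: Dict[str, Callable[[str], Turn]] = {
    "coordinator": coordinator,
    "refund": refund,
    "shipping": shipping,
}


def run(message: str, start: str = "coordinator", max_hops: int = 5) -> Tuple[List[str], str]:
    """Follow handoffs until an agent finishes or the hop limit is hit."""
    current = start
    path: List[str] = []
    for _ in range(max_hops):
        path.append(current)
        turn = AGENTS[current](message)
        if turn.done or turn.handoff_to is None:
            return path, turn.reply
        current = turn.handoff_to
    return path, "Stopped: hop limit reached."


path, reply = run("I want a refund for my order")
```

A refund request visits only the coordinator and the refund agent, and a vague request never leaves the coordinator. The path is decided per conversation, which is the contrast with a fixed Planner -> Implementer -> Reviewer sequence.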

My Takeaway

For me, the most important message from Agent Harness is this:

Production agents need an execution layer, not just a reasoning model.

That execution layer includes context management, memory, approvals, tracing, code execution, and multi-agent routing.

CodeAct improves single-agent efficiency by reducing unnecessary model turns.

Handoff improves multi-agent collaboration by allowing dynamic runtime routing.

Agent Harness brings these ideas into the Microsoft Agent Framework as default infrastructure.

This is why I think Agent Harness matters. It is not the most visually exciting Build 2026 announcement, but it may be one of the most practical ones for developers building real agent systems.

The next phase of agent development will not be defined only by smarter models. It will also be defined by better execution infrastructure.

Top comments (1)

Mehmet Can Farsak • Jun 13

Spot on about the shift from "the agent can run" to "the agent can survive production." One gap I've noticed in these harness frameworks: they handle execution governance but not mode governance. Agents still jump from brainstorming to coding without a "thinking vs doing" boundary. Built Brainstorm-Mode (mehmetcanfarsak on GitHub) that plugs into the hook layer to enforce ideation-first workflows. Three modes (divergent, actionable, academic) give agents a proper mode switch instead of one monolithic run loop.