On April 22, 2026, AWS added a "managed agent harness" (preview) to Amazon Bedrock AgentCore. With this feature, you declare the model, system prompt, and tools as configuration, and the agent runs; the orchestration code is managed on the AWS side.
What stands out about this release is less the feature itself and more AWS's adoption of the term "agent harness." Since Martin Fowler wrote his harness engineering essay in February 2026, Anthropic and OpenAI have started using "harness" officially, and now a cloud vendor has applied the same word to its own service.
https://martinfowler.com/articles/harness-engineering.html
From the perspective of someone who has been assembling a harness by hand, the question becomes: what does managed harness take over, and what stays in my hands? This article sorts out that dividing line. Drawing on experience running business-automation agents with Claude Desktop, multiple MCP servers, and Markdown-based knowledge, I lay out the correspondence with AgentCore managed harness.
A few "tried it out" articles have already been published, so this article positions itself as the step before those: material for deciding whether to adopt, hold off, or phase in gradually. Drawing on the official blog, the documentation, and existing explanatory articles as sources, I sort out the correspondences and the judgment criteria that emerge from self-built operation.
## AWS released "managed harness"
The official blog mentioned above lays out the structure: every agent has an orchestration layer, and running that layer requires compute, a sandbox to safely execute code, tool connections, persistent storage, and error recovery as the underlying infrastructure—bundled together, they form the agent harness. Managed harness is AWS providing this harness as a managed offering, where the user declares the model, system prompt, and tools as configuration, and a working agent is the result.
Let me first align on what the word "harness" refers to. The term gets used both for what the vendor builds in (internal) and for what the user assembles around the agent (external), and the meaning shifts with context. In addition to Fowler's framing, watany has organized the internal/external confusion in a Zenn article.
https://zenn.dev/watany/articles/d8b692bbca65a3
This article is written from the position of "someone who has been assembling the external environment by hand"—the user-side harness, in operation. AgentCore managed harness can be read as the vendor-side internal harness now offered as managed, but from the user's perspective, it can also be read as: part of what we used to build for ourselves can now be delegated. This duality is the starting point for thinking about where responsibilities split with self-built operation.
## The self-built harness's composition and the four blank layers
Let me map my self-built harness to AgentCore's components. The environment I've been operating consists, broadly, of three elements, and I'll lay out how each one corresponds to something on the AgentCore side.
| Self-built harness | AgentCore side | Degree of correspondence |
|---|---|---|
| Markdown knowledge files (under agents/, knowledge/) | AgentCore Memory | Similar role; persistence and retrieval mechanisms differ |
| MCP servers (task management / calendar / chat / document management, etc.) | AgentCore Gateway | MCP is becoming the standard, so they're close |
| Claude Desktop | AgentCore Runtime | The execution base for the agent loop, at a different scale |
| (none) | AgentCore Identity | Not implemented in self-built |
| (none) | AgentCore Policy | Not implemented in self-built |
| (none) | AgentCore Observability | Not implemented in self-built |
| (none) | AgentCore Evaluations | Not implemented in self-built |
The top three are the correspondence between "what I assembled by hand" and "what AgentCore provides as managed for the same role." The bottom four are blank layers in the self-built harness—components AgentCore offers that aren't covered by my operation.
The natural question here is whether these four blank layers are "things I didn't write because I didn't need them" or "things I wanted but had given up on." The two are different. For the former, introducing managed harness yields little value; for the latter, it brings value.
Let me go through the four layers in order.
Identity manages authentication and permissions when multiple users access the agent. Since my self-built harness runs on a personal device, authentication can lean on the device login, and per-agent authentication wasn't necessary. This layer is unnecessary "as long as it's just me." The moment you try to share an agent across an organization, controlling who may call which MCP server for what becomes a problem, and the gap surfaces immediately.
Policy is the mechanism for declaratively defining boundaries when the agent calls tools. It's based on Cedar, AWS's open-source policy language, and you can generate policies from natural language. In my self-built harness, I draw loose boundaries through MCP server scopes and by documenting "what not to do" in the knowledge files—but this is discipline, not enforcement. I had wanted to write strong, enforceable boundaries, but didn't have the motivation to build a Cedar-equivalent system myself, so I had given up on this area.
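To make "declarative boundaries" concrete, here is a rough sketch of what Cedar-style policies look like. Cedar itself is real, but the entity types (`Agent`, `Action`, `Tool`) and the context attribute below are illustrative, not AgentCore Policy's actual schema:

```cedar
// Allow the scheduler agent to invoke the calendar tool.
permit(
  principal == Agent::"scheduler",
  action == Action::"invokeTool",
  resource == Tool::"calendar"
);

// Forbid any agent from touching the payments tool in shared environments.
forbid(
  principal,
  action == Action::"invokeTool",
  resource == Tool::"payments"
) when { context.environment == "shared" };
```

The point of the `forbid` rule is exactly the enforcement my knowledge-file "what not to do" notes lack: it is evaluated on every call, not merely read by the model.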
Observability is the mechanism for emitting agent execution logs, traces, and metrics to CloudWatch for visualization. In my self-built harness, I have the conversation history in Claude Desktop and individual logs from each MCP server, but no mechanism to track "which agent called what when, and how it failed" across the board. For solo use, looking at the chat screen suffices, but this becomes necessary in organizational deployment, and falls into the resignation category.
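As a sketch of the cross-cutting record I'm missing, even a single structured line per tool call would answer "which agent called what when, and how it failed." The field names below are my own, not CloudWatch's or AgentCore Observability's schema:

```python
import json
import time
import uuid


def log_tool_call(agent, tool, status, error=None):
    """Emit one structured trace record per tool invocation.

    A minimal stand-in for what a managed observability layer would
    collect: who called what, when, and how it failed.
    """
    record = {
        "trace_id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "agent": agent,
        "tool": tool,
        "status": status,   # "ok" or "error"
        "error": error,     # message when status == "error"
    }
    print(json.dumps(record))
    return record


# One line per call makes failures greppable across every agent.
log_tool_call("coordinator", "calendar.list_events", "ok")
log_tool_call("coordinator", "tasks.create", "error", error="timeout")
```

Even this much, emitted consistently, would have let me answer audit questions that the scattered per-MCP logs cannot.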
Evaluations is the mechanism for continuously evaluating the agent's response quality, with built-in evaluators for dimensions like helpfulness, tool-selection accuracy, and correctness. In my self-built harness, I check subjectively through knowledge-file improvement history and daily work logs, but I have no quantitative quality monitoring. For solo use, subjective is enough; but for organizational operation or paid services, this becomes essential.
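A minimal sketch of one quantitative check, tool-selection accuracy measured over logged cases. This is my own simplification for illustration; AgentCore's built-in evaluators cover more dimensions (helpfulness, correctness) than a replay like this:

```python
def tool_selection_accuracy(cases):
    """Fraction of cases where the agent picked the expected tool."""
    if not cases:
        return 0.0
    hits = sum(1 for c in cases if c["selected_tool"] == c["expected_tool"])
    return hits / len(cases)


# Replay logged interactions against expectations (hypothetical tool names).
cases = [
    {"selected_tool": "calendar.list_events",
     "expected_tool": "calendar.list_events"},
    {"selected_tool": "tasks.create",
     "expected_tool": "calendar.create_event"},
]
print(tool_selection_accuracy(cases))  # → 0.5
```

Tracking even this one number over time would turn "subjective checking through work logs" into a trend you can show to others.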
Looking back at the four layers, only Identity falls into "unnecessary as long as it's just me," while the other three fall into "would have been nice, but had given up on as self-built." The fact that the meaning of "blank" differs by layer affects the judgment of whether to adopt managed harness.
## Layers the managed harness takes over, layers it leaves behind
When you use managed harness, what stops being something you write, and what continues to require writing? This can be derived as fact from the official blog and documentation, so let me sort it out first.
What managed harness takes over is the following range:
- The agent loop: calling the model, selecting tools, returning results, managing context, and recovering from errors
- A microVM, filesystem, and shell isolated per session
- Tool-connection orchestration via AgentCore Gateway
- The framework portion based on Strands Agents
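The agent loop in the first bullet can be pictured as a minimal sketch. Everything here is a stand-in: a real harness would call the model via Bedrock, enforce policies before each tool call, and persist context across sessions:

```python
def run_agent(task, tools, call_model, max_steps=5):
    """Minimal agent loop: model picks a tool or finishes; errors feed back."""
    context = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        decision = call_model(context)       # model selects a tool or finishes
        if decision["action"] == "finish":
            return decision["answer"]
        tool = tools[decision["action"]]
        try:
            result = tool(**decision.get("args", {}))
        except Exception as e:               # error recovery: report, don't crash
            result = f"tool failed: {e}"
        context.append({"role": "tool", "content": str(result)})
    return "gave up after max_steps"


# Stubbed model: call the clock tool once, then answer with its result.
def fake_model(context):
    if any(m["role"] == "tool" for m in context):
        return {"action": "finish", "answer": context[-1]["content"]}
    return {"action": "clock", "args": {}}


print(run_agent("what time is it?", {"clock": lambda: "12:00"}, fake_model))
# → 12:00
```

This loop is the code that stops being yours to write under the managed harness; the `tools` dict and the prompt behind `call_model` are what remain yours.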
Conversely, what users still need to write even when using managed harness is the following range:
- Which model to use
- What to write in the system prompt
- Which tools to make callable
- What goes into AgentCore Memory and what doesn't
- What boundaries to declare in AgentCore Policy
Since declarative configuration suffices, the amount of code drops significantly. But for the five items above, only the form in which you write them changes; the judgments themselves don't go away. They simply move into the harness.json configuration file. Reading preview write-ups by people who have actually tried the managed harness, you'll see that harness.json declares the model and the tool list, while a separate system-prompt.md file holds the system prompt.
https://dev.classmethod.jp/articles/bedrock-agentcore-managed-harness-preview/
https://github.com/aws-samples/sample-AgentCore-Managed-Harness-News
This looks like what was previously written as Markdown system-prompt files and MCP connection definitions in the self-built harness, repackaged into AWS's configuration file format.
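As a rough sketch, the declaration might take a shape like this. Every field name below is illustrative, not the preview's actual schema; check the linked write-up and sample repository for the real format:

```json
{
  "model": "anthropic.claude-sonnet-4-5",
  "systemPrompt": "system-prompt.md",
  "tools": [
    { "type": "gateway", "name": "task-management" },
    { "type": "gateway", "name": "calendar" }
  ],
  "memory": { "enabled": true }
}
```

Every line of such a file encodes one of the five judgments listed above; the file is short precisely because the thinking happens before you write it.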
In other words, what managed harness takes over is "the labor of writing orchestration code," not "the judgment of designing the agent." Design judgments still rest with the user. AWS expresses this as removing the infrastructure barrier, but the non-infrastructure part—"what is this agent for, and how far should it be allowed to go"—remains on the human side, whether it's managed or self-built.
This distinction is an important perspective when judging whether to adopt managed harness. The pitch "you don't have to write code" is accurate, but reading it as "you don't have to think" makes it inaccurate.
## Where self-built operation lets you articulate design judgments
When you operate a self-built harness, you accumulate judgments about "where it's okay to move things, and where you must not." These don't go away when you adopt managed harness. The place where they appear shifts to the contents of harness.json, but the judgments themselves continue to rest on the human side. Let me name a few representative ones.
Knowledge file granularity. Whether to split your Markdown knowledge "by role" or "by task" is a judgment that, once made, eases subsequent operation. Splitting by role lets agent dispatch fall naturally out of context. Splitting by task scatters cross-task knowledge. There's no simple winner; the optimum depends on the number of agents you operate and how tasks overlap. Even with managed harness, the same question—what to combine in Memory and what to separate—remains.
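The two splits can be pictured as directory layouts (the file names are illustrative; only the `agents/` and `knowledge/` directory names come from my actual setup):

```text
# By role: each agent carries its own context bundle,
# so dispatch falls out of which directory you load.
agents/
  scheduler/knowledge.md
  reporter/knowledge.md

# By task: cross-task knowledge must be duplicated or cross-linked.
knowledge/
  weekly-report.md
  meeting-prep.md
```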
MCP server combination design. This is the line between "how far to wire up as tools via MCP" and "how far to handle through local file operations." For example, task management is better suited to MCP via API for automation, while sensitive tasks are safer kept as local file operations—judgments that emerge through use. Managed harness's Gateway has to answer the same question, just translated into declarations in a tool list.
Agent-to-agent responsibility split. This is the design choice between having a coordinator agent that judges context and dispatches to specialist agents, or calling specialist agents directly from the start. The coordinator style depends on context-judgment accuracy; the direct-call style puts the discrimination burden on the user. This too remains as a design judgment in managed harness, in the form of how to arrange and connect multiple harnesses.
These three are judgments that are hard to articulate without operating self-built first. If you start from managed harness, these judgments end up looking "as if they were optimally placed from the beginning." In reality, you've just fixed the premises, but inside fixed premises, the existence of design judgments themselves becomes harder to see.
## Why not just use the managed harness from the start?
Here's a counterargument I anticipate: "If we just use managed harness from the start, we won't need to build anything ourselves."
I partially agree. If you're building a new agent for organizational production from zero, I think entering through the managed harness is faster. However, the design of a production agent is rarely visible from the start. Only by actually using the agent do the right granularity of knowledge, the surplus and shortage of tools, and the boundaries of responsibility come into view. Whether you run this discovery loop on a managed harness with fixed boundaries or on a self-built harness with high freedom changes how much you learn.
Another perspective: the judgments gained from self-built operation can be reused as a blueprint when you migrate to the managed harness. Go in without one and you can produce something that appears to work, yet be left with a system whose structure is hard to explain. Whether "just put it on the managed harness and improve as we go" works depends on whether one person or several are doing the improving. For one person, the iteration-speed gap between self-built and managed may be small; once multiple people are iterating, a cycle gated by declarative harness.json changes and redeploys starts to accrue as operational friction.
## Order of adoption: where personal and organizational use diverge
Whether to adopt managed harness can naturally branch by operational scale. Let me go through three stages.
In the personal-use stage, where one person is using the agent, the self-built harness is often sufficient. The editing and use of knowledge files are tightly coupled, and the iteration of "rewrite Markdown the moment you notice something while using it" runs fast. Both Identity and Observability are hard to recognize as gaps as long as you're operating solo, and end up in the "would-be-nice-to-have, maybe" zone. In the experimental stage, this freedom directly translates into learning speed.
At the stage of expanding to organizational operation where multiple people use the agent, the four blank layers all surface as problems at once. You need audit logs of who used which agent how (Observability); you start running into situations where shared environments must not allow tools to be called freely, so boundaries become necessary (Policy); you need to manage credentials per member (Identity); you want to continuously measure agent response quality (Evaluations). At this stage, the value of managed harness comes to the fore. Comparing the labor of writing the four layers yourself versus putting them on AgentCore, the latter becomes practical.
In the transition phase, a hybrid strategy works: continue personal exploration on the self-built harness, and move only the proven paths used in organizational operation onto the managed harness. Migrate agents whose design has settled to AgentCore one by one, and keep the agents you're still learning from close at hand on the self-built side.
There's also a guideline for the order of adoption. The first things needed for organizational deployment are Identity and Observability, then Policy, and finally Evaluations. Without Identity, sharing itself doesn't get established. Without Observability, the organization can't make operational judgments. Policy is often too late after an incident, so placing it early in organizational deployment is safer. Evaluations can come in the order of "after operation gets going, then introduce quality measurement"—that's fine.
The harness was originally a concept lying at the boundary between those who build agents and those who use them. With AWS releasing managed harness, part of what we used to assemble by hand has shifted into a mechanism that runs simply by declaring it as configuration. The fact that layers like Identity, Observability, and Policy—which I had given up on as self-built—have come within reach is no small thing.
Even so, design judgments such as "what is this agent for," "what to keep in the knowledge," and "how much authority to grant tools" haven't been put into a form you can declare as configuration. The basis of these judgments will continue to live in the commit history and work logs of one's own repository. The experience of having built a harness yourself leaves behind knowledge that doesn't lose its value when you migrate to managed. With the arrival of managed harness, the boundary between "the layers we build ourselves" and "the layers only human judgment can carry" is more clearly visible than before.