Alexey Vidanov for AWS Community Builders

Posted on May 6

Amazon Bedrock AgentCore Harness runs your agent. ShapeV2 controls what it's allowed to do

#agents #ai #aws #python


Amazon Web Services (AWS) just shipped Amazon Bedrock AgentCore Harness in public preview. It solves the infrastructure problem every team building AI agents has been re-solving from scratch (compute, memory, tool connectivity, observability), and it solves it well. You declare a config; you get a running agent.

It does not solve governance. That's a separate layer, and it's the layer where most agent failures actually happen.

What AgentCore Harness is

Every AI agent runs an orchestration loop: call the model, pick a tool, pass results back, manage context, handle failures. That loop needs infrastructure under it: compute, sandboxing, secure tool connections, persistent storage, identity, observability. That stack is the "harness." Until AgentCore, every team built it from scratch.

AgentCore Harness replaces that build with a configuration. You declare what your agent does (model, tools, instructions), and AWS handles the rest.

Available in: US West (Oregon), US East (N. Virginia), Asia Pacific (Sydney), Europe (Frankfurt).
Pricing: No separate harness charge. You pay for the underlying AgentCore capabilities you use.
Powered by: Strands Agents, AWS's open-source agent framework.

What you get

Isolated compute. Every session in its own microVM, with its own filesystem and shell. Run shell commands directly on the session (no model reasoning, no token cost) for setup, scripts, or debugging.
Stateful by default. Persistent short-term and long-term memory across sessions. Persistent filesystem. Sessions resume where they left off.
Multi-model, mid-session. Any model from Amazon Bedrock, OpenAI, or Google Gemini. Switch providers mid-session without losing context.
Tool connectivity. Through Amazon Bedrock AgentCore Gateway, MCP servers, or the built-in browser and code interpreter.
Custom environments. Bring your own source, dependencies, and tools.
Observability. Every action traced through Amazon Bedrock AgentCore Observability.
Security. Amazon Virtual Private Cloud (Amazon VPC) networking, identity, per-session access controls.

This turns days of plumbing into a config change. Trying a different model or adding a tool stops being a refactor.

Full docs.

Where it stops

Your agent now has a secure environment, persistent memory, and a dozen tools. The infrastructure problem is solved. A different set of questions stays open:

Can the agent call send_email before it's finished reading customer data?
If a 3-step workflow fails at step 2, does step 1 get rolled back?
When the agent burns 90% of its budget, does its behavior change, or just the bill?
Can you prove why a specific tool call was permitted, not just that it happened?

AgentCore Harness traces what happened. It does not control what's allowed to happen. That's a layer boundary, and infrastructure and governance benefit from being decoupled.

Shape: governance for the tools your agent calls

The questions above don't get answered by adding more observability. They get answered by enforcing rules at the moment a tool is about to run.

Shape is a single-file Python library (~400 lines, zero dependencies) that adds that enforcement layer:

from shape import Agent, ToolEffect

agent = Agent("customer-service", budget=5.00)
# lookup_fn, update_fn, and email_fn are your own tool callables, defined elsewhere
agent.tool("lookup_customer", effect=ToolEffect.READ,         fn=lookup_fn)
agent.tool("update_record",   effect=ToolEffect.REVERSIBLE,   fn=update_fn)
agent.tool("send_email",      effect=ToolEffect.IRREVERSIBLE, fn=email_fn)

agent.rules("""
    BLOCK send_email WHEN phase IS NOT commit
    BLOCK * WHEN budget ABOVE 90%
""")

# EXPLORE: read-only, safe
with agent.explore() as ctx:
    customer = ctx.call("lookup_customer", id="C-1234")

# COMMIT: transactional, all-or-nothing
with agent.commit() as tx:
    tx.call("update_record", cost=0.01, id="C-1234", status="welcomed")
    tx.call("send_email",    cost=0.10, to=customer["email"], template="welcome")
    # if send_email fails → update_record is compensated automatically

What it enforces:

Phase lifecycle. Explore → Decide → Commit. In Explore, only read tools work. Call a write tool in Explore and you get an exception, not a warning. The agent reads before it writes, structurally, not by prompt discipline.
Transactional tool calls. Every step in a commit succeeds, or none stick. Automatic compensation on failure. Databases solved this in 1978; AI agents have not.
Budget as a control signal. Not a metric you check after the invoice. At configurable thresholds, behavior changes in real time: reduce scope, block commits, force re-evaluation, hard stop.
Proof traces. A structured record of why each tool call was permitted. Phase check passed. Budget check passed. Rule check passed. A decision chain, not a log line.
Human-readable rule DSL. Governance rules a non-engineer can read and audit.
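
To make "every step succeeds, or none stick" concrete, here is a minimal sketch of compensation on failure in plain Python. It is not Shape's implementation and does not use its API: Step, run_all_or_nothing, and the three tool functions are hypothetical names invented for this illustration, and it assumes each reversible step ships with its own undo action.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Step:
    """One tool call plus the (hypothetical) action that undoes it."""
    name: str
    run: Callable[[], Any]
    undo: Optional[Callable[[], None]] = None  # None means nothing to undo


def run_all_or_nothing(steps: list[Step]) -> list[Any]:
    """Run steps in order; on failure, undo completed steps in reverse."""
    done: list[Step] = []
    results: list[Any] = []
    try:
        for step in steps:
            results.append(step.run())
            done.append(step)
    except Exception:
        # Compensate newest-first, then surface the original error.
        for step in reversed(done):
            if step.undo is not None:
                step.undo()
        raise
    return results


# Usage: the second step fails, so the first one is compensated.
record = {"status": "new"}


def update_record() -> None:
    record["status"] = "welcomed"


def revert_record() -> None:
    record["status"] = "new"


def send_email() -> None:
    raise RuntimeError("mail server unavailable")


try:
    run_all_or_nothing([
        Step("update_record", update_record, revert_record),
        Step("send_email", send_email),
    ])
except RuntimeError:
    pass  # the failure still propagates; the record has been reverted
```

Compensation is not a true rollback: a step with no undo action, such as a sent email, stays done. In this sketch that means an irreversible call only stays safe if it runs last, after every step that can still fail.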

How they fit together

┌─────────────────────────────────────┐
│  Agent logic (LLM + prompts)        │
├─────────────────────────────────────┤
│  Shape (governance)                 │  ← permission, phases, transactions
├─────────────────────────────────────┤
│  AgentCore Harness (infrastructure) │  ← compute, memory, networking
└─────────────────────────────────────┘

Deploy Shape inside an AgentCore Harness custom environment. The harness provides the runtime. Shape decides what the agent is allowed to do inside it.

Capability	AgentCore Harness	Shape
Managed compute and isolation	✓	✗
Persistent memory and filesystem	✓	✗
Multi-model switching	✓	✗
Observability (what happened)	✓	✗
Phase enforcement (read before write)	✗	✓
Transactional tool calls with rollback	✗	✓
Budget as a behavioral gate	✗	✓
Proof traces (why it was permitted)	✗	✓
Human-readable rule DSL	Cedar (via Gateway)	built-in
Vendor lock-in	AWS	none
Dependencies	AWS SDK	zero

This gap isn't AgentCore-specific

LangGraph, CrewAI, Strands: they all optimize for capability. None enforce permission at runtime. The failure modes repeat across real projects:

Agent writes to a database before finishing its read phase. Partial data corrupts downstream services.
A 3-step workflow fails at step 2. Step 1 already committed. Manual cleanup follows.
Cost spikes because nothing gates behavior at budget thresholds. You find out from the invoice.
An incident happens. You can trace what the agent did, not why the system allowed it.

Infrastructure answers "can my agent run?" Governance answers "should my agent act right now, with this tool, at this cost?" Different questions, different layers. AgentCore Harness solves the first one well. The second one is still on you, and it's the one that determines whether you trust the agent in production.
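
The governance question can be sketched as a function that returns a decision together with the checks that produced it. This is an illustration under stated assumptions, not Shape's API: Decision, decide, the effect and phase strings, and the 90% gate (borrowed from the rule example earlier in the post) are all hypothetical choices made for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Decision:
    """The verdict plus every check that led to it (a tiny proof trace)."""
    tool: str
    allowed: bool
    checks: list[tuple[str, bool, str]] = field(default_factory=list)


def decide(tool: str, effect: str, phase: str, spent: float, budget: float,
           budget_gate: float = 0.9) -> Decision:
    """Evaluate every check and record each outcome, pass or fail.

    Assumes budget > 0 and that effect is "read" for read-only tools.
    """
    checks: list[tuple[str, bool, str]] = []

    # Phase check: only read tools may run outside the commit phase.
    phase_ok = effect == "read" or phase == "commit"
    checks.append(("phase", phase_ok, f"effect={effect}, phase={phase}"))

    # Budget check: block everything once spend exceeds the gate fraction.
    used = spent / budget
    budget_ok = used <= budget_gate
    checks.append(("budget", budget_ok, f"used={used:.0%}, gate={budget_gate:.0%}"))

    return Decision(tool=tool, allowed=all(ok for _, ok, _ in checks), checks=checks)


# Usage: an irreversible tool requested during the explore phase.
decision = decide("send_email", "irreversible", "explore", spent=1.0, budget=5.0)
for name, ok, detail in decision.checks:
    print(f"{name}: {'pass' if ok else 'fail'} ({detail})")
print("allowed" if decision.allowed else "blocked")
```

Recording the passing checks alongside the failing ones is the design choice that matters here: the record then explains why a call was permitted, not only why one was refused.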

Top comments (6)

Mykola Kondratiuk • May 8

ran into this split almost immediately when evaluating it - harness gives you the observable running agent, ShapeV2 gives you input constraints, but neither answers what did this agent actually decide and why for post-hoc review. that audit layer stays on us.

Alexey Vidanov AWS Community Builders • May 9

Shape has audit

Mykola Kondratiuk • May 9

shape audit tells you what happened, not why. harness gives you observable running state; shape gives you the log. from a governance standpoint both matter — the log proves the decision happened, the running trace shows the context. treating them as either/or is where most teams get burned.

Alexey Vidanov AWS Community Builders • May 12

Agreed both matter, but worth being precise about which is which. Harness traces give you runtime context (what the agent did, in what order, with what latency). Shape proof traces give you the permission chain (why each call was allowed: phase, budget, rule match). One is execution telemetry, the other is decision provenance. Teams that conflate them end up with great dashboards and no defensible audit story.

Mykola Kondratiuk • May 12

yeah that's the cleaner framing - most teams conflate the two because both end up in the same observability dashboard and nobody separates the why from the what until governance actually fails

Harjot Singh • May 31

Harness runs the agent, the policy layer controls what it's allowed to do is exactly the right separation of concerns, and naming them as two distinct things is the insight. Execution and authorization are different problems: the harness is about getting the agent to run reliably (loop, tools, retries, state), while the control layer is about bounding what that run is permitted to touch, and conflating them is how you end up with a capable agent that's also unconstrained. Splitting them means the permission policy is declarative and auditable independent of the agent's logic, you can reason about what can this agent possibly do without reading its prompt, which is the only way to actually trust it. It's the same shape as separating application code from IAM: the app does the work, the policy says what the work is allowed to reach, and the policy holds even if the app is compromised or confused. The part that makes this powerful is that the control layer is enforced by the system, not requested of the model, so prompt injection can't talk its way past a permission it doesn't have. Run freely inside a boundary the agent can't widen. That separate-execution-from-authorization instinct is core to how I think about Moonshift. Does the control layer gate per-tool-call at runtime, or define the allowed capability set up front so out-of-policy actions aren't even reachable?