DEV Community

Cover image for AgentCore Harness runs your agent. ShapeV2 controls what it's allowed to do
Alexey Vidanov
Alexey Vidanov

Posted on

AgentCore Harness runs your agent. ShapeV2 controls what it's allowed to do

AWS just shipped AgentCore Harness in public preview. It solves the infrastructure problem every team building AI agents has been re-solving from scratch (compute, memory, tool connectivity, observability), and it solves it well. You declare a config; you get a running agent.

It does not solve governance. That's a separate layer, and it's the layer where most agent failures actually happen.

What AgentCore Harness is

Every AI agent runs an orchestration loop: call the model, pick a tool, pass results back, manage context, handle failures. That loop needs infrastructure under it: compute, sandboxing, secure tool connections, persistent storage, identity, observability. That stack is the "harness." Until AgentCore, every team built it from scratch.

AgentCore Harness replaces that build with a configuration. You declare what your agent does (model, tools, instructions), and AWS handles the rest.

Available in: US West (Oregon), US East (N. Virginia), Asia Pacific (Sydney), Europe (Frankfurt).
Pricing: No separate harness charge. You pay for the underlying AgentCore capabilities you use.
Powered by: Strands Agents, AWS's open-source agent framework.

What you get

  • Isolated compute. Every session in its own microVM, with its own filesystem and shell. Run shell commands directly on the session (no model reasoning, no token cost) for setup, scripts, or debugging.
  • Stateful by default. Persistent short-term and long-term memory across sessions. Persistent filesystem. Sessions resume where they left off.
  • Multi-model, mid-session. Any model from Amazon Bedrock, OpenAI, or Google Gemini. Switch providers mid-session without losing context.
  • Tool connectivity. Through AgentCore Gateway, MCP servers, or the built-in browser and code interpreter.
  • Custom environments. Bring your own source, dependencies, and tools.
  • Observability. Every action traced through AgentCore Observability.
  • Security. VPC networking, identity, per-session access controls.

This turns days of plumbing into a config change. Trying a different model or adding a tool stops being a refactor.

Full docs.

Where it stops

Your agent now has a secure environment, persistent memory, and a dozen tools. The infrastructure problem is solved. A different set of questions stays open:

  • Can the agent call send_email before it's finished reading customer data?
  • If a 3-step workflow fails at step 2, does step 1 get rolled back?
  • When the agent burns 90% of its budget, does its behavior change, or just the bill?
  • Can you prove why a specific tool call was permitted, not just that it happened?

AgentCore Harness traces what happened. It does not control what's allowed to happen. That's a layer boundary, and infrastructure and governance benefit from being decoupled.

Shape: governance for the tools your agent calls

The questions above don't get answered by adding more observability. They get answered by enforcing rules at the moment a tool is about to run.

Shape is a single-file Python library (~400 lines, zero dependencies) that adds that enforcement layer:

from shape import Agent, ToolEffect

agent = Agent("customer-service", budget=5.00)
agent.tool("lookup_customer", effect=ToolEffect.READ,         fn=lookup_fn)
agent.tool("update_record",   effect=ToolEffect.REVERSIBLE,   fn=update_fn)
agent.tool("send_email",      effect=ToolEffect.IRREVERSIBLE, fn=email_fn)

agent.rules("""
    BLOCK send_email WHEN phase IS NOT commit
    BLOCK * WHEN budget ABOVE 90%
""")

# EXPLORE: read-only, safe
with agent.explore() as ctx:
    customer = ctx.call("lookup_customer", id="C-1234")

# COMMIT: transactional, all-or-nothing
with agent.commit() as tx:
    tx.call("update_record", cost=0.01, id="C-1234", status="welcomed")
    tx.call("send_email",    cost=0.10, to=customer["email"], template="welcome")
    # if send_email fails → update_record is compensated automatically
Enter fullscreen mode Exit fullscreen mode

What it enforces:

  • Phase lifecycle. Explore → Decide → Commit. In Explore, only read tools work. Call a write tool in Explore and you get an exception, not a warning. The agent reads before it writes, structurally, not by prompt discipline.
  • Transactional tool calls. Every step in a commit succeeds, or none stick. Automatic compensation on failure. Databases solved this in 1978; AI agents have not.
  • Budget as a control signal. Not a metric you check after the invoice. At configurable thresholds, behavior changes in real time: reduce scope, block commits, force re-evaluation, hard stop.
  • Proof traces. A structured record of why each tool call was permitted. Phase check passed. Budget check passed. Rule check passed. A decision chain, not a log line.
  • Human-readable rule DSL. Governance rules a non-engineer can read and audit.

How they fit together

┌─────────────────────────────────────┐
│  Agent logic (LLM + prompts)        │
├─────────────────────────────────────┤
│  Shape (governance)                 │  ← permission, phases, transactions
├─────────────────────────────────────┤
│  AgentCore Harness (infrastructure) │  ← compute, memory, networking
└─────────────────────────────────────┘
Enter fullscreen mode Exit fullscreen mode

Deploy Shape inside an AgentCore Harness custom environment. The harness provides the runtime. Shape decides what the agent is allowed to do inside it.

Capability AgentCore Harness Shape
Managed compute and isolation
Persistent memory and filesystem
Multi-model switching
Observability (what happened)
Phase enforcement (read before write)
Transactional tool calls with rollback
Budget as a behavioral gate
Proof traces (why it was permitted)
Human-readable rule DSL Cedar (via Gateway) built-in
Vendor lock-in AWS none
Dependencies AWS SDK zero

This gap isn't AgentCore-specific

LangGraph, CrewAI, Strands: they all optimize for capability. None enforce permission at runtime. The failure modes repeat across real projects:

  • Agent writes to a database before finishing its read phase. Partial data corrupts downstream services.
  • A 3-step workflow fails at step 2. Step 1 already committed. Manual cleanup follows.
  • Cost spikes because nothing gates behavior at budget thresholds. You find out from the invoice.
  • An incident happens. You can trace what the agent did, not why the system allowed it.

Infrastructure answers "can my agent run?" Governance answers "should my agent act right now, with this tool, at this cost?" Different questions, different layers. AgentCore Harness solves the first one well. The second one is still on you, and it's the one that determines whether you trust the agent in production.

Links

Top comments (0)