AWS just shipped AgentCore Harness in public preview. It solves the infrastructure problem every team building AI agents has been re-solving from scratch (compute, memory, tool connectivity, observability), and it solves it well. You declare a config; you get a running agent.
It does not solve governance. That's a separate layer, and it's the layer where most agent failures actually happen.
What AgentCore Harness is
Every AI agent runs an orchestration loop: call the model, pick a tool, pass results back, manage context, handle failures. That loop needs infrastructure under it: compute, sandboxing, secure tool connections, persistent storage, identity, observability. That stack is the "harness." Until AgentCore, every team built it from scratch.
AgentCore Harness replaces that build with a configuration. You declare what your agent does (model, tools, instructions), and AWS handles the rest.
Available in: US West (Oregon), US East (N. Virginia), Asia Pacific (Sydney), Europe (Frankfurt).
Pricing: No separate harness charge. You pay for the underlying AgentCore capabilities you use.
Powered by: Strands Agents, AWS's open-source agent framework.
What you get
- Isolated compute. Every session in its own microVM, with its own filesystem and shell. Run shell commands directly on the session (no model reasoning, no token cost) for setup, scripts, or debugging.
- Stateful by default. Persistent short-term and long-term memory across sessions. Persistent filesystem. Sessions resume where they left off.
- Multi-model, mid-session. Any model from Amazon Bedrock, OpenAI, or Google Gemini. Switch providers mid-session without losing context.
- Tool connectivity. Through AgentCore Gateway, MCP servers, or the built-in browser and code interpreter.
- Custom environments. Bring your own source, dependencies, and tools.
- Observability. Every action traced through AgentCore Observability.
- Security. VPC networking, identity, per-session access controls.
This turns days of plumbing into a config change. Trying a different model or adding a tool stops being a refactor.
Where it stops
Your agent now has a secure environment, persistent memory, and a dozen tools. The infrastructure problem is solved. A different set of questions stays open:
- Can the agent call
send_emailbefore it's finished reading customer data? - If a 3-step workflow fails at step 2, does step 1 get rolled back?
- When the agent burns 90% of its budget, does its behavior change, or just the bill?
- Can you prove why a specific tool call was permitted, not just that it happened?
AgentCore Harness traces what happened. It does not control what's allowed to happen. That's a layer boundary, and infrastructure and governance benefit from being decoupled.
Shape: governance for the tools your agent calls
The questions above don't get answered by adding more observability. They get answered by enforcing rules at the moment a tool is about to run.
Shape is a single-file Python library (~400 lines, zero dependencies) that adds that enforcement layer:
from shape import Agent, ToolEffect
agent = Agent("customer-service", budget=5.00)
agent.tool("lookup_customer", effect=ToolEffect.READ, fn=lookup_fn)
agent.tool("update_record", effect=ToolEffect.REVERSIBLE, fn=update_fn)
agent.tool("send_email", effect=ToolEffect.IRREVERSIBLE, fn=email_fn)
agent.rules("""
BLOCK send_email WHEN phase IS NOT commit
BLOCK * WHEN budget ABOVE 90%
""")
# EXPLORE: read-only, safe
with agent.explore() as ctx:
customer = ctx.call("lookup_customer", id="C-1234")
# COMMIT: transactional, all-or-nothing
with agent.commit() as tx:
tx.call("update_record", cost=0.01, id="C-1234", status="welcomed")
tx.call("send_email", cost=0.10, to=customer["email"], template="welcome")
# if send_email fails → update_record is compensated automatically
What it enforces:
- Phase lifecycle. Explore → Decide → Commit. In Explore, only read tools work. Call a write tool in Explore and you get an exception, not a warning. The agent reads before it writes, structurally, not by prompt discipline.
- Transactional tool calls. Every step in a commit succeeds, or none stick. Automatic compensation on failure. Databases solved this in 1978; AI agents have not.
- Budget as a control signal. Not a metric you check after the invoice. At configurable thresholds, behavior changes in real time: reduce scope, block commits, force re-evaluation, hard stop.
- Proof traces. A structured record of why each tool call was permitted. Phase check passed. Budget check passed. Rule check passed. A decision chain, not a log line.
- Human-readable rule DSL. Governance rules a non-engineer can read and audit.
How they fit together
┌─────────────────────────────────────┐
│ Agent logic (LLM + prompts) │
├─────────────────────────────────────┤
│ Shape (governance) │ ← permission, phases, transactions
├─────────────────────────────────────┤
│ AgentCore Harness (infrastructure) │ ← compute, memory, networking
└─────────────────────────────────────┘
Deploy Shape inside an AgentCore Harness custom environment. The harness provides the runtime. Shape decides what the agent is allowed to do inside it.
| Capability | AgentCore Harness | Shape |
|---|---|---|
| Managed compute and isolation | ✓ | ✗ |
| Persistent memory and filesystem | ✓ | ✗ |
| Multi-model switching | ✓ | ✗ |
| Observability (what happened) | ✓ | ✗ |
| Phase enforcement (read before write) | ✗ | ✓ |
| Transactional tool calls with rollback | ✗ | ✓ |
| Budget as a behavioral gate | ✗ | ✓ |
| Proof traces (why it was permitted) | ✗ | ✓ |
| Human-readable rule DSL | Cedar (via Gateway) | built-in |
| Vendor lock-in | AWS | none |
| Dependencies | AWS SDK | zero |
This gap isn't AgentCore-specific
LangGraph, CrewAI, Strands: they all optimize for capability. None enforce permission at runtime. The failure modes repeat across real projects:
- Agent writes to a database before finishing its read phase. Partial data corrupts downstream services.
- A 3-step workflow fails at step 2. Step 1 already committed. Manual cleanup follows.
- Cost spikes because nothing gates behavior at budget thresholds. You find out from the invoice.
- An incident happens. You can trace what the agent did, not why the system allowed it.
Infrastructure answers "can my agent run?" Governance answers "should my agent act right now, with this tool, at this cost?" Different questions, different layers. AgentCore Harness solves the first one well. The second one is still on you, and it's the one that determines whether you trust the agent in production.

Top comments (0)