Raj Murugan

Posted on Nov 30, 2025

Bedrock AgentCore: What 5 Real ANZ Enterprise Deploys Taught Us

#aws #bedrock #agentcore

I've spent the last 9 months shipping Bedrock AgentCore into four different ANZ enterprises (plus one internal PoC that crashed and burned).

This isn't a hello-world tutorial – it's the bruises, the invoices, and the 3 a.m. CloudWatch alarms that finally made the thing stick.

If you're about to promote an agent past the "demo for the board" stage, steal this checklist – it will save you at least one rollback.

The numbers we actually saw

Pattern	Use-case	Cost at 10 k queries/mo	p95 latency	Notes
Single agent	Simple Q&A	~AUD 180	2.1 s	Hallucinated once traffic > 2 k/day
Supervisor + 3 subs	HR triage	~AUD 420	4.3 s	60 % less duplicate Lambda code
AgentCore Runtime	SRE co-pilot	~AUD 620	3.8 s	GitOps deploy, full traces
Guardrail-wrapped	Student chat	~AUD 520	4.9 s	PII blocked, compliance happy

Supervisor pattern is the only one that survived a production spike without a hot-fix.

Single agents are great for a sprint demo – and terrible for anything that hits the internet.
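
For budgeting conversations it helps to turn the table into a per-query figure. The sketch below only re-does the arithmetic on the numbers above; the linear projection is my assumption, and real bills may not scale that cleanly once traffic grows.

```python
# Monthly cost (AUD) at 10,000 queries, copied from the table above.
COST_AT_10K_AUD = {
    "Single agent": 180,
    "Supervisor + 3 subs": 420,
    "AgentCore Runtime": 620,
    "Guardrail-wrapped": 520,
}


def cost_per_query(pattern: str) -> float:
    """AUD per query, derived from the 10k-query monthly figure."""
    return COST_AT_10K_AUD[pattern] / 10_000


def projected_monthly_cost(pattern: str, queries_per_month: int) -> float:
    """Linear extrapolation (an assumption, not a measured number)."""
    return cost_per_query(pattern) * queries_per_month


for name in COST_AT_10K_AUD:
    print(f"{name}: AUD {cost_per_query(name):.3f} per query")
```

Treat the projection as a conversation starter for the invoice discussion, not a quote.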

Managed agents vs. AgentCore Runtime – pick one before 10 k users

I drew this on a whiteboard for our CFO after she saw the second invoice:

Rule we now write into every SoW:

PoC = managed. Day-1 prod = Runtime.

The moment you need a custom MCP tool or a side-car Lambda, the console becomes a drag.

Ground-truth data – skip it and you'll ship a liar

Our first Kindo chatbot went live with 37 manually-written examples.

Two weeks later a student asked "What grade do I need to pass?" and the agent calmly invented a 42 % cutoff (it's 50 %).

Cue 4 a.m. rollback.

We fixed it the boring way:

Exported 18 k real (de-identified) chat logs.
LLM-expanded edge cases: "give me 50 ways to ask about vacation pay".
Human reviewed, 1 200 kept.
Added sessionAttributes (studentID, semester) so the agent could look up live data.

Accuracy jumped from 67 % to 92 %, and the support ticket queue dropped by half.

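# pytest harness we run in CI
import json

# `agent` is the agent client under test, constructed elsewhere in the suite
with open("ground_truth.json") as f:
    tests = json.load(f)

for t in tests:
    out = agent.invoke(t["input"], sessionAttributes=t["attrs"])
    assert out["answer"] == t["expected"], t["input"]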
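
The harness reads three keys from every record: input, attrs and expected. A minimal sketch of a schema check that fails fast before any agent call is made could look like this. The key names come from the harness; the sample record and its values are made up for illustration.

```python
import json

REQUIRED_KEYS = {"input", "attrs", "expected"}  # keys the CI harness reads


def validate_records(records: list[dict]) -> list[str]:
    """Return human-readable problems; an empty list means the file is usable."""
    problems = []
    for i, rec in enumerate(records):
        missing = REQUIRED_KEYS - rec.keys()
        if missing:
            problems.append(f"record {i}: missing {sorted(missing)}")
            continue
        if not isinstance(rec["attrs"], dict):
            problems.append(f"record {i}: attrs must be an object")
        if not str(rec["input"]).strip():
            problems.append(f"record {i}: empty input")
    return problems


# Hypothetical sample record; the values are illustrative only.
sample = json.loads("""
[
  {"input": "What grade do I need to pass?",
   "attrs": {"studentID": "s-0001", "semester": "2025-S2"},
   "expected": "You need 50 % to pass."}
]
""")
print(validate_records(sample))
```

Running a check like this as the first CI step keeps a malformed export from showing up as a confusing KeyError halfway through the suite.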

Supervisor pattern that actually compiles

Payroll bot rewrite: one supervisor + three specialised subs (policy, leave-balances, tickets).

60 % less copy-paste Lambda code, and we could unit-test each sub in isolation.

from agentcore import Agent, app

supervisor = Agent(
    model_id="anthropic.claude-3-5-sonnet-20240620-v1:0",
    instructions="You are a router. Never answer directly – always delegate to the correct sub-agent."
)

@app.entrypoint
def lambda_handler(event, _):
    return supervisor.invoke(event["prompt"])

Gateway MCP let us plug ServiceNow REST APIs without re-writing the OpenAPI schema – biggest time-saver of the sprint.
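
To make the delegation idea concrete without any AWS dependency, here is a framework-free sketch of a supervisor that only routes. Everything in it is hypothetical: the sub-agents are stubs, and the keyword table stands in for the routing decision the supervisor model would make in a real deployment.

```python
from typing import Callable


# Stub sub-agents; in a real deployment each would wrap its own model call.
def policy_agent(prompt: str) -> str:
    return f"[policy] {prompt}"


def leave_balances_agent(prompt: str) -> str:
    return f"[leave-balances] {prompt}"


def tickets_agent(prompt: str) -> str:
    return f"[tickets] {prompt}"


SUB_AGENTS: dict[str, Callable[[str], str]] = {
    "policy": policy_agent,
    "leave-balances": leave_balances_agent,
    "tickets": tickets_agent,
}

# Keyword routing is a stand-in for the supervisor model's decision.
ROUTES = [
    (("balance", "days left", "annual leave"), "leave-balances"),
    (("ticket", "incident", "escalate"), "tickets"),
]


def route(prompt: str) -> str:
    """Pick a sub-agent name; fall back to 'policy' when nothing matches."""
    lowered = prompt.lower()
    for keywords, name in ROUTES:
        if any(k in lowered for k in keywords):
            return name
    return "policy"


def supervise(prompt: str) -> str:
    """The supervisor never answers directly; it only delegates."""
    return SUB_AGENTS[route(prompt)](prompt)


print(supervise("How many annual leave days do I have left?"))
```

Because each sub-agent is a plain callable, it can be unit-tested on its own, which is the property that made the rewrite worth it for us.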

Guardrails – the checkbox that saved our audit

First deploy forgot guardrails.

Next day a student pasted their email + TFN into the chat and the agent happily parroted it back in the response.

Security team put a red sticker on my laptop.

Now we enforce org-level guardrails before any agent alias hits prod:

Filter	Block %	Mask %	AUD / mo
PII (email, TFN)	2.1	8.4	32
Custom finance terms	1.7	3.2	22
Hate/violence	0.3	–	12
Total	4.1	11.6	66

Cheap insurance.
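
A cheap second line of defence is a leak check in CI that scans agent responses for PII-shaped strings. The sketch below is a local regex stand-in, not the Bedrock Guardrails filter itself; the patterns are my assumptions and only catch email addresses and nine-digit TFN-shaped numbers, with no checksum validation.

```python
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# TFN-shaped: nine digits, optionally grouped 3-3-3 with spaces (no checksum check).
TFN_RE = re.compile(r"\b\d{3} ?\d{3} ?\d{3}\b")


def mask_pii(text: str) -> tuple[str, int]:
    """Return (masked_text, number_of_matches_replaced)."""
    masked, n_email = EMAIL_RE.subn("[EMAIL]", text)
    masked, n_tfn = TFN_RE.subn("[TFN]", masked)
    return masked, n_email + n_tfn


masked, hits = mask_pii("Reach me at sam@example.edu.au, TFN 123 456 782.")
print(masked, hits)
```

In a test, asserting that the hit count is zero on every agent response gives you a number you can wire straight into a "PII leak count" metric.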

IaC + observability – or you'll debug in the console at 2 a.m.

We template everything in CDK (Python). One cdk deploy spins up:

AgentCore Runtime container
Lambda layers for Powertools and the latest boto3
X-Ray traces, CloudWatch dashboards, alarms

Metric	Target	Alarm
Task success	≥ 95 %	< 90 %
p95 latency	≤ 5 s	> 10 s
Token spend	≤ AUD 70/day	> AUD 140
PII leak count	0	> 0

Routing loops show up as 30 s p99 spikes – impossible to spot without traces.
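
The alarm column of the table can also live next to the code as data, so the thresholds get reviewed in PRs like everything else. This is a plain-Python sketch, not CDK; the metric names are hypothetical and the thresholds are copied from the table above.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Alarm:
    metric: str
    breached: Callable[[float], bool]  # True when the alarm should fire


# Alarm column of the table, encoded as predicates.
ALARMS = [
    Alarm("task_success_pct", lambda v: v < 90),
    Alarm("p95_latency_s", lambda v: v > 10),
    Alarm("token_spend_aud_per_day", lambda v: v > 140),
    Alarm("pii_leak_count", lambda v: v > 0),
]


def firing(metrics: dict[str, float]) -> list[str]:
    """Names of alarms breached by a snapshot of metric values."""
    return [
        a.metric
        for a in ALARMS
        if a.metric in metrics and a.breached(metrics[a.metric])
    ]


snapshot = {
    "task_success_pct": 93.0,
    "p95_latency_s": 4.2,
    "token_spend_aud_per_day": 150.0,
    "pii_leak_count": 0,
}
print(firing(snapshot))
```

Note that a task success of 93 % misses the target but does not fire the alarm; the gap between target and alarm is deliberate headroom.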

The deploy checklist we paste into every PR

[ ] 200+ ground-truth conversations in tests/ground_truth.json
[ ] Supervisor agent uses Sonnet; subs pinned to Haiku for cost
[ ] Guardrails alias attached (BLOCK PII, MASK custom)
[ ] agentcore deploy --stage prod --approve
[ ] Powertools tracer + metrics on every handler
[ ] CloudWatch alarm for "PII leak > 0" – page the on-call
[ ] A/B toggle for Haiku fallback if token spend > budget
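
The Haiku fallback in the last item can be as small as one function. This is a sketch under assumptions: the Sonnet ID is the one from the supervisor example, the Haiku model ID and the function name are mine, and the AUD 70 default mirrors the daily token-spend target.

```python
SONNET = "anthropic.claude-3-5-sonnet-20240620-v1:0"  # from the supervisor example
HAIKU = "anthropic.claude-3-haiku-20240307-v1:0"      # assumed fallback model ID


def pick_model(
    spend_today_aud: float,
    daily_budget_aud: float = 70.0,
    fallback_enabled: bool = True,
) -> str:
    """Use Sonnet while under budget; drop to Haiku once spend exceeds it."""
    if fallback_enabled and spend_today_aud > daily_budget_aud:
        return HAIKU
    return SONNET


print(pick_model(50.0))
print(pick_model(71.0))
```

Keeping the toggle as an explicit flag means you can switch the fallback off during an accuracy investigation without touching the budget number.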

Starter repo we fork every time:

https://github.com/awslabs/amazon-bedrock-agentcore-samples

80 % of the boilerplate is done in 90 min – the rest is ground-truth grunt work.

If you're riding the agent hype-wave right now, remember: the demo is the easy 10 %.

These notes are for the other 90 % – the invoices, the guardrails, the 3 a.m. pages.

Steal what you need, add your own scars, and ship something that won't hallucinate when the CFO asks it a question.

Happy building, and may your p95 always be under 5 s.
