DEV Community

Raj Murugan
Raj Murugan

Posted on

Bedrock AgentCore: What 5 Real ANZ Enterprise Deploys Taught Us

I've spent the last 9 months shipping Bedrock AgentCore into four different ANZ enterprises (plus one internal PoC that crashed and burned).

This isn't a hello-world tutorial – it's the bruises, the invoices, and the 3 a.m. CloudWatch alarms that finally made the thing stick.

If you're about to promote an agent past the "demo for the board" stage, steal this checklist – it will save you at least one rollback.


the numbers we actually saw

Pattern Use-case 10 k q/mo cost p95 latency Notes
Single agent Simple Q&A ~AUD 180 2.1 s Hallucinated once traffic > 2 k/day
Supervisor + 3 subs HR triage ~AUD 420 4.3 s 60 % less duplicate Lambda code
AgentCore Runtime SRE co-pilot ~AUD 620 3.8 s GitOps deploy, full traces
Guardrail-wrapped Student chat ~AUD 520 4.9 s PII blocked, compliance happy

Supervisor pattern is the only one that survived a production spike without a hot-fix.

Single agents are great for a sprint demo – and terrible for anything that hits the internet.


Managed agents vs. AgentCore Runtime – pick one before 10 k users

I drew this on a whiteboard for our CFO after she saw the second invoice:

Rule we now write into every SoW:

PoC = managed. Day-1 prod = Runtime.

The moment you need a custom MCP tool or a side-car Lambda, the console becomes a drag.


Ground-truth data – skip it and you'll ship a liar

Our first Kindo chatbot went live with 37 manually-written examples.

Two weeks later a student asked "What grade do I need to pass?" and the agent calmly invented a 42 % cutoff (it's 50 %).

Cue 4 a.m. rollback.

We fixed it the boring way:

  1. Exported 18 k real (de-identified) chat logs.
  2. LLM-expanded edge cases: "give me 50 ways to ask about vacation pay".
  3. Human reviewed, 1 200 kept.
  4. Added sessionAttributes (studentID, semester) so the agent could look up live data.

Accuracy jumped from 67 % → 92 % and the support ticket queue dropped by half.

# pytest harness we run in CI
tests = json.load(open("ground_truth.json"))
for t in tests:
    out = agent.invoke(t["input"], sessionAttributes=t["attrs"])
    assert out["answer"] == t["expected"]
Enter fullscreen mode Exit fullscreen mode

Supervisor pattern that actually compiles

Payroll bot rewrite: one supervisor + three specialised subs (policy, leave-balances, tickets).

60 % less copy-paste Lambda code, and we could unit-test each sub in isolation.

from agentcore import Agent, app

supervisor = Agent(
    model_id="anthropic.claude-3-5-sonnet-20240620-v1:0",
    instructions="You are a router. Never answer directly – always delegate to the correct sub-agent."
)

@app.entrypoint
def lambda_handler(event, _):
    return supervisor.invoke(event["prompt"])
Enter fullscreen mode Exit fullscreen mode

Gateway MCP let us plug ServiceNow REST APIs without re-writing the OpenAPI schema – biggest time-saver of the sprint.


Guardrails – the checkbox that saved our audit

First deploy forgot guardrails.

Next day a student pasted their email + TFN into the chat and the agent happily parroted it back in the response.

Security team put a red sticker on my laptop.

Now we enforce org-level guardrails before any agent alias hits prod:

Filter Block % Mask % AUD / mo
PII (email, TFN) 2.1 8.4 32
Custom finance terms 1.7 3.2 22
Hate/violence 0.3 12
Total 4.1 11.6 66

Cheap insurance.


IaC + observability – or you'll debug in the console at 2 a.m.

We template everything in CDK (Python). One cdk deploy spins up:

  • AgentCore Runtime container
  • Lambda layers for Powertools & boto3 latest
  • X-Ray traces, CloudWatch dashboards, alarms
Metric Target Alarm
Task success ≥ 95 % < 90 %
p95 latency ≤ 5 s > 10 s
Token spend ≤ AUD 70/day > AUD 140
PII leak count 0 > 0

Routing loops show up as 30 s p99 spikes – impossible to spot without traces.


10-line deploy checklist we paste into every PR

  • [ ] 200+ ground-truth conversations in tests/ground_truth.json
  • [ ] Supervisor agent uses Sonnet; subs pinned to Haiku for cost
  • [ ] Guardrails alias attached (BLOCK PII, MASK custom)
  • [ ] agentcore deploy --stage prod --approve
  • [ ] Powertools tracer + metrics on every handler
  • [ ] CloudWatch alarm for "PII leak > 0" – page the on-call
  • [ ] A/B toggle for Haiku fallback if token spend > budget

Starter repo we fork every time:

https://github.com/awslabs/amazon-bedrock-agentcore-samples

80 % of the boilerplate is done in 90 min – the rest is ground-truth grunt work.


If you're riding the agent hype-wave right now, remember: the demo is the easy 10 %.

These notes are for the other 90 % – the invoices, the guardrails, the 3 a.m. pages.

Steal what you need, add your own scars, and ship something that won't hallucinate when the CFO asks it a question.

Happy building, and may your p95 always be under 5 s.

Top comments (0)