I built a production AI agent on AWS. Not a demo, not a proof of concept — a real system with persistent memory, guardrails, CI/CD pipelines, and users who depend on it not going down at 2am.
The thing nobody tells you: the hard part isn't the AI. The hard part is the infrastructure around it.
This series is my attempt to document everything I had to figure out the hard way — from architecture decisions in Part 1 all the way to cost optimisation in Part 6. The companion demo repo is at github.com/rajmurugan01/bedrock-agentcore-starter.
Let's start at the beginning: why Amazon Bedrock AgentCore, and why not the "obvious" serverless approach.
The obvious approach: Lambda + Bedrock
If you've shipped anything serverless on AWS, your first instinct is Lambda. It's familiar, the tooling is mature, CDK support is solid, and it scales to zero.
For a simple Bedrock wrapper — get a message, call InvokeModel, return a response — Lambda is fine. But the moment you add conversational state, it starts to crack.
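For scale, here's roughly what that simple wrapper looks like. A minimal sketch, assuming the Claude Messages request format on Bedrock; the model ID is illustrative, so swap in whatever your account has enabled:

```python
import json

def build_request(prompt: str, max_tokens: int = 512) -> dict:
    # Request body for Bedrock InvokeModel; this follows the Claude
    # Messages format, which is model-specific.
    return {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }

def lambda_handler(event, context):
    import boto3  # deferred so the module imports without the AWS SDK

    client = boto3.client("bedrock-runtime")
    resp = client.invoke_model(
        modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # illustrative
        body=json.dumps(build_request(event["message"])),
    )
    payload = json.loads(resp["body"].read())
    return {"reply": payload["content"][0]["text"]}
```

No state, no memory, no streaming: one request in, one response out. That's the whole reason it fits Lambda so well, and the whole reason it stops fitting the moment the requirements below appear.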
Here's what a real conversational AI agent needs:
- Session state — the agent needs to remember what happened earlier in the conversation
- Long-running processing — LLMs can take 30-90 seconds for complex multi-tool chains
- Memory across sessions — the agent should know who the user is from previous conversations
- Streaming responses — users expect tokens to appear progressively, not wait 60 seconds for a blob
Let's look at how Lambda handles each of these.
Problem 1: Lambda's 15-minute timeout
Lambda has a hard maximum execution timeout of 15 minutes. For a simple Q&A, that's fine. But for an agentic loop — where the model calls tools, processes results, calls more tools, and reasons over everything — you can easily hit 5-10 minutes per complex interaction.
And I haven't even mentioned the user's session. If a user comes back after 20 minutes and continues the conversation, that's a new Lambda invocation with zero context.
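The practical defence is to thread Lambda's remaining-time budget through the agent loop so you can bail out cleanly instead of being hard-killed. A sketch, assuming hypothetical `call_model` and `run_tool` helpers; the real Lambda context object does expose `get_remaining_time_in_millis()`:

```python
SAFETY_MARGIN_MS = 30_000  # leave time to persist state and return cleanly

def agent_loop(task, context, call_model, run_tool, max_steps=20):
    # `call_model` and `run_tool` are hypothetical stand-ins for the model
    # call and tool dispatch; the deadline check is the point of the sketch.
    transcript = [task]
    for _ in range(max_steps):
        if context.get_remaining_time_in_millis() < SAFETY_MARGIN_MS:
            # Out of budget: hand back a partial result, not a hard kill.
            return {"status": "timeout", "transcript": transcript}
        action = call_model(transcript)
        if action["type"] == "final":
            return {"status": "done", "answer": action["text"]}
        transcript.append(run_tool(action))
    return {"status": "max_steps", "transcript": transcript}
```

Note what this buys you: a graceful partial answer. It does nothing for the 15-minute ceiling itself, or for the user who returns after the invocation has exited.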
Problem 2: Session state storage
Lambda is stateless by design. Every invocation is independent. For conversational state, you need to:
- Store session state somewhere (DynamoDB, ElastiCache, S3)
- Load it at the start of every Lambda invocation
- Save it at the end of every invocation
- Handle the edge case where the Lambda times out mid-session
- Build a session expiry and cleanup mechanism
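That load/save choreography ends up looking something like this. A sketch, assuming a DynamoDB table (e.g. `boto3.resource("dynamodb").Table("sessions")`) with `session_id` as the partition key and TTL enabled on `expires_at`; the table and attribute names are mine, not from the repo:

```python
import json
import time

SESSION_TTL_MINUTES = 30

def expiry_epoch(now: float, ttl_minutes: int = SESSION_TTL_MINUTES) -> int:
    # DynamoDB TTL expects an absolute epoch-seconds attribute;
    # the service deletes expired items for you (eventually).
    return int(now + ttl_minutes * 60)

def load_session(table, session_id: str) -> list:
    # Runs at the start of every invocation.
    item = table.get_item(Key={"session_id": session_id}).get("Item")
    return json.loads(item["history"]) if item else []

def save_session(table, session_id: str, history: list) -> None:
    # Runs at the end of every invocation; if Lambda times out before
    # this executes, the turn is silently lost.
    table.put_item(Item={
        "session_id": session_id,
        "history": json.dumps(history),
        "expires_at": expiry_epoch(time.time()),
    })
```

Every handler grows this same preamble and epilogue, and the timeout-mid-session edge case still isn't handled; that takes conditional writes or write-ahead journaling on top.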
That's a lot of undifferentiated infrastructure for a problem that isn't your core business.
Problem 3: Cross-session memory
Beyond session state, real assistants need memory — the ability to remember that a user's preferred contact method is email, that they're a premium customer, that they had a billing dispute last month.
With Lambda, you'd need to build this yourself: a vector database for semantic recall, a summarisation pipeline to consolidate old sessions, a retrieval step before each invocation. Entirely custom, entirely your problem to maintain.
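To make the shape of that retrieval step concrete, here's a deliberately toy version. Real systems use an embedding model plus a vector store; plain word overlap keeps the sketch dependency-free while showing where recall sits in the request path, just before each model invocation:

```python
def recall(query: str, memories: list[str], top_k: int = 2) -> list[str]:
    # Toy stand-in for semantic recall: score each stored memory by
    # word overlap with the query and return the best non-zero matches.
    q = set(query.lower().split())
    scored = sorted(
        memories,
        key=lambda m: len(q & set(m.lower().split())),
        reverse=True,
    )
    return [m for m in scored[:top_k] if q & set(m.lower().split())]
```

Even this toy exposes the moving parts you'd own: a store to populate, a consolidation job to keep it from growing unboundedly, and a relevance cutoff to tune.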
What AgentCore actually does
Amazon Bedrock AgentCore is AWS's managed infrastructure for running AI agents. It's designed specifically for the workload pattern that Lambda handles poorly.
Here's the mental model: AgentCore is a managed container orchestrator for long-running, stateful AI agent sessions. You ship a Docker container with your agent code. AgentCore handles:
- Container lifecycle — starts, stops, scales, and restarts containers
- Session routing — routes each user session to the right container instance
- Memory persistence — built-in Semantic, Summary, and UserPreference memory strategies
- JWT validation — validates Cognito (or custom) JWTs before your code even runs
- VPC networking — runs your containers inside your VPC without cold start penalties
- SSE streaming — handles the HTTP connection and SSE protocol for you
The architectural difference:
Lambda approach:
User message → API Gateway → Lambda (cold start?) → load session from DynamoDB →
call Bedrock → save session to DynamoDB → return response → Lambda exits
AgentCore approach:
User message → AgentCore Runtime (JWT validated) → your container (already warm) →
call Bedrock → response streams back → container stays warm for next message
The architecture we're building
┌────────────────────────────────────────────────────────────────┐
│ GitHub Actions (OIDC)                                          │
│  ├── Build Docker (linux/amd64)                                │
│  ├── Push to ECR (:latest + :<sha>)                            │
│  └── update-agent-runtime CLI                                  │
└──────────────────────────────┬─────────────────────────────────┘
                               │
                  CDK v2 TypeScript deploys:
                               │
┌──────────────────────────────▼─────────────────────────────────┐
│ AWS Infrastructure (us-east-1)                                 │
│                                                                │
│  AgentCore Runtime                                             │
│   ├── Cognito JWT authoriser                                   │
│   ├── AG-UI HTTP protocol (SSE streaming)                      │
│   └── Container: Python agent on port 8080                     │
│                                                                │
│  AgentCore Memory (3 strategies)                               │
│  Bedrock Guardrail (prompt injection + PII)                    │
│  CloudWatch alarms (token count + latency)                     │
└────────────────────────────────────────────────────────────────┘
Primary model: Claude Sonnet 4.6 with prompt caching
Background model: Amazon Nova Pro (cheap classification/summarisation)
CI/CD: GitHub Actions OIDC — no stored AWS credentials
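Prompt caching is what makes a large, static system prompt affordable across turns. A hedged sketch of a Converse-style request, assuming the `cachePoint` content block Bedrock's Converse API uses to mark where the cacheable prefix ends; check current model support before relying on it:

```python
def build_converse_request(system_prompt: str, history: list,
                           model_id: str) -> dict:
    # Everything above the cachePoint (the big static system prompt) can be
    # served from cache on subsequent calls; only the per-turn messages are
    # billed at full input-token rates. Block shapes follow the Converse API.
    return {
        "modelId": model_id,
        "system": [
            {"text": system_prompt},
            {"cachePoint": {"type": "default"}},  # cache everything above
        ],
        "messages": history,
    }
```

You'd pass this to `boto3.client("bedrock-runtime").converse(**request)` with the Sonnet model ID, and point the same helper at a Nova model ID for the cheap background classification and summarisation calls.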
Series roadmap
| Part | Topic |
|---|---|
| Part 1 (this post) | Architecture & why AgentCore |
| Part 2 | Full CDK stack + 9 deployment gotchas |
| Part 3 | Python agent with Strands SDK + prompt caching |
| Part 4 | Docker local dev loop |
| Part 5 | GitHub Actions OIDC + ECR + Runtime updates |
| Part 6 | Cost breakdown + alarms |
Full demo repo: github.com/rajmurugan01/bedrock-agentcore-starter
Originally published at rajmurugan.com. This is Part 1 of the Ultimate Guide to Building AI Agents on AWS with Bedrock AgentCore series.