I built a production AI agent on AWS. Not a demo, not a proof of concept — a real system with persistent memory, guardrails, CI/CD pipelines, and users who depend on it not going down at 2am.
The thing nobody tells you: the hard part isn't the AI. The hard part is the infrastructure around it.
This series is my attempt to document everything I had to figure out the hard way — from architecture decisions in Part 1 all the way to cost optimisation in Part 6. The companion demo repo is at github.com/rajmurugan01/bedrock-agentcore-starter.
Let's start at the beginning: why Amazon Bedrock AgentCore, and why not the "obvious" serverless approach.
The obvious approach: Lambda + Bedrock
If you've shipped anything serverless on AWS, your first instinct is Lambda. It's familiar, the tooling is mature, CDK support is solid, and it scales to zero.
For a simple Bedrock wrapper — get a message, call InvokeModel, return a response — Lambda is fine. But the moment you add conversational state, it starts to crack.
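For scale, here's roughly what that simple wrapper looks like. A minimal sketch, assuming the Claude Messages request format on Bedrock; the model ID is illustrative, so swap in whatever your account has enabled:

```python
import json

def build_request(prompt: str, max_tokens: int = 512) -> dict:
    # Request body for Bedrock InvokeModel; this follows the Claude
    # Messages format, which is model-specific.
    return {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }

def lambda_handler(event, context):
    import boto3  # deferred so the module imports without the AWS SDK

    client = boto3.client("bedrock-runtime")
    resp = client.invoke_model(
        modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # illustrative
        body=json.dumps(build_request(event["message"])),
    )
    payload = json.loads(resp["body"].read())
    return {"reply": payload["content"][0]["text"]}
```

No state, no memory, no streaming: one request in, one response out. That's the whole reason it fits Lambda so well, and the whole reason it stops fitting the moment the requirements below appear.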
Here's what a real conversational AI agent needs:
- Session state — the agent needs to remember what happened earlier in the conversation
- Long-running processing — LLMs can take 30-90 seconds for complex multi-tool chains
- Memory across sessions — the agent should know who the user is from previous conversations
- Streaming responses — users expect tokens to appear progressively, not wait 60 seconds for a blob
Let's look at how Lambda handles each of these.
Problem 1: Lambda's 15-minute timeout
Lambda has a hard maximum execution timeout of 15 minutes. For a simple Q&A, that's fine. But for an agentic loop — where the model calls tools, processes results, calls more tools, and reasons over everything — you can easily hit 5-10 minutes per complex interaction.
And I haven't even mentioned the user's session. If a user comes back after 20 minutes and continues the conversation, that's a new Lambda invocation with zero context.
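The practical defence is to thread Lambda's remaining-time budget through the agent loop so you can bail out cleanly instead of being hard-killed. A sketch, assuming hypothetical `call_model` and `run_tool` helpers; the real Lambda context object does expose `get_remaining_time_in_millis()`:

```python
SAFETY_MARGIN_MS = 30_000  # leave time to persist state and return cleanly

def agent_loop(task, context, call_model, run_tool, max_steps=20):
    # `call_model` and `run_tool` are hypothetical stand-ins for the model
    # call and tool dispatch; the deadline check is the point of the sketch.
    transcript = [task]
    for _ in range(max_steps):
        if context.get_remaining_time_in_millis() < SAFETY_MARGIN_MS:
            # Out of budget: hand back a partial result, not a hard kill.
            return {"status": "timeout", "transcript": transcript}
        action = call_model(transcript)
        if action["type"] == "final":
            return {"status": "done", "answer": action["text"]}
        transcript.append(run_tool(action))
    return {"status": "max_steps", "transcript": transcript}
```

Note what this buys you: a graceful partial answer. It does nothing for the 15-minute ceiling itself, or for the user who returns after the invocation has exited.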
Problem 2: Session state storage
Lambda is stateless by design. Every invocation is independent. For conversational state, you need to:
- Store session state somewhere (DynamoDB, ElastiCache, S3)
- Load it at the start of every Lambda invocation
- Save it at the end of every invocation
- Handle the edge case where the Lambda times out mid-session
- Build a session expiry and cleanup mechanism
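That load/save choreography ends up looking something like this. A sketch, assuming a DynamoDB table (e.g. `boto3.resource("dynamodb").Table("sessions")`) with `session_id` as the partition key and TTL enabled on `expires_at`; the table and attribute names are mine, not from the repo:

```python
import json
import time

SESSION_TTL_MINUTES = 30

def expiry_epoch(now: float, ttl_minutes: int = SESSION_TTL_MINUTES) -> int:
    # DynamoDB TTL expects an absolute epoch-seconds attribute;
    # the service deletes expired items for you (eventually).
    return int(now + ttl_minutes * 60)

def load_session(table, session_id: str) -> list:
    # Runs at the start of every invocation.
    item = table.get_item(Key={"session_id": session_id}).get("Item")
    return json.loads(item["history"]) if item else []

def save_session(table, session_id: str, history: list) -> None:
    # Runs at the end of every invocation; if Lambda times out before
    # this executes, the turn is silently lost.
    table.put_item(Item={
        "session_id": session_id,
        "history": json.dumps(history),
        "expires_at": expiry_epoch(time.time()),
    })
```

Every handler grows this same preamble and epilogue, and the timeout-mid-session edge case still isn't handled; that takes conditional writes or write-ahead journaling on top.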
That's a lot of undifferentiated infrastructure for a problem that isn't your core business.
Problem 3: Cross-session memory
Beyond session state, real assistants need memory — the ability to remember that a user's preferred contact method is email, that they're a premium customer, that they had a billing dispute last month.
With Lambda, you'd need to build this yourself: a vector database for semantic recall, a summarisation pipeline to consolidate old sessions, a retrieval step before each invocation. Entirely custom, entirely your problem to maintain.
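To make the shape of that retrieval step concrete, here's a deliberately toy version. Real systems use an embedding model plus a vector store; plain word overlap keeps the sketch dependency-free while showing where recall sits in the request path, just before each model invocation:

```python
def recall(query: str, memories: list[str], top_k: int = 2) -> list[str]:
    # Toy stand-in for semantic recall: score each stored memory by
    # word overlap with the query and return the best non-zero matches.
    q = set(query.lower().split())
    scored = sorted(
        memories,
        key=lambda m: len(q & set(m.lower().split())),
        reverse=True,
    )
    return [m for m in scored[:top_k] if q & set(m.lower().split())]
```

Even this toy exposes the moving parts you'd own: a store to populate, a consolidation job to keep it from growing unboundedly, and a relevance cutoff to tune.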
What AgentCore actually does
Amazon Bedrock AgentCore is AWS's managed infrastructure for running AI agents. It's designed specifically for the workload pattern that Lambda handles poorly.
Here's the mental model: AgentCore is a managed container orchestrator for long-running, stateful AI agent sessions. You ship a Docker container with your agent code. AgentCore handles:
- Container lifecycle — starts, stops, scales, and restarts containers
- Session routing — routes each user session to the right container instance
- Memory persistence — built-in Semantic, Summary, and UserPreference memory strategies
- JWT validation — validates Cognito (or custom) JWTs before your code even runs
- VPC networking — runs your containers inside your VPC without cold start penalties
- SSE streaming — handles the HTTP connection and SSE protocol for you
The architectural difference:
Lambda approach:
User message → API Gateway → Lambda (cold start?) → load session from DynamoDB →
call Bedrock → save session to DynamoDB → return response → Lambda exits
AgentCore approach:
User message → AgentCore Runtime (JWT validated) → your container (already warm) →
call Bedrock → response streams back → container stays warm for next message
The architecture we're building
┌────────────────────────────────────────────────────────────────┐
│ GitHub Actions (OIDC)                                          │
│  ├── Build Docker (linux/amd64)                                │
│  ├── Push to ECR (:latest + :<sha>)                            │
│  └── update-agent-runtime CLI                                  │
└──────────────────────────────┬─────────────────────────────────┘
                               │
                  CDK v2 TypeScript deploys:
                               │
┌──────────────────────────────▼─────────────────────────────────┐
│ AWS Infrastructure (us-east-1)                                 │
│                                                                │
│  AgentCore Runtime                                             │
│   ├── Cognito JWT authoriser                                   │
│   ├── AG-UI HTTP protocol (SSE streaming)                      │
│   └── Container: Python agent on port 8080                     │
│                                                                │
│  AgentCore Memory (3 strategies)                               │
│  Bedrock Guardrail (prompt injection + PII)                    │
│  CloudWatch alarms (token count + latency)                     │
└────────────────────────────────────────────────────────────────┘
Primary model: Claude Sonnet 4.6 with prompt caching
Background model: Amazon Nova Pro (cheap classification/summarisation)
CI/CD: GitHub Actions OIDC — no stored AWS credentials
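Prompt caching is what makes a large, static system prompt affordable across turns. A hedged sketch of a Converse-style request, assuming the `cachePoint` content block Bedrock's Converse API uses to mark where the cacheable prefix ends; check current model support before relying on it:

```python
def build_converse_request(system_prompt: str, history: list,
                           model_id: str) -> dict:
    # Everything above the cachePoint (the big static system prompt) can be
    # served from cache on subsequent calls; only the per-turn messages are
    # billed at full input-token rates. Block shapes follow the Converse API.
    return {
        "modelId": model_id,
        "system": [
            {"text": system_prompt},
            {"cachePoint": {"type": "default"}},  # cache everything above
        ],
        "messages": history,
    }
```

You'd pass this to `boto3.client("bedrock-runtime").converse(**request)` with the Sonnet model ID, and point the same helper at a Nova model ID for the cheap background classification and summarisation calls.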
Series roadmap
| Part | Topic |
|---|---|
| Part 1 (this post) | Architecture & why AgentCore |
| Part 2 | Full CDK stack + 9 deployment gotchas |
| Part 3 | Python agent with Strands SDK + prompt caching |
| Part 4 | Docker local dev loop |
| Part 5 | GitHub Actions OIDC + ECR + Runtime updates |
| Part 6 | Cost breakdown + alarms |
Full demo repo: github.com/rajmurugan01/bedrock-agentcore-starter
Originally published at rajmurugan.com. This is Part 1 of the Ultimate Guide to Building AI Agents on AWS with Bedrock AgentCore series.