Series: Agentic AWS | Post: 2 of 6 | Cloud: AWS
Why Agents Need Their Own Runtime
A Lambda function times out in 15 minutes. An EC2 instance charges you whether the agent is thinking or idle. An ECS task requires container orchestration expertise before you write a single line of agent logic.
AI agents have fundamentally different runtime requirements - they run for minutes to hours, maintain session context across tool calls, need isolated execution per user, and must scale from zero to many concurrent sessions without pre-provisioning.
AgentCore Runtime is a serverless execution environment purpose-built for exactly this workload. It hosts your agent code in ARM64 containers with up to 8-hour execution windows, full session isolation, built-in observability, and native support for both HTTP and the A2A (Agent-to-Agent) protocol. You bring the agent logic; Runtime handles everything else.
In Post 1 we built an AgentCore Gateway that exposes an order-status Lambda as an MCP tool. This post deploys the agent itself - the process that calls that gateway, reasons with Claude, and serves user requests - onto AgentCore Runtime via Terraform and container deployment.
Architecture
Client (curl / SDK)
|
| HTTPS + JWT auth
v
AgentCore Runtime endpoint
| (session-isolated container per user)
v
Agent container (Python + Strands SDK)
|
| MCP streamable HTTP + SigV4
v
AgentCore Gateway (from Post 1)
|
v
Lambda: order-status-tool
Each user session gets its own isolated container instance. Session state - conversation history, in-flight tool calls - lives in that container for the duration of the session. When the session idles past the timeout, the container is reaped and you stop paying.
Agent Code
The agent runs as a long-lived HTTP server inside the container. AgentCore Runtime routes requests to it via the /invocations endpoint.
# agent/main.py
import asyncio
import json
import os
from http.server import HTTPServer, BaseHTTPRequestHandler

import boto3
from botocore.auth import SigV4Auth
from botocore.awsrequest import AWSRequest
from mcp.client.streamable_http import streamablehttp_client
from strands import Agent
from strands.tools.mcp import MCPClient

GATEWAY_ENDPOINT = os.environ["GATEWAY_ENDPOINT"]
AWS_REGION = os.environ.get("AWS_REGION", "us-east-1")
MODEL_ID = os.environ.get("MODEL_ID", "anthropic.claude-3-5-sonnet-20241022-v2:0")

# Keep the (possibly refreshable) credentials object; freeze it only when signing
session_creds = boto3.session.Session().get_credentials()


def signed_headers(url: str) -> dict:
    """SigV4 signed headers for AgentCore Gateway inbound IAM auth."""
    request = AWSRequest(method="POST", url=url)
    # AgentCore uses the "bedrock-agentcore" signing name.
    # This signs an empty body once; SigV4 signatures cover the payload and
    # expire after a few minutes, so production code should sign per request.
    SigV4Auth(
        session_creds.get_frozen_credentials(), "bedrock-agentcore", AWS_REGION
    ).add_auth(request)
    return dict(request.headers)


async def build_agent() -> Agent:
    """
    Connect to AgentCore Gateway, load available tools,
    and return a Strands Agent ready to handle requests.
    """
    headers = signed_headers(GATEWAY_ENDPOINT)
    mcp_client = MCPClient(
        lambda: streamablehttp_client(GATEWAY_ENDPOINT, headers=headers)
    )
    tools = await mcp_client.get_tools()
    return Agent(
        model=MODEL_ID,
        tools=tools,
        system_prompt=(
            "You are a helpful order support agent. "
            "Use your tools to look up order status and shipping details. "
            "Always confirm the order ID before making tool calls."
        ),
    )


# Build agent once at container startup - reused across requests in the session
agent = asyncio.run(build_agent())


class AgentHandler(BaseHTTPRequestHandler):
    """
    AgentCore Runtime expects a POST /invocations endpoint.
    Request body: {"prompt": "user message", "session_id": "..."}
    Response body: {"response": "agent reply"}
    """

    def do_POST(self):
        if self.path != "/invocations":
            self.send_response(404)
            self.end_headers()
            return
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length))
        prompt = body.get("prompt", "")
        try:
            result = asyncio.run(agent.invoke_async(prompt))
            response_body = json.dumps({"response": result.message})
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(response_body.encode())
        except Exception as e:
            error_body = json.dumps({"error": str(e)})
            self.send_response(500)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(error_body.encode())

    def log_message(self, format, *args):
        # AgentCore Runtime captures stdout/stderr to CloudWatch
        print(f"[{self.address_string()}] {format % args}")


if __name__ == "__main__":
    port = int(os.environ.get("PORT", 8080))
    print(f"AgentCore Runtime agent listening on port {port}")
    server = HTTPServer(("0.0.0.0", port), AgentHandler)
    server.serve_forever()
# agent/Dockerfile
FROM public.ecr.aws/amazonlinux/amazonlinux:2023-minimal
RUN dnf install -y python3.12 python3.12-pip && dnf clean all
WORKDIR /app
COPY requirements.txt .
RUN pip3.12 install --no-cache-dir -r requirements.txt
COPY main.py .
# AgentCore Runtime routes traffic to port 8080 by default
EXPOSE 8080
CMD ["python3.12", "main.py"]
# agent/requirements.txt
strands-agents>=0.1.0
mcp>=1.0.0
boto3>=1.35.0
Terraform Infrastructure
Variables
# variables.tf
variable "aws_region" {
  type = string
}

variable "environment" {
  type = string
}

variable "project_name" {
  type    = string
  default = "agentic-aws"
}

variable "gateway_endpoint" {
  description = "AgentCore Gateway MCP endpoint URL (output from Post 1 stack)"
  type        = string
}

variable "gateway_arn" {
  description = "AgentCore Gateway ARN for IAM policy (output from Post 1 stack)"
  type        = string
}

variable "idle_session_timeout_seconds" {
  description = "Seconds before an idle session container is reaped"
  type        = number
  default     = 1800
}

variable "max_session_lifetime_seconds" {
  description = "Hard ceiling on session duration (max 28800 = 8 hours)"
  type        = number
  default     = 7200
}

variable "container_cpu" {
  description = "vCPU units for the agent container (1024 = 1 vCPU)"
  type        = number
  default     = 1024
}

variable "container_memory_mb" {
  description = "Memory in MB for the agent container"
  type        = number
  default     = 2048
}
# dev.tfvars
aws_region = "us-east-1"
environment = "dev"
idle_session_timeout_seconds = 600 # 10 min - aggressive cleanup in dev
max_session_lifetime_seconds = 3600 # 1 hour ceiling in dev
container_cpu = 512
container_memory_mb = 1024
# prod.tfvars
aws_region = "us-east-1"
environment = "prod"
idle_session_timeout_seconds = 1800 # 30 min idle tolerance
max_session_lifetime_seconds = 28800 # Full 8-hour window
container_cpu = 1024
container_memory_mb = 2048
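The two lifecycle variables interact: the idle timeout is only meaningful if it is no longer than the session ceiling, and the ceiling itself is capped at 28800 seconds. A minimal sketch of a pre-deploy sanity check follows. `validate_lifecycle` is a hypothetical helper, not part of Terraform or any AWS SDK, that you might run in CI before `terraform apply`:

```python
MAX_SESSION_CEILING_SECONDS = 28800  # 8 hours, the ceiling stated in variables.tf


def validate_lifecycle(idle_timeout: int, max_lifetime: int) -> list[str]:
    """Return a list of problems; an empty list means the pair is consistent."""
    problems = []
    if idle_timeout <= 0 or max_lifetime <= 0:
        problems.append("timeouts must be positive")
    if max_lifetime > MAX_SESSION_CEILING_SECONDS:
        problems.append(
            f"max lifetime {max_lifetime}s exceeds the "
            f"{MAX_SESSION_CEILING_SECONDS}s ceiling"
        )
    if idle_timeout > max_lifetime:
        problems.append("idle timeout is longer than the max lifetime")
    return problems


if __name__ == "__main__":
    print(validate_lifecycle(600, 3600))    # dev.tfvars values
    print(validate_lifecycle(1800, 28800))  # prod.tfvars values
```

The same rules could instead live in Terraform `validation` blocks; the Python form is shown only to keep the examples in one language.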
ECR Repository
# ecr.tf
resource "aws_ecr_repository" "agent" {
  name                 = "${var.project_name}-agent-${var.environment}"
  image_tag_mutability = "MUTABLE"

  image_scanning_configuration {
    scan_on_push = true
  }

  encryption_configuration {
    encryption_type = "AES256"
  }
}

resource "aws_ecr_lifecycle_policy" "agent" {
  repository = aws_ecr_repository.agent.name
  policy = jsonencode({
    rules = [{
      rulePriority = 1
      description  = "Keep last 10 images"
      selection = {
        tagStatus   = "any"
        countType   = "imageCountMoreThan"
        countNumber = 10
      }
      action = { type = "expire" }
    }]
  })
}

output "ecr_repository_url" {
  value = aws_ecr_repository.agent.repository_url
}
IAM for Runtime
# iam.tf
# Execution role - assumed by AgentCore Runtime service
resource "aws_iam_role" "runtime_execution" {
  name = "${var.project_name}-runtime-exec-${var.environment}"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Service = "bedrock-agentcore.amazonaws.com" }
      Action    = "sts:AssumeRole"
    }]
  })
}

resource "aws_iam_role_policy" "runtime_permissions" {
  name = "runtime-permissions"
  role = aws_iam_role.runtime_execution.id
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Sid    = "BedrockModelAccess"
        Effect = "Allow"
        # Strands streams model responses by default, so allow both actions
        Action = [
          "bedrock:InvokeModel",
          "bedrock:InvokeModelWithResponseStream"
        ]
        Resource = "arn:aws:bedrock:${var.aws_region}::foundation-model/*"
      },
      {
        Sid      = "AgentCoreGatewayAccess"
        Effect   = "Allow"
        Action   = "bedrock-agentcore:InvokeGateway"
        Resource = var.gateway_arn
      },
      {
        Sid    = "CloudWatchLogs"
        Effect = "Allow"
        Action = [
          "logs:CreateLogGroup",
          "logs:CreateLogStream",
          "logs:PutLogEvents"
        ]
        Resource = "arn:aws:logs:${var.aws_region}:*:log-group:/aws/bedrock-agentcore/*"
      },
      {
        Sid    = "ECRPull"
        Effect = "Allow"
        Action = [
          "ecr:GetDownloadUrlForLayer",
          "ecr:BatchGetImage",
          "ecr:GetAuthorizationToken"
        ]
        Resource = "*"
      }
    ]
  })
}
AgentCore Runtime Resource
# runtime.tf
resource "aws_bedrockagentcore_agent_runtime" "main" {
  name        = "${var.project_name}-runtime-${var.environment}"
  description = "Agentic AWS order support agent runtime"

  # Container image deployed to ECR
  runtime_artifact = {
    container_image = {
      uri = "${aws_ecr_repository.agent.repository_url}:latest"
    }
  }

  # IAM role the runtime assumes
  role_arn = aws_iam_role.runtime_execution.arn

  # Environment variables injected into every container instance
  environment_variables = {
    GATEWAY_ENDPOINT = var.gateway_endpoint
    AWS_REGION       = var.aws_region
    MODEL_ID         = "anthropic.claude-3-5-sonnet-20241022-v2:0"
  }

  # Session lifecycle controls
  session_idle_timeout_in_seconds = var.idle_session_timeout_seconds
  max_session_duration_in_seconds = var.max_session_lifetime_seconds

  # Protocol: HTTP for standard request/response, A2A for multi-agent
  server_protocol = "HTTP"

  # JWT authorizer - validates tokens before requests reach the container
  # Remove authorizer_configuration block for unauthenticated dev testing
  authorizer_configuration = {
    jwt = {
      discovery_url    = "https://cognito-idp.${var.aws_region}.amazonaws.com/${aws_cognito_user_pool.agents.id}/.well-known/openid-configuration"
      allowed_audience = ["agentcore-runtime-${var.environment}"]
    }
  }

  # Resource limits per container instance
  compute_configuration = {
    cpu    = var.container_cpu
    memory = var.container_memory_mb
  }

  depends_on = [aws_iam_role_policy.runtime_permissions]
}

output "runtime_endpoint" {
  description = "HTTPS endpoint to invoke the agent"
  value       = aws_bedrockagentcore_agent_runtime.main.endpoint_url
}

output "runtime_arn" {
  value = aws_bedrockagentcore_agent_runtime.main.arn
}
Cognito for JWT Auth
# cognito.tf
resource "aws_cognito_user_pool" "agents" {
  name = "${var.project_name}-agents-${var.environment}"
}

resource "aws_cognito_user_pool_client" "runtime_client" {
  name                                 = "runtime-client-${var.environment}"
  user_pool_id                         = aws_cognito_user_pool.agents.id
  generate_secret                      = true
  allowed_oauth_flows                  = ["client_credentials"]
  allowed_oauth_flows_user_pool_client = true
  allowed_oauth_scopes                 = ["agentcore-runtime-${var.environment}/invoke"]
  explicit_auth_flows                  = ["ALLOW_USER_SRP_AUTH", "ALLOW_REFRESH_TOKEN_AUTH"]
}

resource "aws_cognito_user_pool_domain" "agents" {
  domain       = "${var.project_name}-agents-${var.environment}"
  user_pool_id = aws_cognito_user_pool.agents.id
}

output "cognito_token_url" {
  value = "https://${aws_cognito_user_pool_domain.agents.domain}.auth.${var.aws_region}.amazoncognito.com/oauth2/token"
}

output "cognito_client_id" {
  value = aws_cognito_user_pool_client.runtime_client.id
}
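Because the app client is configured for the `client_credentials` flow, a caller obtains a token by posting its client ID and secret to the token URL. The sketch below builds that request with the standard library only. `build_token_request` is a hypothetical helper; the scope string is assumed to mirror `allowed_oauth_scopes`, and the client secret is assumed to be available to the caller, since this stack does not output it.

```python
import base64
import urllib.parse
import urllib.request


def build_token_request(
    token_url: str, client_id: str, client_secret: str, scope: str
) -> urllib.request.Request:
    """Build (but do not send) an OAuth 2.0 client-credentials token request."""
    # The client authenticates with HTTP Basic auth: base64("id:secret")
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    form = urllib.parse.urlencode(
        {"grant_type": "client_credentials", "scope": scope}
    )
    return urllib.request.Request(
        token_url,
        data=form.encode(),
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )
```

Sending the request with `urllib.request.urlopen` and reading `access_token` from the JSON response would yield the bearer token for the Runtime call.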
Build and Deploy
# 1. Create the ECR repository first so there is somewhere to push the image
terraform init
terraform apply -var-file=dev.tfvars -target=aws_ecr_repository.agent
ECR_URL=$(terraform output -raw ecr_repository_url)
# 2. Build and push container image
# CRITICAL: target linux/arm64 - AgentCore Runtime is ARM64
aws ecr get-login-password --region us-east-1 | \
  docker login --username AWS --password-stdin "$ECR_URL"
docker build \
  --platform linux/arm64 \
  -t "$ECR_URL:latest" \
  ./agent
docker push "$ECR_URL:latest"
# 2b. Apply the rest of the stack now that the :latest image the runtime references exists
terraform apply -var-file=dev.tfvars
RUNTIME_ENDPOINT=$(terraform output -raw runtime_endpoint)
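# 3. Get a JWT token from Cognito (client credentials flow)
# Assumes CLIENT_SECRET holds the app client secret (this stack does not output it)
# and that jq is installed
CLIENT_ID=$(terraform output -raw cognito_client_id)
TOKEN=$(curl -s -X POST "$(terraform output -raw cognito_token_url)" \
  -u "$CLIENT_ID:$CLIENT_SECRET" \
  -H "Content-Type: application/x-www-form-urlencoded" \
  -d "grant_type=client_credentials&scope=agentcore-runtime-dev/invoke" \
  | jq -r '.access_token')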
# 4. Invoke the agent
curl -X POST $RUNTIME_ENDPOINT/invocations \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"prompt": "Where is order ORD-1234?", "session_id": "user-abc"}'
{
"response": "Order ORD-1234 is currently in transit with FedEx. Tracking number 794601234567. Estimated delivery April 17, 2026."
}
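The same call can be made from Python. This is a minimal standard-library sketch of the request shown in the curl example. `build_invocation` and `invoke` are hypothetical helper names, and the payload shape is the one documented in the handler's docstring:

```python
import json
import urllib.request


def build_invocation(
    endpoint: str, token: str, prompt: str, session_id: str
) -> urllib.request.Request:
    """Build the POST /invocations request without sending it."""
    payload = {"prompt": prompt, "session_id": session_id}
    return urllib.request.Request(
        endpoint.rstrip("/") + "/invocations",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def invoke(endpoint: str, token: str, prompt: str, session_id: str,
           timeout: float = 60.0) -> str:
    """Send the request and return the agent's reply from the JSON body."""
    request = build_invocation(endpoint, token, prompt, session_id)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["response"]
```

Reusing the same `session_id` across calls is what keeps follow-up questions on the same conversation.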
How Session Isolation Works
When a request arrives with session_id: "user-abc", AgentCore Runtime routes it to the container instance bound to that session. If no instance exists yet, Runtime cold-starts one. Subsequent requests with the same session ID hit the same container - so the agent's in-memory conversation history persists across turns.
Two users with different session IDs get completely separate container instances. There is no shared memory, no shared state, no cross-contamination between sessions. This is the key property that makes AgentCore Runtime safe for multi-tenant production workloads without any application-level session management code.
When idle_session_timeout_seconds elapses with no requests, Runtime tears down the container. The next request for that session ID cold-starts a fresh instance. For stateful workflows that need memory to survive session teardown, Post 3 covers AgentCore Memory.
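The routing behaviour described above can be modelled in a few lines. This is a toy simulation for building intuition, not how AgentCore Runtime is implemented; `SessionRouter` and `Container` are hypothetical names, and time is passed in explicitly so the behaviour is easy to follow:

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    """Stand-in for one isolated container: holds only its own session's history."""
    session_id: str
    last_used: float
    history: list = field(default_factory=list)


class SessionRouter:
    """Toy model of session-affinity routing with idle reaping."""

    def __init__(self, idle_timeout: float):
        self.idle_timeout = idle_timeout
        self.containers: dict[str, Container] = {}
        self.cold_starts = 0

    def route(self, session_id: str, prompt: str, now: float) -> Container:
        # Reap every container that has been idle longer than the timeout
        expired = [
            sid for sid, c in self.containers.items()
            if now - c.last_used > self.idle_timeout
        ]
        for sid in expired:
            del self.containers[sid]

        # Same session ID -> same container; unknown ID -> cold start
        container = self.containers.get(session_id)
        if container is None:
            container = Container(session_id, now)
            self.containers[session_id] = container
            self.cold_starts += 1

        container.history.append(prompt)
        container.last_used = now
        return container


if __name__ == "__main__":
    router = SessionRouter(idle_timeout=600)
    router.route("user-abc", "Where is order ORD-1234?", now=0)
    same = router.route("user-abc", "And the tracking number?", now=60)
    other = router.route("user-xyz", "Hello", now=61)
    fresh = router.route("user-abc", "Still there?", now=1000)
    print(len(same.history), len(other.history), len(fresh.history))
```

The last call lands after the idle timeout, so `user-abc` gets a fresh container with an empty history, which is the gap AgentCore Memory is meant to fill.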
ARM64 Architecture - The Critical Gotcha
AgentCore Runtime runs on ARM64. Building your dependencies on an x86 machine produces silent import errors at runtime. Always build with --platform linux/arm64:
# Wrong - builds for your Mac or x86 CI runner
docker build -t my-agent .
# Correct - explicit ARM64 target
docker build --platform linux/arm64 -t my-agent .
If your CI pipeline runs on x86, add --platform linux/arm64 to every docker build command and ensure your base image has ARM64 variants available (the amazonlinux:2023-minimal image used above does).
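One way to catch a wrong-architecture image before it is pushed is a small smoke test executed inside the built image in CI. The sketch below assumes hypothetical helpers `is_arm64` and `assert_arm64`; note that running an ARM64 image on an x86 runner requires emulation (QEMU/binfmt) to be set up first.

```python
import platform

# Linux reports "aarch64" for 64-bit ARM; macOS reports "arm64"
ARM64_NAMES = {"aarch64", "arm64"}


def is_arm64(machine: str) -> bool:
    """True when a platform.machine() value names a 64-bit ARM CPU."""
    return machine.lower() in ARM64_NAMES


def assert_arm64() -> None:
    """Raise with a clear message if the interpreter is not running as ARM64."""
    machine = platform.machine()
    if not is_arm64(machine):
        raise RuntimeError(
            f"image built for {machine!r}, expected linux/arm64"
        )
```

Calling `assert_arm64()` from a CI step that runs the image turns an architecture mismatch into an explicit build failure instead of a deploy-time surprise.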
Decision Framework
| Scenario | Configuration | Notes |
|---|---|---|
| Dev / testing | No JWT authorizer, short idle timeout (10 min) | Saves cost, no token management overhead |
| Production | JWT authorizer (Cognito or OIDC), 30 min idle | Token validated before container is hit |
| Short workflows (< 30 min) | max_session_lifetime_seconds = 1800 | Limit blast radius on runaway agents |
| Long-running research tasks | max_session_lifetime_seconds = 28800 | Full 8-hour window |
| Multi-agent orchestration | server_protocol = "A2A" | Runtime acts as an A2A server; other agents can call it |
| VPC isolation required | Add network_mode = "VPC" + subnet/SG config | Traffic stays off public internet |
| Large ML deps (> 250MB) | Container deployment | ZIP limit is 250MB; containers support up to 1GB |
Production Additions
VPC mode - Add network_mode = "VPC" with vpc_subnet_ids and vpc_security_group_ids to keep agent traffic inside your VPC. Combine with PrivateLink to reach AgentCore Gateway without public egress.
Observability - AgentCore Runtime emits token usage, session duration, latency, and error rates to CloudWatch automatically. No SDK instrumentation needed. For richer traces, add OpenTelemetry export to Datadog, Langfuse, or LangSmith from your agent code.
Secrets - Pass sensitive values (API keys, DB passwords) via AWS Secrets Manager, not environment variables. Environment variables are visible in the console. Fetch secrets at container startup with boto3.client("secretsmanager").
A2A protocol - Set server_protocol = "A2A" to expose the runtime as an Agent-to-Agent server. Other AgentCore Runtime agents can then call it as a sub-agent. Post 6 in this series builds a full multi-agent system on this capability.
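For the Secrets recommendation above, a startup fetch could look like the sketch below. It assumes the secret is stored as a JSON string; `load_secret` is a hypothetical helper, and the execution role would additionally need `secretsmanager:GetSecretValue` on the secret, which the IAM policy in this post does not grant.

```python
import json


def load_secret(secret_id: str, client=None) -> dict:
    """Fetch a JSON secret from AWS Secrets Manager and return it as a dict."""
    if client is None:
        # Imported lazily so the helper can be exercised with a fake client
        import boto3
        client = boto3.client("secretsmanager")
    response = client.get_secret_value(SecretId=secret_id)
    # Assumption: the secret value is a JSON object, e.g. {"api_key": "..."}
    return json.loads(response["SecretString"])
```

Calling it once at module import time, next to where the agent is built, keeps the secret out of the environment variables that appear in the console.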
What's Next
Post 3 covers AgentCore Memory - persistent context that survives session teardown. Without it, every new session starts from zero. Memory adds short-term (within session), long-term (across sessions), and episodic (experience-based learning) storage, all managed, with no vector database to provision.
The Runtime you built here connects to AgentCore Memory with a single configuration addition - no changes to agent code required.
Key Takeaways
- AgentCore Runtime provides session-isolated, serverless containers for agent workloads - up to 8-hour execution windows with no pre-provisioned infrastructure
- Always build container images targeting linux/arm64 - Runtime is ARM64 and silent import errors will bite you on x86 builds
- Idle session timeout and max lifetime are the two most important cost controls - set them aggressively in dev
- JWT authorization (Cognito or any OIDC provider) sits in front of the container - your agent code handles no auth logic
- server_protocol = "A2A" turns the runtime into a callable sub-agent for multi-agent orchestration patterns
Series: Agentic AWS | Next: Post 3 - AgentCore Memory