Suhas Mallesh

Posted on Apr 24

Agentic AWS - Day 2: Amazon Bedrock AgentCore Runtime

#aws #ai #agents #terraform

Series: Agentic AWS | Post: 2 of 6 | Cloud: AWS

Why Agents Need Their Own Runtime

A Lambda function hits a hard 15-minute timeout. An EC2 instance charges you whether the agent is thinking or idle. An ECS task requires container orchestration expertise before you write a single line of agent logic.

AI agents have fundamentally different runtime requirements - they run for minutes to hours, maintain session context across tool calls, need isolated execution per user, and must scale from zero to many concurrent sessions without pre-provisioning.

AgentCore Runtime is a serverless execution environment purpose-built for exactly this workload. It hosts your agent code in ARM64 containers with up to 8-hour execution windows, full session isolation, built-in observability, and native support for both HTTP and the A2A (Agent-to-Agent) protocol. You bring the agent logic; Runtime handles everything else.

In Post 1 we built an AgentCore Gateway that exposes an order-status Lambda as an MCP tool. This post deploys the agent itself - the process that calls that gateway, reasons with Claude, and serves user requests - onto AgentCore Runtime via Terraform and container deployment.

Architecture

Client (curl / SDK)
        |
        | HTTPS + JWT auth
        v
AgentCore Runtime endpoint
        |  (session-isolated container per user)
        v
Agent container (Python + Strands SDK)
        |
        | MCP streamable HTTP + SigV4
        v
AgentCore Gateway  (from Post 1)
        |
        v
Lambda: order-status-tool

Each user session gets its own isolated container instance. Session state - conversation history, in-flight tool calls - lives in that container for the duration of the session. When the session idles past the timeout, the container is reaped and you stop paying.

Agent Code

The agent runs as a long-lived HTTP server inside the container. AgentCore Runtime sends user requests to POST /invocations on port 8080 and health-checks the container with GET /ping, so the server has to answer both.

# agent/main.py
import json
import os
from http.server import HTTPServer, BaseHTTPRequestHandler

import boto3
import httpx
from botocore.auth import SigV4Auth
from botocore.awsrequest import AWSRequest
from mcp.client.streamable_http import streamablehttp_client
from strands import Agent
from strands.tools.mcp import MCPClient

GATEWAY_ENDPOINT = os.environ["GATEWAY_ENDPOINT"]
AWS_REGION = os.environ.get("AWS_REGION", "us-east-1")
MODEL_ID = os.environ.get("MODEL_ID", "anthropic.claude-3-5-sonnet-20241022-v2:0")


class SigV4HTTPXAuth(httpx.Auth):
    """
    SigV4-signs every outgoing MCP request for AgentCore Gateway inbound IAM auth.
    The signature covers a timestamp and the body hash, so it must be computed
    per request rather than once at startup.
    """

    requires_request_body = True  # make httpx buffer the body so it can be hashed

    def __init__(self, service: str, region: str):
        # Refreshable credentials from the runtime's execution role
        self.credentials = boto3.Session().get_credentials()
        self.service = service
        self.region = region

    def auth_flow(self, request: httpx.Request):
        aws_request = AWSRequest(
            method=request.method,
            url=str(request.url),
            data=request.content,
        )
        SigV4Auth(
            self.credentials.get_frozen_credentials(), self.service, self.region
        ).add_auth(aws_request)
        # Copies Authorization, X-Amz-Date and (if present) X-Amz-Security-Token
        request.headers.update(dict(aws_request.headers))
        yield request


def build_agent() -> Agent:
    """
    Connect to AgentCore Gateway, load available tools,
    and return a Strands Agent ready to handle requests.
    """
    mcp_client = MCPClient(
        # Needs an mcp release whose streamablehttp_client accepts an httpx `auth`
        lambda: streamablehttp_client(
            GATEWAY_ENDPOINT,
            auth=SigV4HTTPXAuth("bedrock-agentcore", AWS_REGION),
        )
    )
    # Keep the MCP session open for the container's lifetime; the agent
    # calls gateway tools through this client on every turn.
    mcp_client.start()
    tools = mcp_client.list_tools_sync()

    return Agent(
        model=MODEL_ID,
        tools=tools,
        system_prompt=(
            "You are a helpful order support agent. "
            "Use your tools to look up order status and shipping details. "
            "Always confirm the order ID before making tool calls."
        ),
    )


# Build agent once at container startup - reused across requests in the session
agent = build_agent()


class AgentHandler(BaseHTTPRequestHandler):
    """
    AgentCore Runtime expects POST /invocations and GET /ping on port 8080.
    Request body: {"prompt": "user message"}
    Response body: {"response": "agent reply"}
    """

    def _send_json(self, status: int, payload: dict) -> None:
        body = json.dumps(payload).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        # Health check polled by AgentCore Runtime
        if self.path == "/ping":
            self._send_json(200, {"status": "Healthy"})
        else:
            self._send_json(404, {"error": "not found"})

    def do_POST(self):
        if self.path != "/invocations":
            self._send_json(404, {"error": "not found"})
            return

        try:
            length = int(self.headers.get("Content-Length", 0))
            body = json.loads(self.rfile.read(length) or b"{}")
            prompt = body.get("prompt", "")
            # The Agent keeps conversation history in memory, so later turns
            # in this session see the earlier ones
            result = agent(prompt)
            # str(AgentResult) is the final reply text; result.message is the raw message dict
            self._send_json(200, {"response": str(result)})
        except Exception as e:
            self._send_json(500, {"error": str(e)})

    def log_message(self, format, *args):
        # AgentCore Runtime captures stdout/stderr to CloudWatch
        print(f"[{self.address_string()}] {format % args}")


if __name__ == "__main__":
    port = int(os.environ.get("PORT", 8080))
    print(f"AgentCore Runtime agent listening on port {port}")
    server = HTTPServer(("0.0.0.0", port), AgentHandler)
    server.serve_forever()
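Why can't the gateway headers be signed once at startup? A SigV4 signature covers a timestamp and the SHA-256 of the request body, so every MCP message needs its own. The sketch below is a simplified, stdlib-only SigV4 (two signed headers, no query string, no session token) meant purely to illustrate that; real code should keep using botocore's SigV4Auth. The secret, host, and path are placeholders.

```python
import hashlib
import hmac


def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def signing_key(secret: str, date: str, region: str, service: str) -> bytes:
    """SigV4 key derivation: HMAC chain over date, region, service, 'aws4_request'."""
    k_date = _hmac(("AWS4" + secret).encode("utf-8"), date)
    k_region = _hmac(k_date, region)
    k_service = _hmac(k_region, service)
    return _hmac(k_service, "aws4_request")


def sigv4_signature(secret: str, amz_date: str, region: str, service: str,
                    method: str, host: str, path: str, body: bytes) -> str:
    """Simplified SigV4 signature over host + x-amz-date and the body hash."""
    payload_hash = hashlib.sha256(body).hexdigest()  # the body is part of what is signed
    canonical_request = "\n".join([
        method,
        path,
        "",  # canonical query string (none here)
        f"host:{host}\nx-amz-date:{amz_date}\n",  # canonical headers, sorted by name
        "host;x-amz-date",  # signed header names
        payload_hash,
    ])
    scope = f"{amz_date[:8]}/{region}/{service}/aws4_request"
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256",
        amz_date,  # timestamp is signed, so a signature is only accepted briefly
        scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])
    key = signing_key(secret, amz_date[:8], region, service)
    return hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()


if __name__ == "__main__":
    common = dict(secret="EXAMPLE-SECRET", amz_date="20250424T120000Z",
                  region="us-east-1", service="bedrock-agentcore",
                  method="POST", host="gateway.example.com", path="/mcp")
    a = sigv4_signature(**common, body=b'{"method": "tools/list"}')
    b = sigv4_signature(**common, body=b'{"method": "tools/call"}')
    print(a != b)  # different bodies, different signatures
```

Two MCP messages with different JSON bodies produce different signatures even with identical credentials and timestamps, which is exactly why a header set computed at startup stops validating.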

# agent/Dockerfile
FROM public.ecr.aws/amazonlinux/amazonlinux:2023-minimal

RUN dnf install -y python3.12 python3.12-pip && dnf clean all

WORKDIR /app

COPY requirements.txt .
RUN pip3.12 install --no-cache-dir -r requirements.txt

COPY main.py .

# AgentCore Runtime routes traffic to port 8080 by default
EXPOSE 8080

CMD ["python3.12", "main.py"]

# agent/requirements.txt
strands-agents>=0.1.0
mcp>=1.0.0
boto3>=1.35.0
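Before pushing to ECR, it can pay to exercise the HTTP contract locally. As I understand AgentCore's HTTP protocol, that contract is POST /invocations plus a GET /ping health check answering {"status": "Healthy"}. The sketch below is a stripped-down stand-in for main.py: `fake_agent` is a hypothetical echo function replacing the Strands agent so the routes can be tested offline, on an ephemeral port.

```python
import json
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional, Tuple


def fake_agent(prompt: str) -> str:
    """Hypothetical stand-in for the Strands agent."""
    return f"echo: {prompt}"


class ContractHandler(BaseHTTPRequestHandler):
    def _send_json(self, status: int, payload: dict) -> None:
        body = json.dumps(payload).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        if self.path == "/ping":
            self._send_json(200, {"status": "Healthy"})
        else:
            self._send_json(404, {"error": "not found"})

    def do_POST(self):
        if self.path != "/invocations":
            self._send_json(404, {"error": "not found"})
            return
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length) or b"{}")
        self._send_json(200, {"response": fake_agent(body.get("prompt", ""))})

    def log_message(self, format, *args):
        pass  # keep local test output quiet


def start_server() -> HTTPServer:
    server = HTTPServer(("127.0.0.1", 0), ContractHandler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def call(server: HTTPServer, method: str, path: str,
         payload: Optional[dict] = None) -> Tuple[int, dict]:
    host, port = server.server_address
    data = json.dumps(payload).encode() if payload is not None else None
    req = urllib.request.Request(f"http://{host}:{port}{path}", data=data, method=method,
                                 headers={"Content-Type": "application/json"})
    try:
        with urllib.request.urlopen(req, timeout=5) as resp:
            return resp.status, json.loads(resp.read())
    except urllib.error.HTTPError as err:  # 4xx/5xx still carry a JSON body
        return err.code, json.loads(err.read())
```

Swapping `ContractHandler` for the real `AgentHandler` turns this into a smoke test of the actual container entry point, provided the gateway and model are reachable from your machine.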

Terraform Infrastructure

Variables

# variables.tf
variable "aws_region" {
  type = string
}

variable "environment" {
  type = string
}

variable "project_name" {
  type    = string
  default = "agentic-aws"
}

variable "gateway_endpoint" {
  description = "AgentCore Gateway MCP endpoint URL (output from Post 1 stack)"
  type        = string
}

variable "gateway_arn" {
  description = "AgentCore Gateway ARN for IAM policy (output from Post 1 stack)"
  type        = string
}

variable "idle_session_timeout_seconds" {
  description = "Seconds before an idle session container is reaped"
  type        = number
  default     = 1800
}

variable "max_session_lifetime_seconds" {
  description = "Hard ceiling on session duration (max 28800 = 8 hours)"
  type        = number
  default     = 7200
}

variable "container_cpu" {
  description = "vCPU units for the agent container (1024 = 1 vCPU)"
  type        = number
  default     = 1024
}

variable "container_memory_mb" {
  description = "Memory in MB for the agent container"
  type        = number
  default     = 2048
}

# dev.tfvars
aws_region                   = "us-east-1"
environment                  = "dev"
idle_session_timeout_seconds = 600    # 10 min - aggressive cleanup in dev
max_session_lifetime_seconds = 3600   # 1 hour ceiling in dev
container_cpu                = 512
container_memory_mb          = 1024

# prod.tfvars
aws_region                   = "us-east-1"
environment                  = "prod"
idle_session_timeout_seconds = 1800   # 30 min idle tolerance
max_session_lifetime_seconds = 28800  # Full 8-hour window
container_cpu                = 1024
container_memory_mb          = 2048
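The two tfvars files encode a few relationships worth checking before `terraform apply`. The sketch below is a hypothetical pre-apply check, not part of the stack: the 28800-second ceiling and the 1024-units-per-vCPU convention come from variables.tf, while "idle timeout should not exceed lifetime" is my own reasoning (an idle timeout longer than the hard ceiling can never fire), not a documented API constraint. Terraform `validation` blocks inside the variable definitions are the native place for the same rules.

```python
from dataclasses import dataclass
from typing import List

MAX_SESSION_SECONDS = 28800  # 8-hour ceiling from the variable description


@dataclass
class RuntimeSettings:
    idle_session_timeout_seconds: int
    max_session_lifetime_seconds: int
    container_cpu: int
    container_memory_mb: int


def validate(s: RuntimeSettings) -> List[str]:
    """Return human-readable problems; an empty list means the values look sane."""
    problems = []
    if not 0 < s.max_session_lifetime_seconds <= MAX_SESSION_SECONDS:
        problems.append("max_session_lifetime_seconds must be in 1..28800")
    if not 0 < s.idle_session_timeout_seconds <= s.max_session_lifetime_seconds:
        # An idle timeout above the lifetime ceiling would never trigger
        problems.append("idle_session_timeout_seconds should be in 1..max_session_lifetime_seconds")
    if s.container_cpu <= 0 or s.container_memory_mb <= 0:
        problems.append("container_cpu and container_memory_mb must be positive")
    return problems


def vcpus(cpu_units: int) -> float:
    """variables.tf convention: 1024 CPU units = 1 vCPU."""
    return cpu_units / 1024


# Values copied from dev.tfvars and prod.tfvars
DEV = RuntimeSettings(600, 3600, 512, 1024)
PROD = RuntimeSettings(1800, 28800, 1024, 2048)
```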

ECR Repository

# ecr.tf
resource "aws_ecr_repository" "agent" {
  name                 = "${var.project_name}-agent-${var.environment}"
  image_tag_mutability = "MUTABLE"

  image_scanning_configuration {
    scan_on_push = true
  }

  encryption_configuration {
    encryption_type = "AES256"
  }
}

resource "aws_ecr_lifecycle_policy" "agent" {
  repository = aws_ecr_repository.agent.name

  policy = jsonencode({
    rules = [{
      rulePriority = 1
      description  = "Keep last 10 images"
      selection = {
        tagStatus   = "any"
        countType   = "imageCountMoreThan"
        countNumber = 10
      }
      action = { type = "expire" }
    }]
  })
}

output "ecr_repository_url" {
  value = aws_ecr_repository.agent.repository_url
}
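The lifecycle rule ranks images by push time and expires everything beyond the newest ten. Because `tagStatus = "any"` also counts untagged images, builds orphaned each time `:latest` is re-pushed age out under the same rule. A small model of that selection (an illustration of the rule's semantics, not ECR code; `images_to_expire` is a hypothetical helper):

```python
from datetime import datetime, timedelta
from typing import Dict, List


def images_to_expire(pushed_at: Dict[str, datetime], keep: int = 10) -> List[str]:
    """
    Model of the rule above (tagStatus "any", countType "imageCountMoreThan",
    countNumber 10): rank by push time, expire everything beyond the newest
    `keep`. Returns the expired images, newest first.
    """
    newest_first = sorted(pushed_at, key=lambda image: pushed_at[image], reverse=True)
    return newest_first[keep:]


if __name__ == "__main__":
    start = datetime(2025, 4, 1)
    history = {f"build-{i}": start + timedelta(hours=i) for i in range(12)}
    print(images_to_expire(history))  # ['build-1', 'build-0']
```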

IAM for Runtime

# iam.tf

# Execution role - assumed by AgentCore Runtime service
resource "aws_iam_role" "runtime_execution" {
  name = "${var.project_name}-runtime-exec-${var.environment}"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Service = "bedrock-agentcore.amazonaws.com" }
      Action    = "sts:AssumeRole"
    }]
  })
}

resource "aws_iam_role_policy" "runtime_permissions" {
  name = "runtime-permissions"
  role = aws_iam_role.runtime_execution.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Sid    = "BedrockModelAccess"
        Effect = "Allow"
        # Strands calls the Converse streaming API, which needs the streaming action too
        Action = [
          "bedrock:InvokeModel",
          "bedrock:InvokeModelWithResponseStream"
        ]
        Resource = "arn:aws:bedrock:${var.aws_region}::foundation-model/*"
      },
      {
        Sid      = "AgentCoreGatewayAccess"
        Effect   = "Allow"
        Action   = "bedrock-agentcore:InvokeGateway"
        Resource = var.gateway_arn
      },
      {
        Sid    = "CloudWatchLogs"
        Effect = "Allow"
        Action = [
          "logs:CreateLogGroup",
          "logs:CreateLogStream",
          "logs:PutLogEvents"
        ]
        Resource = "arn:aws:logs:${var.aws_region}:*:log-group:/aws/bedrock-agentcore/*"
      },
      {
        Sid      = "ECRPull"
        Effect   = "Allow"
        Action   = [
          "ecr:GetDownloadUrlForLayer",
          "ecr:BatchGetImage",
          "ecr:GetAuthorizationToken"
        ]
        Resource = "*"
      }
    ]
  })
}
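Wildcard resource ARNs are easy to get subtly wrong. The sketch below is a deliberately simplified matcher for reasoning about which ARNs a pattern covers (Allow statements only, no Deny, conditions, or NotAction), not IAM's evaluation logic; the IAM policy simulator is the authoritative check. `POLICY` mirrors two statements from iam.tf with aws_region = "us-east-1", and the account ID in the tests is a placeholder.

```python
from fnmatch import fnmatchcase
from typing import List, Union

POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {"Sid": "BedrockModelAccess", "Effect": "Allow",
         "Action": "bedrock:InvokeModel",
         "Resource": "arn:aws:bedrock:us-east-1::foundation-model/*"},
        {"Sid": "CloudWatchLogs", "Effect": "Allow",
         "Action": ["logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents"],
         "Resource": "arn:aws:logs:us-east-1:*:log-group:/aws/bedrock-agentcore/*"},
    ],
}


def _as_list(value: Union[str, List[str]]) -> List[str]:
    return value if isinstance(value, list) else [value]


def is_allowed(policy: dict, action: str, resource: str) -> bool:
    """Toy check: does any Allow statement match both the action and the resource?"""
    for stmt in policy["Statement"]:
        if stmt.get("Effect") != "Allow":
            continue
        # IAM action names are case-insensitive; resource ARNs are matched case-sensitively here
        action_ok = any(fnmatchcase(action.lower(), a.lower()) for a in _as_list(stmt["Action"]))
        resource_ok = any(fnmatchcase(resource, r) for r in _as_list(stmt["Resource"]))
        if action_ok and resource_ok:
            return True
    return False
```

One thing the tests make visible: the foundation-model resource is pinned to a single Region, so a model ARN from any other Region falls outside it.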

AgentCore Runtime Resource

# runtime.tf

resource "aws_bedrockagentcore_agent_runtime" "main" {
  name        = "${var.project_name}-runtime-${var.environment}"
  description = "Agentic AWS order support agent runtime"

  # Container image deployed to ECR
  runtime_artifact = {
    container_image = {
      uri = "${aws_ecr_repository.agent.repository_url}:latest"
    }
  }

  # IAM role the runtime assumes
  role_arn = aws_iam_role.runtime_execution.arn

  # Environment variables injected into every container instance
  environment_variables = {
    GATEWAY_ENDPOINT = var.gateway_endpoint
    AWS_REGION       = var.aws_region
    MODEL_ID         = "anthropic.claude-3-5-sonnet-20241022-v2:0"
  }

  # Session lifecycle controls
  session_idle_timeout_in_seconds = var.idle_session_timeout_seconds
  max_session_duration_in_seconds = var.max_session_lifetime_seconds

  # Protocol: HTTP for standard request/response, A2A for multi-agent
  server_protocol = "HTTP"

  # JWT authorizer - validates tokens before requests reach the container
  # Remove authorizer_configuration block for unauthenticated dev testing
  authorizer_configuration = {
    jwt = {
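      discovery_url   = "https://cognito-idp.${var.aws_region}.amazonaws.com/${aws_cognito_user_pool.agents.id}/.well-known/openid-configuration"
      # Cognito client-credentials access tokens carry client_id rather than aud,
      # so validate the calling app client instead of an audience
      allowed_clients = [aws_cognito_user_pool_client.runtime_client.id]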
    }
  }

  # Resource limits per container instance
  compute_configuration = {
    cpu    = var.container_cpu
    memory = var.container_memory_mb
  }

  depends_on = [aws_iam_role_policy.runtime_permissions]
}

output "runtime_endpoint" {
  description = "HTTPS endpoint to invoke the agent"
  value       = aws_bedrockagentcore_agent_runtime.main.endpoint_url
}

output "runtime_arn" {
  value = aws_bedrockagentcore_agent_runtime.main.arn
}

Cognito for JWT Auth

# cognito.tf

resource "aws_cognito_user_pool" "agents" {
  name = "${var.project_name}-agents-${var.environment}"
}

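# Resource server that defines the custom "invoke" scope the client requests
resource "aws_cognito_resource_server" "runtime" {
  identifier   = "agentcore-runtime-${var.environment}"
  name         = "agentcore-runtime-${var.environment}"
  user_pool_id = aws_cognito_user_pool.agents.id

  scope {
    scope_name        = "invoke"
    scope_description = "Invoke the AgentCore Runtime agent"
  }
}

resource "aws_cognito_user_pool_client" "runtime_client" {
  name         = "runtime-client-${var.environment}"
  user_pool_id = aws_cognito_user_pool.agents.id

  generate_secret                      = true
  allowed_oauth_flows                  = ["client_credentials"]
  allowed_oauth_flows_user_pool_client = true
  # Resolves to ["agentcore-runtime-<env>/invoke"]; the scope must exist first
  allowed_oauth_scopes                 = aws_cognito_resource_server.runtime.scope_identifiers

  explicit_auth_flows = ["ALLOW_USER_SRP_AUTH", "ALLOW_REFRESH_TOKEN_AUTH"]
}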

resource "aws_cognito_user_pool_domain" "agents" {
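  # Cognito prefix domains cannot contain "aws", "amazon" or "cognito"
  # and must be unique within the Region
  domain       = "${replace(var.project_name, "aws", "cloud")}-agents-${var.environment}"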
  user_pool_id = aws_cognito_user_pool.agents.id
}

output "cognito_token_url" {
  value = "https://${aws_cognito_user_pool_domain.agents.domain}.auth.${var.aws_region}.amazoncognito.com/oauth2/token"
}

output "cognito_client_id" {
  value = aws_cognito_user_pool_client.runtime_client.id
}

output "cognito_client_secret" {
  value     = aws_cognito_user_pool_client.runtime_client.client_secret
  sensitive = true
}

Build and Deploy

# 1. Terraform apply (provisions ECR + Runtime, outputs ECR URL)
terraform init
terraform apply -var-file=dev.tfvars

ECR_URL=$(terraform output -raw ecr_repository_url)
RUNTIME_ENDPOINT=$(terraform output -raw runtime_endpoint)

# 2. Build and push container image
# CRITICAL: target linux/arm64 - AgentCore Runtime is ARM64
aws ecr get-login-password --region us-east-1 | \
  docker login --username AWS --password-stdin $ECR_URL

docker build \
  --platform linux/arm64 \
  -t $ECR_URL:latest \
  ./agent

docker push $ECR_URL:latest

# 3. Get an access token from the Cognito token endpoint (client credentials flow)
TOKEN=$(curl -s -X POST "$(terraform output -raw cognito_token_url)" \
  -u "$(terraform output -raw cognito_client_id):$(terraform output -raw cognito_client_secret)" \
  -H "Content-Type: application/x-www-form-urlencoded" \
  -d "grant_type=client_credentials&scope=agentcore-runtime-dev/invoke" \
  | jq -r .access_token)

# 4. Invoke the agent. The session ID travels in a header (at least 33 characters),
#    not in the JSON body
SESSION_ID="user-abc-$(uuidgen)"
curl -X POST $RUNTIME_ENDPOINT/invocations \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -H "X-Amzn-Bedrock-AgentCore-Runtime-Session-Id: $SESSION_ID" \
  -d '{"prompt": "Where is order ORD-1234?"}'

{
  "response": "Order ORD-1234 is currently in transit with FedEx. Tracking number 794601234567. Estimated delivery April 17, 2026."
}
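The same two calls from Python, using only the standard library. This is a sketch under assumptions: the session header name and the 33-character minimum reflect my reading of the AgentCore docs (verify against the current reference), the environment variable names in the `__main__` block are hypothetical, and the helper names are illustrative.

```python
import base64
import json
import os
import urllib.parse
import urllib.request
import uuid

# Assumption: the header AgentCore Runtime reads the session ID from
SESSION_HEADER = "X-Amzn-Bedrock-AgentCore-Runtime-Session-Id"


def token_request(token_url: str, client_id: str, client_secret: str,
                  scope: str) -> urllib.request.Request:
    """OAuth 2.0 client-credentials grant with HTTP Basic client authentication."""
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    form = urllib.parse.urlencode({"grant_type": "client_credentials", "scope": scope})
    return urllib.request.Request(
        token_url, data=form.encode(), method="POST",
        headers={"Authorization": f"Basic {basic}",
                 "Content-Type": "application/x-www-form-urlencoded"})


def new_session_id(user: str) -> str:
    """User prefix + UUID4 (36 chars) keeps IDs above the assumed 33-char minimum."""
    return f"{user}-{uuid.uuid4()}"


def invoke_request(endpoint: str, token: str, session_id: str,
                   prompt: str) -> urllib.request.Request:
    return urllib.request.Request(
        f"{endpoint}/invocations", data=json.dumps({"prompt": prompt}).encode(),
        method="POST",
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json",
                 SESSION_HEADER: session_id})


def send(req: urllib.request.Request) -> dict:
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)


if __name__ == "__main__":
    token = send(token_request(os.environ["TOKEN_URL"], os.environ["CLIENT_ID"],
                               os.environ["CLIENT_SECRET"],
                               "agentcore-runtime-dev/invoke"))["access_token"]
    sid = new_session_id("user-abc")  # reuse sid on later turns to hit the same session
    print(send(invoke_request(os.environ["RUNTIME_ENDPOINT"], token, sid,
                              "Where is order ORD-1234?"))["response"])
```

Keeping `sid` stable across turns is what lets the follow-up questions land on the same in-memory conversation.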

How Session Isolation Works

When a request arrives with a runtime session ID (for example one built from "user-abc"), AgentCore Runtime routes it to the container instance bound to that session. The ID is sent in the X-Amzn-Bedrock-AgentCore-Runtime-Session-Id header, not in the JSON body. If no instance exists yet, Runtime cold-starts one. Subsequent requests with the same session ID hit the same container, so the agent's in-memory conversation history persists across turns.

Two users with different session IDs get completely separate container instances. There is no shared memory, no shared state, no cross-contamination between sessions. This is the key property that makes AgentCore Runtime safe for multi-tenant production workloads without any application-level session management code.

When idle_session_timeout_seconds elapses with no requests, Runtime tears down the container. The next request for that session ID cold-starts a fresh instance. For stateful workflows that need memory to survive session teardown, Post 3 covers AgentCore Memory.
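The routing and reaping behavior described above fits in a few lines as a toy model. This is a hypothetical in-process illustration, not how AgentCore implements it: `SessionRouter`, `SessionInstance`, and the injectable clock are made-up names, and a dict entry stands in for an isolated container.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SessionInstance:
    """Stands in for one isolated container: its own history, nothing shared."""
    history: List[str] = field(default_factory=list)
    last_seen: float = 0.0


class SessionRouter:
    """Toy model of per-session routing with idle reaping (not AgentCore code)."""

    def __init__(self, idle_timeout_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.idle_timeout_seconds = idle_timeout_seconds
        self.clock = clock
        self.instances: Dict[str, SessionInstance] = {}
        self.cold_starts = 0

    def _reap_idle(self, now: float) -> None:
        expired = [sid for sid, inst in self.instances.items()
                   if now - inst.last_seen >= self.idle_timeout_seconds]
        for sid in expired:
            del self.instances[sid]  # container torn down, in-memory history gone

    def handle(self, session_id: str, prompt: str) -> int:
        """Route one turn; returns how many turns this instance has seen."""
        now = self.clock()
        self._reap_idle(now)
        instance = self.instances.get(session_id)
        if instance is None:
            instance = SessionInstance()  # cold start
            self.instances[session_id] = instance
            self.cold_starts += 1
        instance.history.append(prompt)
        instance.last_seen = now
        return len(instance.history)
```

With a 600-second idle timeout, two turns from "user-abc" share one instance, "user-xyz" gets its own, and a turn arriving after 700 idle seconds starts over with an empty history, which is the gap AgentCore Memory is meant to fill.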

ARM64 Architecture - The Critical Gotcha

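AgentCore Runtime runs on ARM64. An image built for x86_64 will not start there at all, and dependencies installed on an x86 machine and copied into an ARM64 image fail only when a package with compiled extensions is imported, which can be well after startup. Always build with --platform linux/arm64: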

# Wrong - builds for your Mac or x86 CI runner
docker build -t my-agent .

# Correct - explicit ARM64 target
docker build --platform linux/arm64 -t my-agent .

If your CI pipeline runs on x86, add --platform linux/arm64 to every docker build command (Docker Buildx with QEMU emulation performs the cross-build) and make sure your base image has an ARM64 variant (the amazonlinux:2023-minimal image used above does).
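The import-time failures usually trace back to wheels compiled for the wrong architecture. A quick way to audit a wheelhouse is to read the platform tag from each wheel filename. A minimal sketch, assuming wheels built with something like `pip wheel -r requirements.txt -w wheels` on the CI runner; the helper names are hypothetical, and the musllinux exclusion reflects that Amazon Linux 2023 is glibc-based.

```python
from pathlib import Path
from typing import List


def platform_tags(wheel_filename: str) -> List[str]:
    """Wheel names end in -{python}-{abi}-{platform}.whl; platform may join several tags with '.'."""
    stem = wheel_filename.removesuffix(".whl")
    return stem.split("-")[-1].split(".")


def runs_on_linux_arm64(wheel_filename: str) -> bool:
    """True for pure-Python wheels or glibc-based Linux aarch64 wheels."""
    for tag in platform_tags(wheel_filename):
        if tag == "any":
            return True
        # manylinux*_aarch64 / linux_aarch64 target glibc; musllinux targets musl
        if tag.endswith("_aarch64") and not tag.startswith("musllinux"):
            return True
    return False


def incompatible_wheels(directory: str) -> List[str]:
    """Wheels in `directory` that would not import inside the ARM64 container."""
    return sorted(p.name for p in Path(directory).glob("*.whl")
                  if not runs_on_linux_arm64(p.name))
```

Anything `incompatible_wheels("wheels")` lists was compiled for the build machine's architecture and is a candidate for a late import error.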

Decision Framework

Scenario	Configuration	Notes
Dev / testing	No JWT authorizer, short idle timeout (10 min)	Saves cost, no token management overhead
Production	JWT authorizer (Cognito or OIDC), 30 min idle	Token validated before container is hit
Short workflows (< 30 min)	`max_session_lifetime_seconds = 1800`	Limit blast radius on runaway agents
Long-running research tasks	`max_session_lifetime_seconds = 28800`	Full 8-hour window
Multi-agent orchestration	`server_protocol = "A2A"`	Runtime acts as an A2A server; other agents can call it
VPC isolation required	Add `network_mode = "VPC"` + subnet/SG config	Traffic stays off public internet
Large ML deps (> 250MB)	Container deployment	ZIP limit is 250MB; containers support up to 1GB

Production Additions

VPC mode - Add network_mode = "VPC" with vpc_subnet_ids and vpc_security_group_ids to keep agent traffic inside your VPC. Combine with PrivateLink to reach AgentCore Gateway without public egress.

Observability - AgentCore Runtime emits token usage, session duration, latency, and error rates to CloudWatch automatically. No SDK instrumentation is needed. For richer traces, add OpenTelemetry export to Datadog, Langfuse, or LangSmith from your agent code.
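Because the runtime ships stdout/stderr to CloudWatch (the log_message override in main.py relies on this), a cheap step before full tracing is to print one JSON object per line, which CloudWatch Logs Insights can filter on field by field. A minimal sketch with the standard logging module; `session_id` is a hypothetical extra field you would pass yourself, and `make_logger` is an illustrative helper.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """One JSON object per log line; CloudWatch adds its own ingestion timestamp."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Optional field supplied via logger.info(..., extra={"session_id": ...})
        session_id = getattr(record, "session_id", None)
        if session_id is not None:
            entry["session_id"] = session_id
        if record.exc_info:
            entry["exception"] = self.formatException(record.exc_info)
        return json.dumps(entry)


def make_logger(name: str = "agent", stream=sys.stdout) -> logging.Logger:
    logger = logging.getLogger(name)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]  # replace any existing handlers
    logger.setLevel(logging.INFO)
    logger.propagate = False     # avoid duplicate lines via the root logger
    return logger
```

For example, `make_logger().info("tool call %s", "get_order_status", extra={"session_id": sid})` emits a single queryable line per tool call.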

Secrets - Pass sensitive values (API keys, DB passwords) via AWS Secrets Manager, not environment variables. Environment variables are visible in the console. Fetch secrets at container startup with boto3.client("secretsmanager").
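A minimal sketch of the startup fetch, assuming the secret is stored as a JSON string. `APP_SECRET_ID` is a hypothetical environment variable holding the secret's name or ARN, and the execution role would additionally need `secretsmanager:GetSecretValue` on that secret, which iam.tf above does not grant. The client is passed in so the function can be tested without AWS.

```python
import json
from typing import Any, Dict, Protocol


class SecretsClient(Protocol):
    """The one method used from boto3's Secrets Manager client."""

    def get_secret_value(self, *, SecretId: str) -> Dict[str, Any]: ...


def load_secrets(client: SecretsClient, secret_id: str) -> Dict[str, str]:
    """Fetch a JSON-formatted secret once, at container startup."""
    response = client.get_secret_value(SecretId=secret_id)
    return json.loads(response["SecretString"])


if __name__ == "__main__":
    import os

    import boto3

    # Module-level, so every request in the session reuses the same values
    SECRETS = load_secrets(boto3.client("secretsmanager"), os.environ["APP_SECRET_ID"])
```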

A2A protocol - Set server_protocol = "A2A" to expose the runtime as an Agent-to-Agent server. Other AgentCore Runtime agents can then call it as a sub-agent. Post 6 in this series builds a full multi-agent system on this capability.

What's Next

Post 3 covers AgentCore Memory - persistent context that survives session teardown. Without it, every new session starts from zero. Memory adds short-term (within session), long-term (across sessions), and episodic (experience-based learning) storage, all managed, with no vector database to provision.

The Runtime you built here connects to AgentCore Memory with a single configuration addition - no changes to agent code required.

Key Takeaways

AgentCore Runtime provides session-isolated, serverless containers for agent workloads - up to 8-hour execution windows with no pre-provisioned infrastructure
Always build container images targeting linux/arm64 - Runtime is ARM64, and x86 builds fail at startup or at import time
Idle session timeout and max lifetime are the two most important cost controls - set them aggressively in dev
JWT authorization (Cognito or any OIDC provider) sits in front of the container - your agent code handles no auth logic
server_protocol = "A2A" turns the runtime into a callable sub-agent for multi-agent orchestration patterns

Series: Agentic AWS | Next: Post 3 - AgentCore Memory
