<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: AWS Heroes</title>
    <description>The latest articles on DEV Community by AWS Heroes (@aws-heroes).</description>
    <link>https://dev.to/aws-heroes</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F2491%2Ff0c1a659-c959-42cd-bb12-cd25909dd9db.png</url>
      <title>DEV Community: AWS Heroes</title>
      <link>https://dev.to/aws-heroes</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/aws-heroes"/>
    <language>en</language>
    <item>
      <title>Lambda Managed Instances with Terraform: Multi-Concurrency, High Memory, and Compute Options</title>
      <dc:creator>Darryl Ruggles</dc:creator>
      <pubDate>Fri, 29 May 2026 23:45:10 +0000</pubDate>
      <link>https://dev.to/aws-heroes/lambda-managed-instances-with-terraform-multi-concurrency-high-memory-and-compute-options-3a5g</link>
      <guid>https://dev.to/aws-heroes/lambda-managed-instances-with-terraform-multi-concurrency-high-memory-and-compute-options-3a5g</guid>
      <description>&lt;p&gt;Lambda has always been one request at a time per execution environment. Your function starts, processes a single invocation, and sits idle until the next one arrives. If you need to handle a thousand concurrent requests, Lambda spins up a thousand execution environments - each with its own memory, its own cold start, and its own per-GB-second bill.&lt;/p&gt;

&lt;p&gt;Lambda Managed Instances changes that model. Announced at re:Invent 2025 and expanded with &lt;a href="https://aws.amazon.com/about-aws/whats-new/2026/03/lambda-32-gb-memory-16-vcpus/" rel="noopener noreferrer"&gt;32 GB memory / 16 vCPU support&lt;/a&gt; in March 2026, LMI runs your functions on EC2 instances in your VPC with AWS handling provisioning, patching, scaling, and load balancing. Each execution environment handles multiple concurrent requests. You keep the Lambda programming model and gain EC2 hardware selection and pricing.&lt;/p&gt;

&lt;p&gt;I built a product similarity engine to explore how this works in practice. The handler loads a product catalog with Nova embeddings via Bedrock into memory, uses Amazon Nova Multimodal Embeddings to embed incoming search queries, and computes cosine similarity across categories in parallel using ThreadPoolExecutor. It's the kind of workload that doesn't fit well on standard Lambda - sustained throughput, memory-intensive, with a mix of I/O (Bedrock API calls) and CPU (vector math) that benefits from multi-concurrency and configurable memory-to-vCPU ratios. The project uses Terraform for infrastructure, Python 3.14 with Powertools for observability, and the embedding model is configurable (Nova by default, Titan Text Embeddings V2 as an alternative).&lt;/p&gt;

&lt;p&gt;The source code is on GitHub: &lt;a href="https://github.com/RDarrylR/lambda-managed-instances-similarity-engine" rel="noopener noreferrer"&gt;lambda-managed-instances-similarity-engine&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The AWS Compute Continuum
&lt;/h2&gt;

&lt;p&gt;Before diving into the implementation, it helps to understand where Lambda Managed Instances fits in the AWS compute landscape. The options form a continuum from fully managed to fully self-managed:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdp6gzc3m6orf7pdjyni3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdp6gzc3m6orf7pdjyni3.png" alt="AWS Compute Continuum" width="799" height="201"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Standard Lambda&lt;/th&gt;
&lt;th&gt;Lambda Managed Instances&lt;/th&gt;
&lt;th&gt;ECS Express Mode&lt;/th&gt;
&lt;th&gt;ECS Fargate&lt;/th&gt;
&lt;th&gt;EKS&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Scaling&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Per-invocation, instant&lt;/td&gt;
&lt;td&gt;Async, CPU-based and concurrency saturation&lt;/td&gt;
&lt;td&gt;Traffic-based, auto&lt;/td&gt;
&lt;td&gt;Task-based, minutes&lt;/td&gt;
&lt;td&gt;Pod-based, minutes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Concurrency&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1 per environment&lt;/td&gt;
&lt;td&gt;Multiple per environment&lt;/td&gt;
&lt;td&gt;Configurable&lt;/td&gt;
&lt;td&gt;Configurable&lt;/td&gt;
&lt;td&gt;Configurable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Pricing&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Per-request + GB-second&lt;/td&gt;
&lt;td&gt;Per-request + EC2 + 15% mgmt fee&lt;/td&gt;
&lt;td&gt;Fargate + ALB&lt;/td&gt;
&lt;td&gt;Per-vCPU-hour&lt;/td&gt;
&lt;td&gt;EC2/Fargate + control plane&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Commitment discounts&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;Savings Plans, Reserved Instances&lt;/td&gt;
&lt;td&gt;Fargate Savings Plans&lt;/td&gt;
&lt;td&gt;Fargate Savings Plans&lt;/td&gt;
&lt;td&gt;EC2 Savings, RIs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cold start&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Milliseconds-seconds&lt;/td&gt;
&lt;td&gt;Tens of seconds (new instances)&lt;/td&gt;
&lt;td&gt;Minutes&lt;/td&gt;
&lt;td&gt;Minutes&lt;/td&gt;
&lt;td&gt;Minutes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Max invocation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;15 minutes&lt;/td&gt;
&lt;td&gt;15 minutes (environments long-lived, instances rotated by Lambda)&lt;/td&gt;
&lt;td&gt;No limit&lt;/td&gt;
&lt;td&gt;No limit&lt;/td&gt;
&lt;td&gt;No limit&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;VPC&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Optional&lt;/td&gt;
&lt;td&gt;Required&lt;/td&gt;
&lt;td&gt;Required&lt;/td&gt;
&lt;td&gt;Required&lt;/td&gt;
&lt;td&gt;Required&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Memory&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Up to 10 GB&lt;/td&gt;
&lt;td&gt;Up to 32 GB (configurable vCPU ratio)&lt;/td&gt;
&lt;td&gt;Configurable&lt;/td&gt;
&lt;td&gt;Configurable&lt;/td&gt;
&lt;td&gt;Configurable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Ops burden&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Zero&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;When to choose Lambda Managed Instances:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sustained, predictable throughput (hundreds or thousands of requests per second)&lt;/li&gt;
&lt;li&gt;Workloads that benefit from specific EC2 instance types (Graviton4, high-bandwidth networking)&lt;/li&gt;
&lt;li&gt;Memory-intensive functions that exceed standard Lambda's 10 GB limit or need configurable memory-to-vCPU ratios&lt;/li&gt;
&lt;li&gt;Cost optimization at scale (10M+ invocations/month where EC2 pricing with Savings Plans beats per-GB-second)&lt;/li&gt;
&lt;li&gt;Functions that load large datasets into memory and reuse them across requests (embeddings, models, reference data)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;When standard Lambda is still better:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Bursty, unpredictable traffic patterns&lt;/li&gt;
&lt;li&gt;Low to moderate throughput (standard Lambda's per-invocation pricing wins)&lt;/li&gt;
&lt;li&gt;Functions that need instant scaling (LMI scales asynchronously based on CPU utilization and execution-environment saturation; if traffic more than doubles within 5 minutes you may see throttles while capacity catches up)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I've written about several of these compute options in previous projects. My &lt;a href="https://darryl-ruggles.cloud/elastic-container-service-ecs-my-default-choice-for-containers-on-aws/" rel="noopener noreferrer"&gt;ECS deep dive&lt;/a&gt; covers Fargate and ECS Express Mode. The &lt;a href="https://darryl-ruggles.cloud/serverless-data-processor-using-aws-lambda-step-functions-and-fargate-on-ecs-with-rust/" rel="noopener noreferrer"&gt;Serverless Data Processor&lt;/a&gt; demonstrates Step Functions with both Lambda and Fargate. My &lt;a href="https://darryl-ruggles.cloud/powertools-for-aws-lambda-best-practices-by-default/" rel="noopener noreferrer"&gt;Powertools best practices&lt;/a&gt; article covers the observability patterns used in this project.&lt;/p&gt;




&lt;h2&gt;
  
  
  Architecture
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F234nfj3l6b9wlyoo76oa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F234nfj3l6b9wlyoo76oa.png" alt="Lambda Managed Instances Architecture" width="800" height="875"&gt;&lt;/a&gt;&lt;br&gt;
The architecture has three layers:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Capacity Provider&lt;/strong&gt; - The foundation. Defines the VPC configuration, instance requirements (architecture, instance types), and scaling policies. &lt;strong&gt;Capacity providers define both the security boundary and the failure blast radius of your workload.&lt;/strong&gt; All functions assigned to the same capacity provider share EC2 instances and must be mutually trusted. This uses container-based isolation, not Firecracker. A compromised function on a shared capacity provider can affect every other function on the same instances. Separate untrusted workloads, regulated workloads, and production from non-production into distinct capacity providers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Managed Instances&lt;/strong&gt; - EC2 instances launched and managed by Lambda in your VPC. They're visible in the EC2 console (tagged as managed by Lambda) but you don't SSH into them, patch them, or configure autoscaling groups - Lambda handles all of that. The lifecycle includes a 14-day rotation for security compliance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Execution Environments&lt;/strong&gt; - Containers running your function code on the managed instances. Each environment handles multiple concurrent requests. For Python, each concurrency slot is a separate process with its own memory space.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Networking&lt;/strong&gt; - VPC connectivity is mandatory. Without proper outbound connectivity, functions execute but logs and traces are silently lost. This project uses private subnets with a NAT Gateway for telemetry transmission and Bedrock API access. For production, consider VPC endpoints to keep traffic on the AWS network.&lt;/p&gt;


&lt;h2&gt;
  
  
  Two-Level Concurrency
&lt;/h2&gt;

&lt;p&gt;This is what makes Lambda Managed Instances architecturally different from standard Lambda. There are two levels of parallelism:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Level 1 - LMI manages for you:&lt;/strong&gt; Multiple processes handle concurrent requests. Python's LMI runtime spawns a separate process for each concurrency slot (default: 16 per vCPU). Each process has its own memory space, its own global variables, and its own boto3 clients. No shared mutable state between processes. Scaling decisions are based on both execution environment saturation and CPU utilization, not request count alone.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Level 2 - You manage yourself:&lt;/strong&gt; Within each request, you can use &lt;code&gt;ThreadPoolExecutor&lt;/code&gt; to parallelize I/O operations. If your handler needs to search 5 product categories, you can search them in parallel rather than sequentially.&lt;/p&gt;

&lt;p&gt;Combined, this means a single execution environment with 1 vCPU and 10 concurrent processes, each running 4 search threads, can have 40 category searches in flight concurrently. On standard Lambda, you'd need 10 separate execution environments to handle those 10 concurrent requests, each paying per-GB-second for its own copy of the catalog in memory.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuegxeocl5i8a6h5lk453.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuegxeocl5i8a6h5lk453.png" alt="Two-Level Concurrency" width="800" height="774"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Each process receives a request, calls Bedrock to embed the query text, then fans out across categories using ThreadPoolExecutor. The catalog data (loaded from DynamoDB at process init) stays in memory across all requests handled by that process.&lt;/p&gt;
&lt;h3&gt;
  
  
  Why LMI Instead of Standard Lambda
&lt;/h3&gt;

&lt;p&gt;This workload is a poor fit for standard Lambda and a strong fit for LMI. Here's why:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;In-memory catalog at scale.&lt;/strong&gt; Each process loads the product catalog with embedding vectors into memory at initialization. A 100K product catalog with 384-dimensional vectors is roughly 150 MB per process. With 10 concurrent processes, that's 1.5 GB for catalog data alone. Standard Lambda's maximum is 10 GB total, and you pay per-GB-second for every millisecond of that memory. LMI gives you up to 32 GB with configurable memory-to-vCPU ratios, and you pay EC2 instance pricing regardless of how much memory your function uses.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Multi-concurrency amortizes catalog loading.&lt;/strong&gt; On standard Lambda, 10 concurrent requests means 10 independent execution environments, each cold-starting and loading the catalog into its own memory, each paying per-GB-second. On LMI, those 10 requests run as 10 processes on one EC2 instance. The catalog loads once per process at init time and stays warm for all subsequent requests routed to that process. At sustained throughput, this eliminates the repeated cold-start penalty.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sustained throughput economics.&lt;/strong&gt; A product recommendation API serving a storefront has predictable, sustained traffic - hundreds of requests per second during business hours. Each request involves a Bedrock API call for query embedding (I/O), cosine similarity across categories (CPU), and structured logging (I/O). At 10M+ invocations per month, EC2 pricing with Savings Plans is 60-72% cheaper than standard Lambda's per-GB-second model.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Configurable memory-to-vCPU ratio.&lt;/strong&gt; This workload is memory-heavy (large catalog) with moderate CPU needs (vector math on 384 dimensions). The 4:1 memory-to-vCPU ratio gives 4 GB of memory per vCPU - enough for the catalog plus Bedrock client overhead. Standard Lambda locks you into a fixed ratio where more memory always means proportionally more CPU and higher cost.&lt;/p&gt;


&lt;h3&gt;
  
  
  Why Not Fargate?
&lt;/h3&gt;

&lt;p&gt;This project could run on ECS Fargate. The handler logic would move into a FastAPI app, the catalog would load at container startup, and an ALB would handle routing. It would work fine. But the infrastructure footprint would be significantly larger:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Lambda Managed Instances&lt;/th&gt;
&lt;th&gt;ECS Fargate&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Application code&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Single handler function&lt;/td&gt;
&lt;td&gt;Web framework + Dockerfile + health checks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Infrastructure&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Capacity provider + function&lt;/td&gt;
&lt;td&gt;Cluster + task def + service + ALB + target group + listener rules&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Auto-scaling&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Built into capacity provider&lt;/td&gt;
&lt;td&gt;Application Auto Scaling policies (target tracking, step scaling)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Event triggers&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Native (SQS, EventBridge, API Gateway, S3)&lt;/td&gt;
&lt;td&gt;Requires separate wiring per trigger&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Terraform lines&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~200 across 4 modules&lt;/td&gt;
&lt;td&gt;~400-500 with ALB, ECR, auto-scaling&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Container image&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Not needed (zip deployment)&lt;/td&gt;
&lt;td&gt;Required (Dockerfile, ECR push, image lifecycle)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For teams already comfortable with Lambda, LMI is the path of least resistance to get EC2 pricing and multi-concurrency without learning container orchestration. You keep the programming model you know and gain the hardware flexibility you need. The reverse is also true: &lt;strong&gt;for teams already invested in ECS, Fargate may remain the more operationally familiar choice&lt;/strong&gt; - the muscle memory, dashboards, deployment pipelines, and on-call runbooks are already in place.&lt;/p&gt;

&lt;p&gt;Where Fargate or EKS would be the better choice: custom native dependencies that exceed Lambda layer limits (PyTorch, large ML models), persistent connections (WebSocket, gRPC), specialized instance types not supported by LMI, or workloads that need the Kubernetes ecosystem. I covered Fargate patterns in my &lt;a href="https://darryl-ruggles.cloud/elastic-container-service-ecs-my-default-choice-for-containers-on-aws/" rel="noopener noreferrer"&gt;ECS deep dive&lt;/a&gt; and &lt;a href="https://darryl-ruggles.cloud/dsql-kabob-store/" rel="noopener noreferrer"&gt;Kabob Store&lt;/a&gt; projects. My &lt;a href="https://darryl-ruggles.cloud/a-complete-terraform-setup-for-eks-auto-mode-is-it-right-for-you/" rel="noopener noreferrer"&gt;EKS Auto Mode&lt;/a&gt; article covers Karpenter-based scaling.&lt;/p&gt;

&lt;p&gt;One specific area where EKS with Karpenter is significantly more sophisticated: &lt;strong&gt;scaling down and cost optimization at idle.&lt;/strong&gt; LMI's scale-down is conservative - in my testing, 2 EC2 instances remained running overnight with zero traffic (1 per AZ). There's no minimum instance setting, no consolidation, and no way to force scale-to-zero short of deleting the function version or capacity provider. Karpenter, by contrast, actively consolidates workloads onto fewer nodes, replaces larger instances with smaller ones when demand drops, and can use Spot instances for fault-tolerant workloads. If your traffic has significant idle periods (nights, weekends), this difference matters for cost. LMI's simplicity comes at the price of less intelligent scaling.&lt;/p&gt;


&lt;h2&gt;
  
  
  Setting It Up with Terraform
&lt;/h2&gt;

&lt;p&gt;The complete infrastructure is organized into four Terraform modules: networking, IAM, capacity provider, and Lambda. Every IAM policy follows least privilege, and the configuration follows the &lt;a href="https://docs.aws.amazon.com/wellarchitected/latest/framework/welcome.html" rel="noopener noreferrer"&gt;AWS Well-Architected Framework&lt;/a&gt; Security and Cost Optimization pillars. All resources use official HashiCorp providers (&lt;code&gt;hashicorp/aws&lt;/code&gt; and &lt;code&gt;hashicorp/archive&lt;/code&gt; where applicable) - no community modules or third-party providers.&lt;/p&gt;

&lt;p&gt;For a fully production-hardened deployment, you'd also want to address the Reliability, Performance Efficiency, and Operational Excellence pillars more explicitly. The &lt;a href="https://docs.aws.amazon.com/wellarchitected/latest/serverless-applications-lens/welcome.html" rel="noopener noreferrer"&gt;AWS Serverless Applications Lens&lt;/a&gt; emphasizes thinking in concurrent requests, sharing nothing, designing for failures and duplicates, and using versions and aliases for safe reversible deployments. Concretely:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Multi-AZ deployment&lt;/strong&gt; - subnets in at least two AZs (this demo does)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Encryption at rest with customer-managed KMS keys&lt;/strong&gt; - on the capacity provider (&lt;code&gt;kms_key_arn&lt;/code&gt; on &lt;code&gt;aws_lambda_capacity_provider&lt;/code&gt;), DynamoDB (&lt;code&gt;server_side_encryption&lt;/code&gt; with &lt;code&gt;kms_key_arn&lt;/code&gt;), and CloudWatch Logs (&lt;code&gt;kms_key_id&lt;/code&gt; on the log group)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;VPC endpoints instead of NAT Gateway&lt;/strong&gt; (covered later in this article)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Invoke through an alias, not the published version directly&lt;/strong&gt; - The demo invokes the qualified ARN of the published function version (&lt;code&gt;function:name:1&lt;/code&gt;). For production, create an alias (&lt;code&gt;prod&lt;/code&gt;, &lt;code&gt;live&lt;/code&gt;, &lt;code&gt;stable&lt;/code&gt;) pointing to a specific version and have callers invoke the alias ARN. Aliases enable instant rollback by updating one pointer, support traffic-shifting deployments (10% to a new version, then 50%, then 100%), and decouple caller code from version numbers.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Idempotency for downstream side effects&lt;/strong&gt; - &lt;strong&gt;Because Lambda may retry or duplicate events, handlers must remain idempotent - even when using long-lived in-memory state.&lt;/strong&gt; The Powertools idempotency utility uses DynamoDB to deduplicate requests by a configurable key. For this similarity engine the Bedrock embedding call is a read operation and the only state change is logging, so idempotency is less critical. For handlers that write to DynamoDB, send notifications, or charge a payment, idempotency is essential because LMI's at-least-once delivery semantics mean retries can produce duplicate side effects. The in-memory catalog is read-only and shared across requests, but any per-request state that produces side effects needs deduplication.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CloudWatch alarms on LMI-specific metrics&lt;/strong&gt; (covered in the Observability section)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The demo includes the basics. The production hardening above is straightforward incremental work.&lt;/p&gt;
&lt;h3&gt;
  
  
  Provider Configuration
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;terraform&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;required_version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"&amp;gt;= 1.11.0"&lt;/span&gt;

  &lt;span class="nx"&gt;required_providers&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;aws&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="c1"&gt;# Validated with AWS provider v6.x (tested with 6.31+)&lt;/span&gt;
      &lt;span class="nx"&gt;source&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"hashicorp/aws"&lt;/span&gt;
      &lt;span class="nx"&gt;version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"~&amp;gt; 6.31"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="nx"&gt;archive&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;source&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"hashicorp/archive"&lt;/span&gt;
      &lt;span class="nx"&gt;version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"~&amp;gt; 2.7"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;provider&lt;/span&gt; &lt;span class="s2"&gt;"aws"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;region&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;aws_region&lt;/span&gt;
  &lt;span class="nx"&gt;profile&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;aws_profile&lt;/span&gt;  &lt;span class="c1"&gt;# Set via AWS_PROFILE env var or -var flag&lt;/span&gt;

  &lt;span class="nx"&gt;default_tags&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;tags&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;Project&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"lambda-managed-instances"&lt;/span&gt;
      &lt;span class="nx"&gt;Environment&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;environment&lt;/span&gt;
      &lt;span class="nx"&gt;ManagedBy&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"terraform"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The &lt;code&gt;~&amp;gt; 6.31&lt;/code&gt; constraint pins to the current stable major (6.31.0 at the time of writing) without locking too tightly. &lt;code&gt;memory_size&lt;/code&gt; values above 10240 MB require hashicorp/aws 6.29.0 or later - earlier releases had a schema validator that capped &lt;code&gt;memory_size&lt;/code&gt; at 10 GB even for LMI functions (fixed in #46065). Without a recent provider, attempting to set 16 GB or 32 GB on an LMI function fails at &lt;code&gt;terraform plan&lt;/code&gt; with a confusing validation error.&lt;/p&gt;
&lt;h3&gt;
  
  
  IAM: The Two-Role Model
&lt;/h3&gt;

&lt;p&gt;Lambda Managed Instances requires two separate IAM roles - a deliberate separation of concerns:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Operator Role&lt;/strong&gt; - Allows Lambda to manage EC2 instances on your behalf. Your function code never gets these permissions.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_role"&lt;/span&gt; &lt;span class="s2"&gt;"operator"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"${var.project_name}-${var.environment}-operator"&lt;/span&gt;

  &lt;span class="nx"&gt;assume_role_policy&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;jsonencode&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="nx"&gt;Version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"2012-10-17"&lt;/span&gt;
    &lt;span class="nx"&gt;Statement&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
      &lt;span class="nx"&gt;Effect&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Allow"&lt;/span&gt;
      &lt;span class="nx"&gt;Principal&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;Service&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"lambda.amazonaws.com"&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="nx"&gt;Action&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"sts:AssumeRole"&lt;/span&gt;
      &lt;span class="nx"&gt;Condition&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;StringEquals&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="s2"&gt;"aws:SourceAccount"&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;account_id&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}]&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_role_policy_attachment"&lt;/span&gt; &lt;span class="s2"&gt;"operator"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;role&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_iam_role&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;operator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;
  &lt;span class="nx"&gt;policy_arn&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"arn:aws:iam::aws:policy/service-role/AWSLambdaManagedEC2ResourceOperator"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Execution Role&lt;/strong&gt; - Scoped to only what the function needs. No EC2 permissions, no wildcard resources. Bedrock access is limited to specific embedding model families.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="c1"&gt;# DynamoDB - least privilege: handler only does Query on the category-index GSI.&lt;/span&gt;
&lt;span class="c1"&gt;# The seed script runs locally with the operator's credentials and uses its own&lt;/span&gt;
&lt;span class="c1"&gt;# IAM identity for PutItem - not this role.&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_role_policy"&lt;/span&gt; &lt;span class="s2"&gt;"execution_dynamodb"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"dynamodb-access"&lt;/span&gt;
  &lt;span class="nx"&gt;role&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_iam_role&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;execution&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;

  &lt;span class="nx"&gt;policy&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;jsonencode&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="nx"&gt;Version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"2012-10-17"&lt;/span&gt;
    &lt;span class="nx"&gt;Statement&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
      &lt;span class="nx"&gt;Effect&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Allow"&lt;/span&gt;
      &lt;span class="nx"&gt;Action&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"dynamodb:Query"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
      &lt;span class="nx"&gt;Resource&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;dynamodb_table&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s2"&gt;"${var.dynamodb_table}/index/*"&lt;/span&gt;
      &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;}]&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Bedrock - scoped to the specific configured embedding model&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_role_policy"&lt;/span&gt; &lt;span class="s2"&gt;"execution_bedrock"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"bedrock-embeddings"&lt;/span&gt;
  &lt;span class="nx"&gt;role&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_iam_role&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;execution&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;

  &lt;span class="nx"&gt;policy&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;jsonencode&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="nx"&gt;Version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"2012-10-17"&lt;/span&gt;
    &lt;span class="nx"&gt;Statement&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
      &lt;span class="nx"&gt;Effect&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Allow"&lt;/span&gt;
      &lt;span class="nx"&gt;Action&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"bedrock:InvokeModel"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
      &lt;span class="nx"&gt;Resource&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="s2"&gt;"arn:aws:bedrock:${var.aws_region}::foundation-model/${var.embedding_model_id}"&lt;/span&gt;
      &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;}]&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The runtime function only needs &lt;code&gt;dynamodb:Query&lt;/code&gt; because &lt;code&gt;_load_catalog()&lt;/code&gt; queries the &lt;code&gt;category-index&lt;/code&gt; GSI rather than scanning the table. No PutItem, no GetItem, no Scan. The seed script (&lt;code&gt;scripts/seed_catalog.py&lt;/code&gt;) runs locally on the developer's machine with their own IAM identity - it never assumes the function execution role, so the runtime role doesn't need write permissions. The Bedrock policy is scoped to the exact model ARN configured via &lt;code&gt;var.embedding_model_id&lt;/code&gt;, not a wildcard. This is what "least privilege" looks like when you actually walk through the code.&lt;/p&gt;

&lt;h3&gt;
  
  
  Capacity Provider
&lt;/h3&gt;

&lt;p&gt;The capacity provider defines the EC2 infrastructure where your functions run. The &lt;code&gt;scaling_mode = "Manual"&lt;/code&gt; with a target CPU utilization policy gives you control over scaling behavior while still letting Lambda handle the mechanics.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_lambda_capacity_provider"&lt;/span&gt; &lt;span class="s2"&gt;"main"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"${var.project_name}-${var.environment}"&lt;/span&gt;

  &lt;span class="nx"&gt;vpc_config&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;subnet_ids&lt;/span&gt;         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;subnet_ids&lt;/span&gt;
    &lt;span class="nx"&gt;security_group_ids&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;security_group_id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;permissions_config&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;capacity_provider_operator_role_arn&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;operator_role_arn&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;instance_requirements&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;architectures&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;instance_architecture&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# "arm64" for Graviton&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;capacity_provider_scaling_config&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;max_vcpu_count&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;max_vcpu_count&lt;/span&gt;
    &lt;span class="nx"&gt;scaling_mode&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Manual"&lt;/span&gt;

    &lt;span class="nx"&gt;scaling_policies&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
      &lt;span class="nx"&gt;predefined_metric_type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"LambdaCapacityProviderAverageCPUUtilization"&lt;/span&gt;
      &lt;span class="nx"&gt;target_value&lt;/span&gt;           &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;target_cpu_utilization&lt;/span&gt;
    &lt;span class="p"&gt;}]&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The capacity provider supports two scaling modes: &lt;code&gt;Auto&lt;/code&gt; and &lt;code&gt;Manual&lt;/code&gt;. Auto mode is hands-off - Lambda picks an internal target CPU utilization and scales based on AWS-chosen defaults, with no explicit &lt;code&gt;scaling_policies&lt;/code&gt; block needed. I chose Manual mode for this project because it lets me set an explicit target (50% in the demo config) so the scaling behavior is predictable and tunable. With a lower target, the capacity provider scales out faster and maintains more headroom for traffic bursts. For a production workload where you trust AWS to pick reasonable defaults, Auto mode is simpler and a valid choice.&lt;/p&gt;

&lt;h3&gt;
  
  
  Lambda Function with Capacity Provider
&lt;/h3&gt;

&lt;p&gt;Four key differences from a standard Lambda function:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;capacity_provider_config&lt;/code&gt; attaches the function to LMI&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;publish = true&lt;/code&gt; is required - LMI runs on published versions&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;memory_size&lt;/code&gt; minimum is 2048 MB (2 GB / 1 vCPU)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;execution_environment_memory_gib_per_vcpu&lt;/code&gt; controls the memory-to-vCPU ratio (new in March 2026)
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;locals&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;powertools_layer_arn&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;instance_architecture&lt;/span&gt; &lt;span class="p"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;"arm64"&lt;/span&gt;
    &lt;span class="err"&gt;?&lt;/span&gt; &lt;span class="s2"&gt;"arn:aws:lambda:${var.aws_region}:017000801446:layer:AWSLambdaPowertoolsPythonV3-python314-arm64:${var.powertools_layer_version}"&lt;/span&gt;
    &lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"arn:aws:lambda:${var.aws_region}:017000801446:layer:AWSLambdaPowertoolsPythonV3-python314-x86_64:${var.powertools_layer_version}"&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="nx"&gt;powertools_env_vars&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;POWERTOOLS_SERVICE_NAME&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"${var.project_name}-handler"&lt;/span&gt;
    &lt;span class="nx"&gt;POWERTOOLS_METRICS_NAMESPACE&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;metrics_namespace&lt;/span&gt;
    &lt;span class="nx"&gt;POWERTOOLS_LOG_LEVEL&lt;/span&gt;         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;log_level&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_lambda_function"&lt;/span&gt; &lt;span class="s2"&gt;"handler"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;function_name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"${var.project_name}-${var.environment}-handler"&lt;/span&gt;
  &lt;span class="nx"&gt;role&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;execution_role_arn&lt;/span&gt;
  &lt;span class="nx"&gt;handler&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"handler.lambda_handler"&lt;/span&gt;
  &lt;span class="nx"&gt;runtime&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"python3.14"&lt;/span&gt;
  &lt;span class="nx"&gt;architectures&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;instance_architecture&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="nx"&gt;memory_size&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;lambda_memory_size&lt;/span&gt;
  &lt;span class="nx"&gt;timeout&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;
  &lt;span class="nx"&gt;publish&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;

  &lt;span class="nx"&gt;filename&lt;/span&gt;         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;archive_file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;function&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;output_path&lt;/span&gt;
  &lt;span class="nx"&gt;source_code_hash&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;archive_file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;function&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;output_base64sha256&lt;/span&gt;

  &lt;span class="nx"&gt;layers&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;local&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;powertools_layer_arn&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

  &lt;span class="nx"&gt;capacity_provider_config&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;lambda_managed_instances_capacity_provider_config&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;capacity_provider_arn&lt;/span&gt;                    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;capacity_provider_arn&lt;/span&gt;
      &lt;span class="nx"&gt;execution_environment_memory_gib_per_vcpu&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;memory_gib_per_vcpu&lt;/span&gt;
      &lt;span class="nx"&gt;per_execution_environment_max_concurrency&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;max_concurrency_per_environment&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;logging_config&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;log_format&lt;/span&gt;            &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"JSON"&lt;/span&gt;
    &lt;span class="nx"&gt;application_log_level&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;log_level&lt;/span&gt;
    &lt;span class="nx"&gt;system_log_level&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"WARN"&lt;/span&gt;
    &lt;span class="nx"&gt;log_group&lt;/span&gt;             &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_cloudwatch_log_group&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;function&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;tracing_config&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;mode&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Active"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;environment&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;variables&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;merge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;local&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;powertools_env_vars&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;DYNAMODB_TABLE&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_dynamodb_table&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;products&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;
      &lt;span class="nx"&gt;ENVIRONMENT&lt;/span&gt;         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;environment&lt;/span&gt;
      &lt;span class="nx"&gt;EMBEDDING_MODEL_ID&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;embedding_model_id&lt;/span&gt;
      &lt;span class="nx"&gt;EMBEDDING_DIMENSION&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;tostring&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;embedding_dimension&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;memory_gib_per_vcpu&lt;/code&gt; setting is powerful. LMI enforces a minimum of 1 vCPU per execution environment, so the ratio determines how much memory you get for that minimum. Examples at the 8 GB level:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;2:1 ratio&lt;/strong&gt; = 8 GB / 4 vCPUs (compute-heavy: batch processing, data crunching)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;4:1 ratio&lt;/strong&gt; = 8 GB / 2 vCPUs (balanced: API handlers, typical workloads)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;8:1 ratio&lt;/strong&gt; = 8 GB / 1 vCPU (memory-heavy: large in-memory datasets, caching)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The product similarity engine uses 4 GB at 4:1 - 1 vCPU per environment, which is the smallest balanced configuration that fits the catalog plus 10 worker processes.&lt;/p&gt;

&lt;h3&gt;
  
  
  A Note on Packaging Dependencies
&lt;/h3&gt;

&lt;p&gt;The Powertools layer is pinned to a specific version (minimum 3.23.0 - the first release that officially supports LMI). For everything else, follow AWS's guidance for Python Lambda functions: package all dependencies, including &lt;code&gt;boto3&lt;/code&gt; and &lt;code&gt;botocore&lt;/code&gt;, with the function rather than relying on the runtime's bundled copies. &lt;strong&gt;Even though boto3 is available in the runtime, package it with your function to avoid version drift.&lt;/strong&gt; The runtime's boto3 is updated on AWS's schedule, not yours, and version drift between local development and the runtime can produce subtle bugs that are hard to reproduce. For production zip deployments, &lt;code&gt;pip install --target build/ boto3 botocore&lt;/code&gt; and ship them in the zip. The demo uses the runtime's boto3 for simplicity, but production code should not.&lt;/p&gt;




&lt;h2&gt;
  
  
  Multi-Concurrency by Language
&lt;/h2&gt;

&lt;p&gt;LMI supports five runtimes today: Python 3.13+, Node.js 22+, Java 21+, .NET 8+, and Rust on the OS-only runtime. All modern runtimes (Python 3.12+) are based on Amazon Linux 2023, replacing AL2 ahead of its June 2026 end-of-life. &lt;strong&gt;Every language handles multi-concurrency differently&lt;/strong&gt;, and the differences matter - they change how you write the handler, what concurrency bugs you have to worry about, and how memory scales.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Runtime&lt;/th&gt;
&lt;th&gt;Concurrency Model&lt;/th&gt;
&lt;th&gt;What This Means for Your Handler&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Python&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Multiple processes per environment&lt;/td&gt;
&lt;td&gt;Full isolation - each process has its own memory, globals, and boto3 clients. No thread-safety concerns. Memory multiplies linearly with concurrency.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Node.js&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Worker threads with async dispatch&lt;/td&gt;
&lt;td&gt;Each worker thread can also handle async requests concurrently. Requires safe handling of shared state.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Java&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Single process with OS threads&lt;/td&gt;
&lt;td&gt;Multiple threads execute the handler simultaneously in shared memory. Requires explicit thread-safe code: synchronized collections, no shared mutable state, atomic operations. The hardest model to get right.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;.NET&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;.NET Tasks with async processing&lt;/td&gt;
&lt;td&gt;Same patterns as ASP.NET Core - thread-safe data structures, no static mutable state.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Rust&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Single process, Tokio async tasks&lt;/td&gt;
&lt;td&gt;Compile-time enforcement: handlers must implement &lt;code&gt;Clone + Send&lt;/code&gt;. The compiler catches concurrency bugs that other languages catch at runtime (or in production).&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Python is the &lt;strong&gt;simplest&lt;/strong&gt; model because there's no shared memory between concurrent requests. The trade-off is per-process memory multiplication. Java is the &lt;strong&gt;hardest&lt;/strong&gt; because thread safety becomes a concern on every line that touches shared state. Rust is the &lt;strong&gt;safest&lt;/strong&gt; because the compiler refuses to let you write non-thread-safe code in the first place.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;This blog focuses on the Python implementation.&lt;/strong&gt; The patterns shown here (process isolation, ThreadPoolExecutor for parallel I/O within a request, memory tuning around &lt;code&gt;per_execution_environment_max_concurrency&lt;/code&gt;) are specific to how Python's LMI runtime works. The architecture concepts (capacity providers, scaling, networking, IAM) apply identically across all five languages, but the handler code patterns would differ if you were writing in Java or Rust.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Python Handler
&lt;/h2&gt;

&lt;p&gt;Python's LMI runtime uses multiple &lt;strong&gt;processes&lt;/strong&gt; (not threads) for multi-concurrency. Each concurrent request runs in a separate process with its own memory space. Global variables, module-level caches, and boto3 clients are completely isolated between processes. This is simpler than the thread-based and async models above because there are no shared-memory concurrency concerns.&lt;/p&gt;

&lt;p&gt;This blog uses Python 3.14, the newest supported version. Note that Lambda's Python 3.14 ships with the JIT and free-threaded mode disabled, so the GIL is still in effect.&lt;/p&gt;

&lt;p&gt;The one shared resource: &lt;code&gt;/tmp&lt;/code&gt;. All processes in an execution environment share the same &lt;code&gt;/tmp&lt;/code&gt; directory. Use request-scoped filenames to prevent collisions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Handler Structure with Powertools
&lt;/h3&gt;

&lt;p&gt;Following the &lt;a href="https://darryl-ruggles.cloud/powertools-for-aws-lambda-best-practices-by-default/" rel="noopener noreferrer"&gt;Powertools best practices&lt;/a&gt; pattern - Logger, Tracer, and Metrics decorators in the correct order:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;math&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;concurrent.futures&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ThreadPoolExecutor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;as_completed&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;aws_lambda_powertools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Logger&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Metrics&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Tracer&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;aws_lambda_powertools.metrics&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MetricUnit&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;aws_lambda_powertools.utilities.typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LambdaContext&lt;/span&gt;

&lt;span class="n"&gt;logger&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Logger&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;tracer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Tracer&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;metrics&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Metrics&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Module-level init runs ONCE PER PROCESS.
# With 10 concurrent processes, this runs 10 times.
# Each process loads its own catalog copy and boto3 clients.
&lt;/span&gt;&lt;span class="n"&gt;PROCESS_ID&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getpid&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;AWS_REGION&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AWS_REGION&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;us-east-1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;EMBEDDING_MODEL_ID&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;EMBEDDING_MODEL_ID&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;amazon.nova-2-multimodal-embeddings-v1:0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;EMBEDDING_DIMENSION&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;EMBEDDING_DIMENSION&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;384&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="n"&gt;dynamodb&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;resource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dynamodb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;region_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;AWS_REGION&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;table&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dynamodb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Table&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DYNAMODB_TABLE&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;bedrock_runtime&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bedrock-runtime&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;region_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;AWS_REGION&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;_catalog&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;


&lt;span class="nd"&gt;@tracer.capture_method&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_load_catalog&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Load product catalog once per process. Uses Query (least privilege).&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;_catalog&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="c1"&gt;# already loaded in this process
&lt;/span&gt;        &lt;span class="k"&gt;return&lt;/span&gt;
    &lt;span class="c1"&gt;# ... query DynamoDB by category and populate _catalog ...
&lt;/span&gt;

&lt;span class="nd"&gt;@logger.inject_lambda_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;log_event&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nd"&gt;@tracer.capture_lambda_handler&lt;/span&gt;
&lt;span class="nd"&gt;@metrics.log_metrics&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;capture_cold_start_metric&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;lambda_handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;LambdaContext&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append_keys&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;process_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;PROCESS_ID&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;_load_catalog&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# no-op after first call in this process
&lt;/span&gt;
    &lt;span class="c1"&gt;# Extract params from event body or direct invocation
&lt;/span&gt;    &lt;span class="n"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;body&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nf"&gt;else &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;body&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;top_k&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;top_k&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 1: Embed the query text via Bedrock (I/O-bound)
&lt;/span&gt;    &lt;span class="n"&gt;query_embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;_embed_query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 2: Search categories in parallel (CPU-bound)
&lt;/span&gt;    &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
    &lt;span class="n"&gt;categories&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;categories&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_catalog&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;keys&lt;/span&gt;&lt;span class="p"&gt;()))&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nc"&gt;ThreadPoolExecutor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_workers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;executor&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;futures&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;executor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;submit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_search_category&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cat&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query_embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="n"&gt;cat&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;cat&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;categories&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;future&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;as_completed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;futures&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;futures&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;future&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;future&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;result&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_metric&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SearchRequests&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;unit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;MetricUnit&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;statusCode&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;body&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;results&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;})}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Bedrock Embedding - Configurable Model
&lt;/h3&gt;

&lt;p&gt;The query text is embedded via Amazon Bedrock before similarity search. The model is configurable via the &lt;code&gt;EMBEDDING_MODEL_ID&lt;/code&gt; environment variable - Nova Multimodal Embeddings by default, with Titan Text Embeddings V2 as an alternative:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@tracer.capture_method&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_embed_query_nova&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Nova Multimodal Embeddings request format.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;request_body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;taskType&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SINGLE_EMBEDDING&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;singleEmbeddingParams&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;embeddingPurpose&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TEXT_RETRIEVAL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;embeddingDimension&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;EMBEDDING_DIMENSION&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;truncationMode&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;END&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;value&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bedrock_runtime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request_body&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;modelId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;EMBEDDING_MODEL_ID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;accept&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;application/json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;contentType&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;application/json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;response_body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;body&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response_body&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;embeddings&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;embedding&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Product embeddings are generated at seed time using &lt;code&gt;GENERIC_INDEX&lt;/code&gt; purpose and stored in DynamoDB alongside the product data. Query embeddings use &lt;code&gt;TEXT_RETRIEVAL&lt;/code&gt; purpose at runtime. Nova supports 4 dimension sizes (256, 384, 1024, 3072) - trading off accuracy against memory and compute cost. The demo uses 384 dimensions as a practical balance.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cosine Similarity - The CPU Bottleneck
&lt;/h3&gt;

&lt;p&gt;The vector similarity computation is the compute-intensive core after the Bedrock call returns. For production, use NumPy - it's 10-50x faster than a pure Python loop and releases the GIL during C-level operations, which makes the ThreadPoolExecutor pattern actually parallel:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_cosine_similarity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ndarray&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;catalog&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ndarray&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ndarray&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Production version: batch operation across all products in a category.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="c1"&gt;# query shape: (D,), catalog shape: (N, D)
&lt;/span&gt;    &lt;span class="n"&gt;norms&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;linalg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;norm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;catalog&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;linalg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;norm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;catalog&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;norms&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;norms&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The pure Python version is included in the demo as an educational fallback (no NumPy dependency, easier to read):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_cosine_similarity_pure&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vec_a&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;vec_b&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Educational version: shows the math, no dependencies.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;dot&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;zip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vec_a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;vec_b&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;norm_a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sqrt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;vec_a&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;norm_b&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sqrt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;vec_b&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;norm_a&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;norm_b&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;dot&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;norm_a&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;norm_b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The handler code, the capacity provider, the Terraform - none of it would need to change to run on an instance type with hardware-accelerated vector operations. The capacity provider's instance type selection is the only variable.&lt;/p&gt;

&lt;h3&gt;
  
  
  Process Memory Multiplication
&lt;/h3&gt;

&lt;p&gt;This is the most important thing to understand about Python LMI. Each process loads its own copy of the catalog:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;10 concurrent processes x 200 MB catalog = 2 GB just for catalog data
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;MemoryUtilization&lt;/code&gt; CloudWatch metric tracks total memory consumption across all processes. If you're loading large datasets and running high concurrency, you'll hit memory limits. Tune with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reduce &lt;code&gt;PerExecutionEnvironmentMaxConcurrency&lt;/code&gt; (fewer processes, less memory)&lt;/li&gt;
&lt;li&gt;Increase &lt;code&gt;memory_size&lt;/code&gt; (more memory per environment)&lt;/li&gt;
&lt;li&gt;Use 8:1 &lt;code&gt;memory_gib_per_vcpu&lt;/code&gt; ratio (more memory, fewer vCPUs)&lt;/li&gt;
&lt;li&gt;Use shared &lt;code&gt;/tmp&lt;/code&gt; as a cross-process cache (load once, read from all processes)&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Observability
&lt;/h2&gt;

&lt;p&gt;LMI publishes its own CloudWatch metrics in the &lt;code&gt;AWS/Lambda&lt;/code&gt; namespace at 5-minute granularity. The capacity-provider-level metrics describe overall instance utilization; the execution-environment-level metrics describe per-function resource consumption.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Capacity provider metrics&lt;/strong&gt; (dimensions: &lt;code&gt;CapacityProviderName&lt;/code&gt;, &lt;code&gt;InstanceType&lt;/code&gt;):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;CPUUtilization&lt;/code&gt; - CPU usage across all instances in the capacity provider&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;MemoryUtilization&lt;/code&gt; - Memory usage across all instances&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;vCPUAllocated&lt;/code&gt; / &lt;code&gt;vCPUAvailable&lt;/code&gt; - Used vs available vCPU count&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;MemoryAllocated&lt;/code&gt; / &lt;code&gt;MemoryAvailable&lt;/code&gt; - Used vs available memory&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Execution environment metrics&lt;/strong&gt; (dimensions: &lt;code&gt;FunctionName&lt;/code&gt;, &lt;code&gt;CapacityProviderName&lt;/code&gt;, &lt;code&gt;Resource&lt;/code&gt;):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;ExecutionEnvironmentConcurrency&lt;/code&gt; - Active concurrent requests per environment&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ExecutionEnvironmentConcurrencyLimit&lt;/code&gt; - Configured maximum concurrency per environment&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ExecutionEnvironmentCPUUtilization&lt;/code&gt; - CPU usage of this function's environments&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ExecutionEnvironmentMemoryUtilization&lt;/code&gt; - Memory usage of this function's environments&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Alarms to Set First
&lt;/h3&gt;

&lt;p&gt;If you only set three alarms when adopting LMI, set these:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Capacity provider CPU utilization&lt;/strong&gt; - Alarm when sustained CPU exceeds your scaling target (e.g., &amp;gt; 80% for 10 minutes if your target is 50%). This indicates the capacity provider is failing to scale out fast enough or has hit &lt;code&gt;max_vcpu_count&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Execution environment concurrency vs limit&lt;/strong&gt; - Alarm when &lt;code&gt;ExecutionEnvironmentConcurrency&lt;/code&gt; reaches &lt;code&gt;ExecutionEnvironmentConcurrencyLimit&lt;/code&gt; for sustained periods. This means processes are saturated and incoming requests are being throttled or queued.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Execution environment memory utilization&lt;/strong&gt; - Alarm when memory exceeds 80%. With Python's per-process memory multiplication, hitting memory limits causes new process spawns to fail (&lt;code&gt;InitResourceExhausted&lt;/code&gt;) rather than gradual degradation. Catch this before it happens.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;These three cover the LMI-specific failure modes that standard Lambda alarms (Errors, Throttles, Duration) won't catch.&lt;/p&gt;




&lt;h2&gt;
  
  
  Deployment
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;AWS CLI configured with a profile (&lt;code&gt;export AWS_PROFILE=your-profile&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Terraform &amp;gt;= 1.11&lt;/li&gt;
&lt;li&gt;Python 3.14+ with boto3 (for the seed script)&lt;/li&gt;
&lt;li&gt;Amazon Nova Multimodal Embeddings model enabled in your AWS account (Bedrock console, Model Access)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Deploy
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Clone the repo&lt;/span&gt;
git clone https://github.com/RDarrylR/lambda-managed-instances-similarity-engine.git
&lt;span class="nb"&gt;cd &lt;/span&gt;lambda-managed-instances-similarity-engine

&lt;span class="c"&gt;# Configure&lt;/span&gt;
&lt;span class="nb"&gt;cp &lt;/span&gt;infrastructure/terraform.tfvars.example infrastructure/terraform.tfvars
&lt;span class="c"&gt;# Edit terraform.tfvars with your values&lt;/span&gt;

&lt;span class="c"&gt;# Deploy infrastructure&lt;/span&gt;
make init
make apply

&lt;span class="c"&gt;# Seed the product catalog&lt;/span&gt;
make seed

&lt;span class="c"&gt;# Invoke&lt;/span&gt;
make invoke
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Cost Analysis
&lt;/h3&gt;

&lt;p&gt;Lambda Managed Instances pricing is fundamentally different from standard Lambda. Understanding when each model wins is the key decision.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Standard Lambda pricing (arm64/Graviton):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;$0.20 per million requests&lt;/li&gt;
&lt;li&gt;$0.0000133334 per GB-second (arm64)&lt;/li&gt;
&lt;li&gt;No minimum charge, no idle cost&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Lambda Managed Instances pricing:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;$0.20 per million requests (same)&lt;/li&gt;
&lt;li&gt;EC2 on-demand instance pricing (varies by type)&lt;/li&gt;
&lt;li&gt;15% management fee on the EC2 on-demand price&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;No per-invocation duration charge&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The critical difference: standard Lambda charges per GB-second of execution. LMI charges for EC2 time regardless of how many requests you serve. At low volume, you're paying for idle EC2 capacity. At high volume, that fixed EC2 cost is amortized across millions of requests.&lt;/p&gt;

&lt;h4&gt;
  
  
  Break-Even: Standard Lambda vs LMI
&lt;/h4&gt;

&lt;p&gt;Consider this workload: 4 GB memory, 200ms average duration, sustained traffic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Standard Lambda cost per request (arm64):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Compute: 4 GB x 0.2s = 0.8 GB-seconds x $0.0000133334 = $0.00001067&lt;/li&gt;
&lt;li&gt;Request: $0.0000002&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Total: ~$0.0000109 per request&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;LMI on a c7g.medium (1 vCPU, 2 GB, ~$0.034/hr on-demand):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;EC2 + 15% fee: $0.034 x 1.15 = $0.0391/hr&lt;/li&gt;
&lt;li&gt;With 10 concurrent processes and 200ms per request, each process handles ~5 req/sec&lt;/li&gt;
&lt;li&gt;Instance throughput: ~50 req/sec = ~180,000 req/hr&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cost per request: $0.0391 / 180,000 = ~$0.000000217&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At this throughput, &lt;strong&gt;LMI is roughly 50x cheaper per request&lt;/strong&gt; than standard Lambda. But the EC2 cost runs 24/7 whether you have traffic or not.&lt;/p&gt;

&lt;h4&gt;
  
  
  Monthly Cost Comparison
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Monthly Requests&lt;/th&gt;
&lt;th&gt;Instances Needed&lt;/th&gt;
&lt;th&gt;Standard Lambda (arm64)&lt;/th&gt;
&lt;th&gt;LMI On-Demand (c7g.medium)&lt;/th&gt;
&lt;th&gt;LMI + 1yr Savings Plan&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1M&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;$11&lt;/td&gt;
&lt;td&gt;$28 + $0.20 = &lt;strong&gt;$28&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;~$18&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10M&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;$109&lt;/td&gt;
&lt;td&gt;$28 + $2.00 = &lt;strong&gt;$30&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;~$20&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;50M&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;$546&lt;/td&gt;
&lt;td&gt;$28 + $10.00 = &lt;strong&gt;$38&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;~$28&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;100M&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;$1,091&lt;/td&gt;
&lt;td&gt;$28 + $20.00 = &lt;strong&gt;$48&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;~$38&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;500M&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;$5,456&lt;/td&gt;
&lt;td&gt;$112 + $100.00 = &lt;strong&gt;$212&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;~$172&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A single c7g.medium tops out around ~130M requests/month at 50 req/sec sustained. Beyond that, instance count scales roughly linearly with load - 500M req/month requires approximately 4 instances. The LMI columns reflect the actual instance count needed at each volume.&lt;/p&gt;

&lt;p&gt;The break-even is around &lt;strong&gt;2.5M requests/month&lt;/strong&gt; at this memory and duration profile. Below that, standard Lambda wins because you pay nothing when idle. Above that, LMI wins and the advantage grows with volume.&lt;/p&gt;

&lt;h4&gt;
  
  
  Commitment Discounts Change the Math
&lt;/h4&gt;

&lt;p&gt;LMI supports EC2 Savings Plans and Reserved Instances. Standard Lambda supports Compute Savings Plans (up to 17% discount on duration). The discount gap is significant:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Commitment&lt;/th&gt;
&lt;th&gt;Standard Lambda Discount&lt;/th&gt;
&lt;th&gt;LMI Discount (EC2)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;None (on-demand)&lt;/td&gt;
&lt;td&gt;0%&lt;/td&gt;
&lt;td&gt;0%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1-year Compute Savings Plan&lt;/td&gt;
&lt;td&gt;Up to 17%&lt;/td&gt;
&lt;td&gt;Up to 36%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3-year Compute Savings Plan&lt;/td&gt;
&lt;td&gt;Up to 17%&lt;/td&gt;
&lt;td&gt;Up to 56%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1-year EC2 Reserved Instance&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;Up to 40%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3-year EC2 Reserved Instance&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;Up to 60%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For predictable production workloads with steady traffic, a 3-year commitment on LMI can reduce costs by 60% on the EC2 portion. Standard Lambda's maximum discount is 17%. This difference widens the gap at scale.&lt;/p&gt;

&lt;h4&gt;
  
  
  Hidden Costs
&lt;/h4&gt;

&lt;p&gt;Don't forget the supporting infrastructure that LMI requires and standard Lambda doesn't:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;NAT Gateway&lt;/strong&gt;: ~$32/month + $0.045/GB data transfer (required for VPC telemetry)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;VPC endpoints&lt;/strong&gt; (if used instead of NAT): ~$7.20/month per endpoint per AZ&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DynamoDB&lt;/strong&gt;: On-demand reads for catalog loading (minimal for small catalogs, significant at scale)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bedrock&lt;/strong&gt;: Nova Multimodal Embeddings per-token pricing for each query embedding&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CloudWatch&lt;/strong&gt;: Log storage and metric costs increase with concurrency&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For low-volume workloads, these fixed costs can exceed the compute savings. Factor them into your total cost of ownership.&lt;/p&gt;

&lt;h4&gt;
  
  
  When Each Pricing Model Wins
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Standard Lambda wins when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Traffic is bursty or unpredictable (you pay nothing at zero traffic)&lt;/li&gt;
&lt;li&gt;Monthly volume is below the break-even threshold (~2-3M requests for this workload profile)&lt;/li&gt;
&lt;li&gt;You can't commit to 1-year or 3-year terms&lt;/li&gt;
&lt;li&gt;You don't need VPC connectivity (avoids NAT Gateway cost)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;LMI wins when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Traffic is sustained and predictable (the EC2 cost is fully amortized)&lt;/li&gt;
&lt;li&gt;Monthly volume exceeds 5-10M requests&lt;/li&gt;
&lt;li&gt;You can commit to Savings Plans or Reserved Instances&lt;/li&gt;
&lt;li&gt;You need more than 10 GB memory or specific instance types&lt;/li&gt;
&lt;li&gt;You're already paying for VPC infrastructure&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For this demo, expect to pay for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;NAT Gateway (~$0.045/hour + data transfer)&lt;/li&gt;
&lt;li&gt;EC2 instances (varies by type, auto-selected by Lambda)&lt;/li&gt;
&lt;li&gt;DynamoDB on-demand reads (minimal for this catalog size)&lt;/li&gt;
&lt;li&gt;Bedrock embedding calls (per-token pricing for each query)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  CLEANUP (IMPORTANT!!)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;This infrastructure costs real money while running - approximately $2-4/day even with zero traffic (NAT Gateway + EC2 managed instances). Don't forget about it.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Make sure to destroy all resources when you're done:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;make destroy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the capacity provider fails to delete (it can take a few minutes to drain instances), wait and retry. Verify in the AWS console that no EC2 instances tagged with your project name are still running.&lt;/p&gt;




&lt;h2&gt;
  
  
  Networking: Three Supported Patterns
&lt;/h2&gt;

&lt;p&gt;LMI requires VPC connectivity - the function execution environments need outbound network access for telemetry transmission and any AWS service calls. AWS documents three supported connectivity patterns:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Public subnets with an internet gateway&lt;/strong&gt; - simplest, suitable for dev/test only&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Private subnets with NAT Gateway&lt;/strong&gt; - the pattern this demo uses&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Private subnets with VPC endpoints&lt;/strong&gt; - the most AWS-aligned production pattern&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  NAT Gateway (used in this demo)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Simple to set up - one resource, all outbound traffic routes through it&lt;/li&gt;
&lt;li&gt;~$32/month base + $0.045/GB data transfer&lt;/li&gt;
&lt;li&gt;Traffic leaves your VPC, crosses the public internet (encrypted), then re-enters AWS&lt;/li&gt;
&lt;li&gt;Single point of failure unless you deploy one per AZ (~$64/month for 2-AZ HA)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  VPC Endpoints (recommended for production)
&lt;/h3&gt;

&lt;p&gt;For production, the most AWS-aligned pattern is one VPC endpoint per service per AZ. Traffic stays entirely on the AWS network and never touches the public internet. The endpoint set must cover &lt;strong&gt;every&lt;/strong&gt; service the function calls - if you forget one, the function fails silently or hangs. For this workload, that means:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Endpoint&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Required For&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;com.amazonaws.{region}.logs&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Interface&lt;/td&gt;
&lt;td&gt;CloudWatch Logs (Powertools logger output)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;com.amazonaws.{region}.monitoring&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Interface&lt;/td&gt;
&lt;td&gt;CloudWatch Metrics (Powertools metrics)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;com.amazonaws.{region}.xray&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Interface&lt;/td&gt;
&lt;td&gt;X-Ray tracing (Powertools tracer)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;com.amazonaws.{region}.bedrock-runtime&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Interface&lt;/td&gt;
&lt;td&gt;Bedrock embedding API calls&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;com.amazonaws.{region}.dynamodb&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Gateway&lt;/td&gt;
&lt;td&gt;DynamoDB catalog queries (free, no per-AZ charge)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Critical security group detail:&lt;/strong&gt; Interface endpoints have their own security groups. They must allow inbound HTTPS (port 443) from the function's security group. The function security group must allow outbound HTTPS to the endpoint security groups. If you skip this, DNS resolves but the connection is silently blocked.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Endpoints should be deployed in each AZ used by the capacity provider to avoid cross-AZ latency and data transfer costs.&lt;/strong&gt; If your capacity provider has subnets in &lt;code&gt;us-east-1a&lt;/code&gt; and &lt;code&gt;us-east-1b&lt;/code&gt;, every interface endpoint also needs ENIs in both AZs. This is the same Cross-AZ Tax pattern from my &lt;a href="https://darryl-ruggles.cloud/eks-and-the-cross-az-tax-how-to-stop-paying-aws-002gb-for-traffic-that-should-never-leave-your-availability-zone/" rel="noopener noreferrer"&gt;previous blog&lt;/a&gt; - cross-AZ data transfer charges apply when traffic from a function in &lt;code&gt;us-east-1a&lt;/code&gt; hits an endpoint ENI in &lt;code&gt;us-east-1b&lt;/code&gt;. Provision endpoints per AZ to keep traffic local.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost math:&lt;/strong&gt; ~$7.20/month per interface endpoint per AZ. With 4 interface endpoints across 2 AZs, that's ~$58/month - roughly double the single NAT Gateway, but cheaper than 2-AZ NAT Gateway HA. The DynamoDB gateway endpoint is free. At high data transfer volumes (more than ~900 GB/month through the NAT Gateway), endpoints become cheaper because there's no per-GB data transfer surcharge for in-region traffic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When endpoints win on security:&lt;/strong&gt; Always. Traffic never leaves the AWS network. You can attach endpoint policies to restrict which resources each endpoint can access (e.g., limit the Bedrock endpoint to specific model ARNs). This aligns with the &lt;a href="https://docs.aws.amazon.com/wellarchitected/latest/framework/security.html" rel="noopener noreferrer"&gt;AWS Well-Architected Security Pillar&lt;/a&gt; - minimize the attack surface.&lt;/p&gt;

&lt;p&gt;The Terraform for VPC endpoints is straightforward but verbose. I left it out of this demo to keep the focus on LMI itself. A follow-up project could add a &lt;code&gt;networking_mode&lt;/code&gt; variable that switches between NAT Gateway and VPC endpoints.&lt;/p&gt;




&lt;h2&gt;
  
  
  A few things to watch for:
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;VPC connectivity isn't optional.&lt;/strong&gt; Lambda Managed Instances requires a VPC. Without outbound connectivity (NAT Gateway or VPC endpoints), your function executes but logs and traces are silently lost. You'll debug a working function with no visible output. This is documented but easy to miss.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scaling is asynchronous.&lt;/strong&gt; LMI scales based on CPU utilization and execution-environment saturation, not per-invocation demand. &lt;strong&gt;Unlike standard Lambda, scaling isn't triggered by incoming requests - it's driven by resource consumption inside existing execution environments.&lt;/strong&gt; &lt;strong&gt;Because scaling reacts to resource pressure instead of incoming traffic, inefficient code or high memory usage can delay scaling and increase throttling risk.&lt;/strong&gt; The Scaler component decides when to add or remove instances, and instance launches aren't instant. &lt;strong&gt;Lambda maintains headroom so traffic can roughly double within minutes without immediate throttling&lt;/strong&gt;, but if your traffic more than doubles within 5 minutes, you may see 429 throttles while capacity catches up. This is fundamentally different from standard Lambda's near-instant scaling. Plan for it with the target CPU utilization setting - lower values maintain more headroom.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Process memory multiplies.&lt;/strong&gt; With Python, each concurrency slot is a separate process. &lt;strong&gt;Because Python uses process-based concurrency, memory usage scales linearly with concurrency - each worker process consumes its own memory. With Python, concurrency isn't "free" - each additional request increases memory consumption linearly.&lt;/strong&gt; If your function uses 500 MB of memory and you set concurrency to 16, that's 8 GB of memory consumed per execution environment. Monitor the &lt;code&gt;MemoryUtilization&lt;/code&gt; metric and tune accordingly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;publish = true&lt;/code&gt; is required.&lt;/strong&gt; LMI runs on published function versions, not &lt;code&gt;$LATEST&lt;/code&gt;. If you forget this, Terraform applies successfully but the function doesn't run on managed instances. Every code change needs a new published version.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Capacity providers are security boundaries, not isolation boundaries.&lt;/strong&gt; Functions sharing a capacity provider run in containers on the same EC2 instances. This isn't Firecracker isolation. Separate untrusted workloads into separate capacity providers.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Powertools minimum version matters.&lt;/strong&gt; Lambda Managed Instances requires Powertools for AWS Lambda (Python) version 3.23.0 or later. Pin the layer version in Terraform rather than using latest.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LMI doesn't scale to zero.&lt;/strong&gt; Unlike standard Lambda where you pay nothing at zero traffic, LMI keeps a baseline of warm EC2 instances running for high availability. &lt;strong&gt;AWS launches a baseline of three managed instances for availability across AZs&lt;/strong&gt; when you publish a function version with a capacity provider. In my testing with 2 AZs configured, 2 instances remained active overnight with zero traffic, but the documented baseline is three. There's no minimum instance setting, no Karpenter-style consolidation, and no way to force scale-to-zero short of deleting the function version or capacity provider. This is a meaningful cost difference for dev/test environments where you might leave infrastructure running between sessions. Run &lt;code&gt;make destroy&lt;/code&gt; when you're not actively using the infrastructure, or design your dev environments to use standard Lambda where idle cost is zero.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Quotas to plan around.&lt;/strong&gt; LMI has its own &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/gettingstarted-limits.html" rel="noopener noreferrer"&gt;service quotas&lt;/a&gt;: 1 request per second on capacity provider write APIs (Create/Update/Delete - rate-limited to prevent infrastructure churn), 100 function versions per capacity provider, and 1,000 capacity providers per account per region. These are soft limits but worth knowing when you start automating capacity provider management or running multiple environments.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  SAM Support
&lt;/h2&gt;

&lt;p&gt;If you came in from the AWS Serverless plugin angle and are wondering whether SAM supports LMI - yes, it does. AWS::Serverless::CapacityProvider is the SAM resource equivalent to &lt;code&gt;aws_lambda_capacity_provider&lt;/code&gt;. The SAM template syntax is more concise but follows the same model: capacity provider definition, function with &lt;code&gt;CapacityProviderConfig&lt;/code&gt; property, and IAM roles. I chose Terraform for this project because the LMI Terraform path is less documented in the wild and I wanted to fill that gap, but SAM is a perfectly valid choice if your team already uses it.&lt;/p&gt;




&lt;h2&gt;
  
  
  Instance Type Selection
&lt;/h2&gt;

&lt;p&gt;The capacity provider's &lt;code&gt;instance_requirements&lt;/code&gt; block controls which EC2 instance types Lambda selects. By default, Lambda chooses the best fit automatically. You can constrain this with &lt;code&gt;allowed_instance_types&lt;/code&gt; or &lt;code&gt;excluded_instance_types&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Today, the interesting choice is between &lt;code&gt;arm64&lt;/code&gt; (Graviton4 - better price/performance for most workloads) and &lt;code&gt;x86_64&lt;/code&gt;. But the architecture of Lambda Managed Instances - your function code running in containers on EC2 instances you specify - means the compute capabilities available to your functions expand with every new EC2 instance type AWS makes available for LMI.&lt;/p&gt;

&lt;p&gt;The product similarity engine in this project calls Bedrock for query embeddings (I/O-bound) and then computes cosine similarity on CPU (compute-bound). The handler code isn't coupled to a specific compute architecture. The embedding call is behind a clean interface (&lt;code&gt;_embed_query&lt;/code&gt;). The similarity computation is pure math. The instance type is a configuration parameter, not an application concern.&lt;/p&gt;

&lt;p&gt;This is the practical difference between Lambda Managed Instances and standard Lambda. Standard Lambda abstracts the hardware entirely - you get what AWS gives you. Lambda Managed Instances lets you choose, and that choice extends to whatever EC2 instance types AWS makes available.&lt;/p&gt;




&lt;h2&gt;
  
  
  Wrapping Up
&lt;/h2&gt;

&lt;p&gt;Lambda Managed Instances fills the gap between standard Lambda and ECS Fargate. The handler function and event-driven invocation pattern stay the same, but you gain EC2 hardware selection, multi-concurrency, configurable memory-to-vCPU ratios, and commitment-based pricing.&lt;/p&gt;

&lt;p&gt;The key decisions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Use it for sustained, predictable throughput&lt;/strong&gt; where EC2 pricing beats per-GB-second&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Choose your memory-to-vCPU ratio&lt;/strong&gt; based on whether your workload is compute-bound or memory-bound&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Understand the process model&lt;/strong&gt; for your language - Python uses processes (simple, no shared-memory concerns), Java uses OS threads (requires thread-safe code), Node.js uses worker threads with async dispatch, .NET uses Tasks, and Rust uses Tokio async tasks (handlers must be &lt;code&gt;Clone + Send&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitor &lt;code&gt;MemoryUtilization&lt;/code&gt;&lt;/strong&gt; because process memory multiplies with concurrency&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The full Terraform configuration, Python handler, seed script, and Makefile are in the &lt;a href="https://github.com/RDarrylR/lambda-managed-instances-similarity-engine" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/lambda-managed-instances.html" rel="noopener noreferrer"&gt;Lambda Managed Instances Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/lambda-managed-instances-python-runtime.html" rel="noopener noreferrer"&gt;Lambda Managed Instances - Python Runtime Guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/about-aws/whats-new/2026/03/lambda-32-gb-memory-16-vcpus/" rel="noopener noreferrer"&gt;32 GB Memory / 16 vCPU Announcement (March 2026)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/lambda-managed-instances-capacity-providers.html" rel="noopener noreferrer"&gt;Capacity Provider Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://docs.aws.amazon.com/nova/latest/userguide/nova-embeddings.html" rel="noopener noreferrer"&gt;Amazon Nova Multimodal Embeddings&lt;/a&gt; - Embedding model used in this project&lt;/li&gt;
&lt;li&gt;&lt;a href="https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/lambda_capacity_provider" rel="noopener noreferrer"&gt;Terraform aws_lambda_capacity_provider&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://darryl-ruggles.cloud/powertools-for-aws-lambda-best-practices-by-default/" rel="noopener noreferrer"&gt;Powertools for AWS Lambda Best Practices&lt;/a&gt; - Observability patterns used in this project&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://darryl-ruggles.cloud/elastic-container-service-ecs-my-default-choice-for-containers-on-aws/" rel="noopener noreferrer"&gt;Elastic Container Service - My Default Choice for Containers on AWS&lt;/a&gt; - ECS Fargate and Express Mode comparison&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://darryl-ruggles.cloud/serverless-data-processor-using-aws-lambda-step-functions-and-fargate-on-ecs-with-rust/" rel="noopener noreferrer"&gt;Serverless Data Processor&lt;/a&gt; - Step Functions with Lambda and Fargate&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Connect with me on&lt;/em&gt; &lt;a href="https://x.com/RDarrylR" rel="noopener noreferrer"&gt;X&lt;/a&gt;&lt;em&gt;,&lt;/em&gt; &lt;a href="https://bsky.app/profile/darrylruggles.bsky.social" rel="noopener noreferrer"&gt;Bluesky&lt;/a&gt;&lt;em&gt;,&lt;/em&gt; &lt;a href="https://www.linkedin.com/in/darryl-ruggles/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;&lt;em&gt;,&lt;/em&gt; &lt;a href="https://github.com/RDarrylR" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;em&gt;,&lt;/em&gt; &lt;a href="https://medium.com/@RDarrylR" rel="noopener noreferrer"&gt;Medium&lt;/a&gt;&lt;em&gt;,&lt;/em&gt; &lt;a href="https://dev.to/rdarrylr"&gt;Dev.to&lt;/a&gt;&lt;em&gt;, or the&lt;/em&gt; &lt;a href="https://community.aws/@darrylr" rel="noopener noreferrer"&gt;AWS Community&lt;/a&gt;&lt;em&gt;. Check out more of my projects at&lt;/em&gt; &lt;a href="https://darryl-ruggles.cloud" rel="noopener noreferrer"&gt;darryl-ruggles.cloud&lt;/a&gt; &lt;em&gt;and join the&lt;/em&gt; &lt;a href="https://www.believeinserverless.com/" rel="noopener noreferrer"&gt;Believe In Serverless&lt;/a&gt; &lt;em&gt;community.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>serverless</category>
      <category>lambda</category>
    </item>
    <item>
      <title>Building AI Agents with Spring AI and Amazon Bedrock AgentCore - Part 5 Deploy MCP client for Conference application on AgentCore Runtime</title>
      <dc:creator>Vadym Kazulkin</dc:creator>
      <pubDate>Tue, 26 May 2026 14:40:49 +0000</pubDate>
      <link>https://dev.to/aws-heroes/building-ai-agents-with-spring-ai-and-amazon-bedrock-agentcore-part-5-deploy-mcp-client-for-1n11</link>
      <guid>https://dev.to/aws-heroes/building-ai-agents-with-spring-ai-and-amazon-bedrock-agentcore-part-5-deploy-mcp-client-for-1n11</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;In &lt;a href="https://dev.to/aws-heroes/building-ai-agents-with-spring-ai-and-amazon-bedrock-agentcore-part-2-deploy-conference-search-2bo8"&gt;part 2&lt;/a&gt;, we explained how to deploy and run our conference search application on the Amazon Bedrock AgentCore Runtime as the MCP server. In this article, we'll develop the (MCP-) client, capable of talking to our application running on AgentCore Runtime. Later, in &lt;a href="https://dev.to/aws-heroes/building-ai-agents-with-spring-ai-and-amazon-bedrock-agentcore-part-3-develop-local-mcp-client-560a"&gt;part 3&lt;/a&gt;, we developed the (MCP-) client, capable of talking to our application running on AgentCore Runtime. In &lt;a href="https://dev.to/aws-heroes/building-ai-agents-with-spring-ai-and-amazon-bedrock-agentcore-part-4-provide-mcp-tools-for-2odf"&gt;part 4&lt;/a&gt;, we looked at how to provide the MCP Tools for the Conference application via AgentCore Gateway in a centralized way.&lt;/p&gt;

&lt;p&gt;As we saw in previous articles, the local MCP client for the Conference application, to talk to AgentCore Runtime or Gateway, became quite big. If we have many customers using such a client, changing and operating it can become quite challenging. That's why, in this article, we look at how to deploy and run our MCP client on AgentCore Runtime.&lt;/p&gt;

&lt;h2&gt;
  
  
  Implement the MCP client for the Conference application to be deployable on AgentCore Runtime
&lt;/h2&gt;

&lt;p&gt;We'll reuse the MCP client based on Spring AI that we implemented in parts 3 and 4. But as we need to make some small changes to deploy it on AgentCore Runtime, I created a new &lt;a href="https://github.com/Vadym79/amazon-bedrock-agentcore-spring-ai/tree/main/spring-ai-1.1-conference-app-agent-bedrock-agentcore-runtime" rel="noopener noreferrer"&gt;spring-ai-1.1-conference-app-agent-bedrock-agentcore-runtime&lt;/a&gt;. It consists of the agent and Infrastructure as Code subfolders.&lt;/p&gt;

&lt;p&gt;Let's first look at the changes that we need to make to the client.  AgentCore Runtime also supports the HTTP protocol contract, which we'll use to deploy our MCP client and talk to it. This contract puts some requirements on the client:&lt;/p&gt;

&lt;p&gt;Container requirements:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Host : 0.0.0.0&lt;/li&gt;
&lt;li&gt;Port : 8080 - Standard port for HTTP-based agent communication &lt;/li&gt;
&lt;li&gt;Platform : ARM64 Docker container - Required for compatibility with the AgentCore Runtime environment. I usually borrow t4g small EC2 instance on AWS to build it.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Path requirements: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;/invocations endpoint: POST endpoint for agent interactions&lt;/li&gt;
&lt;li&gt;/ping endpoint: GET endpoint for health checks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can read more about this topic in the &lt;a href="https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/runtime-http-protocol-contract.html" rel="noopener noreferrer"&gt;HTTP protocol contract&lt;/a&gt; article.&lt;/p&gt;

&lt;p&gt;The only changes we need to make to our &lt;a href="https://github.com/Vadym79/amazon-bedrock-agentcore-spring-ai/blob/main/spring-ai-1.1-conference-app-agent-bedrock-agentcore-runtime/agent/src/main/java/dev/vkazulkin/agent/controller/SpringAIAgentController.java" rel="noopener noreferrer"&gt;REST Controller &lt;/a&gt; are to implement these path requirements. If we use asynchronous communication, the entry point looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@PostMapping&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"/invocations"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;consumes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt; &lt;span class="s"&gt;"*/*"&lt;/span&gt; &lt;span class="o"&gt;})&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;Flux&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;invocations&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nd"&gt;@RequestBody&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
   &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;token&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;getAuthTokenViaHttpClient&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
   &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;McpClient&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;async&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;getMcpClientTransport&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="o"&gt;)).&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
   &lt;span class="o"&gt;...&lt;/span&gt;
  &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;toolCallbacks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;concatWithStream&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;asyncMcpToolCallbackProvider&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getToolCallbacks&lt;/span&gt;&lt;span class="o"&gt;(),&lt;/span&gt; &lt;span class="nc"&gt;ToolCallbacks&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;from&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;DateTimeTools&lt;/span&gt;&lt;span class="o"&gt;()));&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;chatClient&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;user&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;toolCallbacks&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;toolCallbacks&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;stream&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For synchronous communication, the entry point looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@PostMapping&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"/invocations"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;consumes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt; &lt;span class="s"&gt;"*/*"&lt;/span&gt; &lt;span class="o"&gt;})&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="nf"&gt;invocations&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nd"&gt;@RequestBody&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
   &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;token&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;getAuthTokenViaHttpClient&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
   &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;McpClient&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;async&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;getMcpClientTransport&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="o"&gt;)).&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
   &lt;span class="o"&gt;...&lt;/span&gt;
  &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;toolCallbacks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;concatWithStream&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;asyncMcpToolCallbackProvider&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getToolCallbacks&lt;/span&gt;&lt;span class="o"&gt;(),&lt;/span&gt; &lt;span class="nc"&gt;ToolCallbacks&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;from&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;DateTimeTools&lt;/span&gt;&lt;span class="o"&gt;()));&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;chatClient&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;user&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;toolCallbacks&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;toolCallbacks&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;call&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For adding the path to &lt;em&gt;/ping&lt;/em&gt;, we have different options. We can either add such a simple method:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@GetMapping&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"/ping"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="nf"&gt;ping&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
   &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s"&gt;"{\"status\": \"healthy\"}"&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or use &lt;a href="https://spring.io/guides/gs/actuator-service" rel="noopener noreferrer"&gt;Spring Boot Actuator service&lt;/a&gt; and add some properties to the application.properties:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="py"&gt;management.endpoints.web.exposure.include&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;health&lt;/span&gt;
&lt;span class="py"&gt;management.endpoints.web.base-path&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;
&lt;span class="py"&gt;management.endpoints.web.path-mapping.health&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;ping&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As we need to deploy our MCP client as an ARM64 Docker container, I also added a simple &lt;a href="https://github.com/Vadym79/amazon-bedrock-agentcore-spring-ai/blob/main/spring-ai-1.1-conference-app-agent-bedrock-agentcore-runtime/agent/Dockerfile" rel="noopener noreferrer"&gt;Docker file&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; amazoncorretto:25&lt;/span&gt;

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; target/spring-ai-1.1-conference-app-agent-bedrock-agentcore-runtime-0.0.1-SNAPSHOT.jar app.jar&lt;/span&gt;
&lt;span class="k"&gt;ENTRYPOINT&lt;/span&gt;&lt;span class="s"&gt; ["java","-jar","/app.jar"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's build the Docker file and upload it to the &lt;a href="https://aws.amazon.com/ecr/" rel="noopener noreferrer"&gt;Amazon Elastic Container Registry&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# build the application&lt;/span&gt;
mvn clean package 

&lt;span class="c"&gt;# build the Docker image&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;docker build &lt;span class="nt"&gt;--no-cache&lt;/span&gt; &lt;span class="nt"&gt;-t&lt;/span&gt; spring-ai-conference-app-agent-bedrock-agentcore-runtime:v1 

&lt;span class="c"&gt;# Login to ECR&lt;/span&gt;
aws ecr get-login-password &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;region&lt;span class="o"&gt;}&lt;/span&gt; | &lt;span class="nb"&gt;sudo &lt;/span&gt;docker login &lt;span class="nt"&gt;--username&lt;/span&gt; AWS &lt;span class="nt"&gt;--password-stdin&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;account_id&lt;span class="o"&gt;}&lt;/span&gt;.dkr.ecr.&lt;span class="o"&gt;{&lt;/span&gt;region&lt;span class="o"&gt;}&lt;/span&gt;.amazonaws.com  

&lt;span class="c"&gt;# Create ECR repository (if it doesn't exist)&lt;/span&gt;
aws ecr create-repository &lt;span class="nt"&gt;--repository-name&lt;/span&gt; spring-ai-conference-app-agent-bedrock-agentcore-runtime &lt;span class="nt"&gt;--image-scanning-configuration&lt;/span&gt; &lt;span class="nv"&gt;scanOnPush&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt; &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;region&lt;span class="o"&gt;}&lt;/span&gt;  

&lt;span class="c"&gt;# Tag the Docker image&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;docker tag spring-ai-conference-app-agent-bedrock-agentcore-runtime:v1 &lt;span class="o"&gt;{&lt;/span&gt;account_id&lt;span class="o"&gt;}&lt;/span&gt;.dkr.ecr.&lt;span class="o"&gt;{&lt;/span&gt;region&lt;span class="o"&gt;}&lt;/span&gt;.amazonaws.com/spring-ai-conference-app-agent-bedrock-agentcore-runtime:v1

&lt;span class="c"&gt;# Push the Docker Image to the ECR repository&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;docker push &lt;span class="o"&gt;{&lt;/span&gt;account_id&lt;span class="o"&gt;}&lt;/span&gt;.dkr.ecr.&lt;span class="o"&gt;{&lt;/span&gt;region&lt;span class="o"&gt;}&lt;/span&gt;.amazonaws.com/spring-ai-conference-app-agent-bedrock-agentcore-runtime:v1 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Please replace AWS {account_id} and {region} with our own values. Also, your version may not be &lt;em&gt;v1&lt;/em&gt; but a different one.&lt;/p&gt;

&lt;p&gt;We can also build the Docker image by using Buildpack support built into Spring instead of a Dockerfile. Just use the Maven task &lt;a href="https://docs.spring.io/spring-boot/maven-plugin/build-image.html" rel="noopener noreferrer"&gt;spring-boot:build-image&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;We don't need to make any other changes on the MCP client itself. &lt;/p&gt;

&lt;p&gt;Let's now cover the IaC part with CDK for Java, which I implemented in &lt;a href="https://github.com/Vadym79/amazon-bedrock-agentcore-spring-ai/blob/main/spring-ai-1.1-conference-app-agent-bedrock-agentcore-runtime/cdk/src/main/java/dev/vkazulkin/agentcore/runtime/RuntimeWithMCPStack.java" rel="noopener noreferrer"&gt;RuntimeWithMCPStack&lt;/a&gt; stack. We've already covered many steps in creating the CDK App and Stack, and even the AgentCore Runtime with the MCP protocol, in &lt;a href="https://dev.to/aws-heroes/building-ai-agents-with-spring-ai-and-amazon-bedrock-agentcore-part-2-deploy-conference-search-2bo8"&gt;part 2&lt;/a&gt;. For a more detailed explanation, I refer to this article.&lt;/p&gt;

&lt;p&gt;First, let's take a look at the creation of the AgentCore Runtime:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt; &lt;span class="nc"&gt;Runtime&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;Builder&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;create&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"MCPRuntime-125"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
   &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;runtimeName&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;appName&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;replace&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"-"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"_"&lt;/span&gt;&lt;span class="o"&gt;)+&lt;/span&gt; &lt;span class="s"&gt;"_runtime"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
   &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;protocolConfiguration&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;ProtocolType&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;HTTP&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
   &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"AgenCore Runtime with MCP protocol for running conference app"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
      &lt;span class="o"&gt;...&lt;/span&gt;
   &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here we set some common properties, such as the runtime name, description, and protocol (in our case, HTTP).&lt;/p&gt;

&lt;p&gt;Now let's look at the relevant code parts to assign this code artifact to the AgentCore Runtime:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;ecrImageURI&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ConventionalDefaults&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;
    &lt;span class="nf"&gt;getContextVariableValueWithReplacedAccountId&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"ecrImageURIForConferenceSearchAndApplicationAgent"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;            

&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;agentRuntimeArtifact&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;    
    &lt;span class="nc"&gt;AgentRuntimeArtifact&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;fromImageUri&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ecrImageURI&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
   &lt;span class="o"&gt;....&lt;/span&gt;

&lt;span class="nc"&gt;Runtime&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;Builder&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;create&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"MCPRuntime-125"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;agentRuntimeArtifact&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agentRuntimeArtifact&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;...&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;First, we get the value of the variable &lt;em&gt;ecrImageURI&lt;/em&gt;, which points to the imageURI in the ECR we pushed previously.  This is typically done in the &lt;a href="https://github.com/Vadym79/amazon-bedrock-agentcore-spring-ai/blob/main/spring-ai-1.1-conference-app-agent-bedrock-agentcore-runtime/cdk/cdk.json" rel="noopener noreferrer"&gt;cdk.json&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"app"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"mvn -e -q compile exec:java"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"context"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;""&lt;/span&gt;&lt;span class="err"&gt;ecrImageUR&lt;/span&gt;&lt;span class="s2"&gt;": "&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="err"&gt;AWS_ACCOUNT_ID&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="err"&gt;.dkr.ecr.us-east&lt;/span&gt;&lt;span class="mi"&gt;-1&lt;/span&gt;&lt;span class="err"&gt;.amazonaws.com/spring-ai-conference-app-agent-bedrock-agentcore-runtime:v&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="s2"&gt;"
 }
}
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Please adjust the value so that it matches your imageURI. We use the placeholder {AWS_ACCOUNT_ID} there. The reason for it is that I don't want to expose the AWS account ID publicly. That's why I wrote the following utility method &lt;em&gt;getContextVariableValueWithReplacedAccountId&lt;/em&gt; in the &lt;a href="https://github.com/Vadym79/amazon-bedrock-agentcore-spring-ai/blob/main/spring-ai-1.1-conference-app-agent-bedrock-agentcore-runtime/cdk/src/main/java/dev/vkazulkin/ConventionalDefaults.java" rel="noopener noreferrer"&gt;ConventionalDefaults&lt;/a&gt; class to replace the placeholder with the real value:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;static&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="nf"&gt;getContextVariableValueWithReplacedAccountId&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Stack&lt;/span&gt; &lt;span class="n"&gt;stack&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;contextVariableName&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
   &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;awsAccountId&lt;/span&gt;&lt;span class="o"&gt;=(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="n"&gt;stack&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getNode&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;tryGetContext&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"awsAccountId"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
   &lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;awsAccountId&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="n"&gt;awsAccountId&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;trim&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;isEmpty&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
      &lt;span class="nc"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;out&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;println&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"please provide your aws account id as as content to the call, for example: cdk deploy -c awsAccountId=1234567890101"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
   &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;contextVariableValue&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;getContextVariableValue&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stack&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;contextVariableName&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
   &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;replaceAWSAccountID&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;contextVariableValue&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;awsAccountId&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
 &lt;span class="o"&gt;}&lt;/span&gt;

&lt;span class="kd"&gt;static&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="nf"&gt;getContextVariableValue&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Stack&lt;/span&gt; &lt;span class="n"&gt;stack&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;contextVariableName&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
   &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="n"&gt;stack&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getNode&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;tryGetContext&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;contextVariableName&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
 &lt;span class="o"&gt;}&lt;/span&gt;

&lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;static&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="nf"&gt;replaceAWSAccountID&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;configParam&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;awsAccountId&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
   &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;configParam&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;replace&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"{AWS_ACCOUNT_ID}"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;awsAccountId&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then we create &lt;em&gt;AgentRuntimeArtifact&lt;/em&gt; from the image URI and set it as AgentCore Runtime &lt;em&gt;agentRuntimeArtifact&lt;/em&gt; property.&lt;/p&gt;

&lt;p&gt;Now let's cover the next part - defining the IAM execution role. It's very difficult to automate this part as it takes plenty of time. If I find it, I'll provide the IaC part in the future :). I refer you to the article &lt;a href="https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/runtime-permissions.html" rel="noopener noreferrer"&gt;IAM Permissions for AgentCore Runtime&lt;/a&gt; for more information. You can also read my article &lt;a href="https://dev.to/aws-heroes/amazon-bedrock-agentcore-runtime-part-2-deploy-the-agent-with-the-agentcore-runtime-starter-3706"&gt;Amazon Bedrock AgentCore Runtime - Part 2 Using Bedrock AgentCore Runtime Starter Toolkit with Strands Agents SDK&lt;/a&gt;, where I explained this part. In that article, we developed the agent in Python with the Strands Agents framework and deployed it on AgentCore Runtime.&lt;/p&gt;

&lt;p&gt;Once we have defined the IAM role, we need to configure it in the &lt;a href="https://github.com/Vadym79/amazon-bedrock-agentcore-spring-ai/blob/main/spring-ai-1.1-conference-app-agent-bedrock-agentcore-runtime/cdk/cdk.json" rel="noopener noreferrer"&gt;cdk.json&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"app"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"mvn -e -q compile exec:java"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"context"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"roleArnForTheAgentCoreRuntime"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"arn:aws:iam::{AWS_ACCOUNT_ID}:role/service-role/spring-ai-conference-search-application-agentcore-runtime-role"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="err"&gt;....&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We use the placeholder for the AWS account ID as explained above.  Here is the relevant code to grab the value of the &lt;em&gt;roleArnForTheAgentCoreRuntime&lt;/em&gt; variable and set it to the execution role of the Runtime from the &lt;a href="https://github.com/Vadym79/amazon-bedrock-agentcore-spring-ai/blob/main/spring-ai-1.1-conference-app-agent-bedrock-agentcore-runtime/cdk/src/main/java/dev/vkazulkin/agentcore/runtime/RuntimeWithMCPStack.java" rel="noopener noreferrer"&gt;RuntimeWithMCPStack&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;roleArnForTheAgentCoreRuntime&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;ConventionalDefaults&lt;/span&gt;
   &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getContextVariableValueWithReplacedAccountId&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"roleArnForTheAgentCoreRuntime"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;Role&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;fromRoleArn&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;&lt;span class="s"&gt;"roleArnForTheAgentCoreRuntimeRole"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;roleArnForTheAgentCoreRuntime&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

&lt;span class="nc"&gt;Runtime&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;Builder&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;create&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"MCPRuntime-123"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
   &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;runtimeName&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;appName&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;replace&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"-"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"_"&lt;/span&gt;&lt;span class="o"&gt;)+&lt;/span&gt; &lt;span class="s"&gt;"_runtime"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
   &lt;span class="o"&gt;...&lt;/span&gt;
  &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;executionRole&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
  &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;authorizerConfiguration&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;RuntimeAuthorizerConfiguration&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;usingIAM&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt;
  &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, we also use an &lt;a href="https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/runtime-oauth.html" rel="noopener noreferrer"&gt;IAM authorizer&lt;/a&gt; for the inbound AgentCore Runtime authentication. This is the default authentication and authorization mechanism that works automatically without additional configuration. You can also use JSON Web Tokens (JWT) as we showed in &lt;a href="https://dev.to/aws-heroes/building-ai-agents-with-spring-ai-and-amazon-bedrock-agentcore-part-2-deploy-conference-search-2bo8"&gt;part 2&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Now we are ready to deploy our MCP client on the AgentCore Runtime.  The command to do it is:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;cdk deploy -c awsAccountId={YOUR_AWS_ACCOUINT_ID}&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Here is how the AgentCore Runtime looks in the console after its creation:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxoi9zh0ojkkdkhacs2v3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxoi9zh0ojkkdkhacs2v3.png" alt=" " width="800" height="228"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1jf34i7p9o95skgqo67r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1jf34i7p9o95skgqo67r.png" alt=" " width="800" height="262"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We'll need the Runtime ARN, which we see in the output of this command.  Or we can grab it in the service console.  &lt;/p&gt;

&lt;p&gt;Now we still need to write a client that communicates with our MCP client on the Runtime. I provided such an &lt;a href="https://github.com/Vadym79/amazon-bedrock-agentcore-spring-ai/blob/main/spring-ai-1.1-conference-app-agent-bedrock-agentcore-runtime/agent/src/main/java/dev/vkazulkin/agent/sdk/InvokeRuntimeAgent.java" rel="noopener noreferrer"&gt;InvokeRuntimeAgent&lt;/a&gt; client written in Java, but you can use any programming language for which AWS provides a (bedrockagentcore) SDK:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;static&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="no"&gt;AGENT_RUNTIME_ARN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"arn:aws:bedrock-agentcore:us-east-1:{AWS_ACCOUNT_ID}:runtime/spring_ai_conference_search_application_runtime-143wvBghklZ"&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="kd"&gt;throws&lt;/span&gt; &lt;span class="nc"&gt;Exception&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;

 &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
   &lt;span class="s"&gt;"{\"prompt\":\"Please provide me with the list of conferences, including their IDs, 
with the Java topic happening in 2027, with the call for papers open today. 
Also, provide me with the list of my talks with this topic in the title. 
Finally, for each conference and talk retrieved, apply individually for the conference.\"}"&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

  &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;httpClient&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;ApacheHttpClient&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
      &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;connectionTimeout&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Duration&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ofMinutes&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
      &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;socketTimeout&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Duration&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ofMinutes&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
      &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

  &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;bedrockAgentCoreClient&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BedrockAgentCoreClient&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
      &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;region&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Region&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;US_EAST_1&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
      &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;httpClient&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;httpClient&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
      &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

  &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;invokeAgentRuntimeRequest&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;InvokeAgentRuntimeRequest&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;                 
     &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;agentRuntimeArn&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;replaceAWSAccountID&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="no"&gt;AGENT_RUNTIME_ARN&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;                               
     &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;qualifier&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"DEFAULT"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
     &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;contentType&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"application/json"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
     &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;payload&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;SdkBytes&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;fromUtf8String&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="o"&gt;)).&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
      &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;responseStream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bedrockAgentCoreClient&lt;/span&gt;
     &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;invokeAgentRuntime&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;invokeAgentRuntimeRequest&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
     &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;responseStream&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;readAllBytes&lt;/span&gt;&lt;span class="o"&gt;(),&lt;/span&gt; &lt;span class="nc"&gt;StandardCharsets&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;UTF_8&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt; 
        &lt;span class="nc"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;out&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;println&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's go step-by-step through it. First of all, we define RUNTIME_ARN, which we deployed in the step before. Please still use the &lt;em&gt;{AWS_ACCOUNT_ID}&lt;/em&gt; placeholder, which will be dynamically replaced with your AWS Account ID. When we create &lt;em&gt;BedrockAgentCoreClient&lt;/em&gt;. We also explicitly set the Apache HTTP client with the extended connection and socket timeouts. Default 30-second timeouts maybe to short for communication with the Runtime. Then we create &lt;em&gt;InvokeAgentRuntimeRequest&lt;/em&gt; and set the agent Runtime ARN, qualifier (always DEFAULT), content type, and payload. The payload is our prompt. You can see the examples of the prompts in &lt;a href="https://dev.to/aws-heroes/building-ai-agents-with-spring-ai-and-amazon-bedrock-agentcore-part-4-provide-mcp-tools-for-2odf"&gt;part 4&lt;/a&gt; as we're communicating with the same MCP client, but deployed elsewhere. When we invoke the &lt;em&gt;invokeAgentRuntime&lt;/em&gt; method on the &lt;em&gt;bedrockAgentCoreClient&lt;/em&gt; by providing the &lt;em&gt;invokeAgentRuntimeRequest&lt;/em&gt; and convert the agent response to a string. &lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In this article, we looked at how to deploy and run our MCP client on AgentCore Runtime. With that, our MCP client now scales nicely within the Runtime. &lt;/p&gt;

&lt;p&gt;Of course, you can create a nicer client by providing UI for entering the prompt and providing the agent response as a result. My goal was only to demonstrate how to implement such a client. Now we can change and redeploy our MCP client based on Spring AI on the AgentCore Runtime as often as we want. The client code remains unchanged as long as the Runtime ARN remains unchanged.&lt;/p&gt;

&lt;p&gt;Starting from the next article, we'll look at the &lt;a href="https://github.com/spring-ai-community/spring-ai-agentcore" rel="noopener noreferrer"&gt;Spring AI AgentCore&lt;/a&gt; functionality. Spring AI AgentCore SDK is an open-source library that brings Amazon Bedrock AgentCore capabilities into Spring AI through familiar patterns: annotations, auto-configuration, and composable advisors.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you like my content, please follow me on &lt;a href="https://github.com/Vadym79" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; and give my repositories a star!&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Please also check out my &lt;a href="https://vkazulkin.com" rel="noopener noreferrer"&gt;website&lt;/a&gt; for more technical content and upcoming public speaking activities.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>java</category>
      <category>springai</category>
      <category>bedrockagentcore</category>
    </item>
    <item>
      <title>Getting Claude Code off my laptop and onto shared compute</title>
      <dc:creator>Danielle Heberling</dc:creator>
      <pubDate>Sat, 23 May 2026 10:12:03 +0000</pubDate>
      <link>https://dev.to/aws-heroes/getting-claude-code-off-my-laptop-and-onto-shared-compute-4cjc</link>
      <guid>https://dev.to/aws-heroes/getting-claude-code-off-my-laptop-and-onto-shared-compute-4cjc</guid>
      <description>&lt;p&gt;Running Claude Code on my own machine was easy. Getting it onto shared compute my whole team could trigger was the hard part. There's plenty written about the local side. A lot less about the team side.&lt;/p&gt;

&lt;p&gt;I made that move because of how a broken deploy plays out for us. I'm the only DevOps engineer on my team. A CloudFormation deploy fails. A Slack notification fires. And more often than not, someone pings me to ask what went wrong.&lt;/p&gt;

&lt;p&gt;I get why. AWS isn't everyone's day to day, and a &lt;code&gt;CREATE_FAILED&lt;/code&gt; event with a rollback behind it isn't the friendliest thing to read. The pings weren't the real problem, though. A broken deploy that hinges on one person doesn't scale.&lt;/p&gt;

&lt;p&gt;So I decided to build my way out of it. I'd give the team a starting point on a broken deploy without pinging me. It wouldn't fix the problem, but it'd tell them what broke and where to start.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I built
&lt;/h2&gt;

&lt;p&gt;The result is a tool I'm calling the cfn-investigator. I put a thinned down version on GitHub as &lt;a href="https://github.com/deeheber/headless-claude-on-aws" rel="noopener noreferrer"&gt;headless-claude-on-aws&lt;/a&gt;. It's narrowed to CloudFormation only and meant as a jumping-off point, not a copy of what I run at work. Same idea, rebuilt from scratch. It's close enough that you could follow it, learn from it, or fork it as a base.&lt;/p&gt;

&lt;p&gt;The shape is small. It's a CodeBuild project that runs Claude Code headlessly. You hand it a failing stack name, optionally with the commit you suspect. It reads the stack state through the AWS MCP server with a read only role, works out the likely cause, and writes a short analysis. The example logs it to CloudWatch with a one line spot to forward it anywhere. Mine posts it in the Slack thread where the alert fired, right under the question.&lt;/p&gt;

&lt;p&gt;One design choice worth calling out is how it handles confidence. The system prompt tells it to be honest, including an "unsure" option that ranks hypotheses instead of inventing a clean answer. A ranked shortlist beats a confident wrong guess.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why it looks the way it does
&lt;/h2&gt;

&lt;p&gt;It is not "best practice."&lt;/p&gt;

&lt;p&gt;I picked CodeBuild over Lambda or Fargate, and handed Claude Code an Anthropic API key instead of routing through Bedrock. None were textbook choices. They got me to a working prototype fastest. CodeBuild matched the job. Clone the source, run a script, post the result somewhere. That's what the investigator does. The rest of the reasoning, including why I skipped Bedrock, is in the README.&lt;/p&gt;

&lt;p&gt;If I'm being real, the biggest factor was knowing I'd be the only one responsible for this. So I optimized for two things, shipping something that worked and keeping it boring enough to maintain alone. Fancy was a liability. That's an engineering trade-off, not an accident.&lt;/p&gt;

&lt;h2&gt;
  
  
  The imperfect but working part
&lt;/h2&gt;

&lt;p&gt;This is the part I actually care about.&lt;/p&gt;

&lt;p&gt;The repo is a pile of YAML and bash. The IAM is broader than it should be (it uses AWS managed &lt;code&gt;ReadOnlyAccess&lt;/code&gt;, which you'd want to scope down). The tools get installed fresh on every run instead of baked into an image. The two role split scopes the MCP server's AWS calls, not Claude itself.&lt;/p&gt;

&lt;p&gt;And it works. Last week a deploy failed and the investigator flagged a missing environment variable on a Fargate task definition. The developer saw the message, added the variable, and redeployed without pinging anyone. A broken deploy comes with a starting point attached now, so the next move doesn't wait on one person.&lt;/p&gt;

&lt;p&gt;In my opinion we've gotten a little precious about reference architectures. There's a strong pull to wait until you can build the clean, fully managed, perfectly scoped version. But the clean version often doesn't exist yet, or isn't mature, or would take three times as long to ship. Meanwhile the messy version that you actually understand and can keep running yourself is sitting right there, solving the real problem today.&lt;/p&gt;

&lt;h2&gt;
  
  
  What you might want instead
&lt;/h2&gt;

&lt;p&gt;The reason I had to write all that YAML is that the managed options either didn't exist or weren't mature when I started.&lt;/p&gt;

&lt;p&gt;That's changed. Before I write more YAML next time, I want to look at &lt;a href="https://aws.amazon.com/about-aws/whats-new/2026/05/claude-platform-aws/" rel="noopener noreferrer"&gt;Claude on AWS&lt;/a&gt;, &lt;a href="https://platform.claude.com/docs/en/managed-agents/overview" rel="noopener noreferrer"&gt;Claude Managed Agents&lt;/a&gt;, and the &lt;a href="https://code.claude.com/docs/en/agent-sdk/overview" rel="noopener noreferrer"&gt;Claude Agent SDK&lt;/a&gt;. Any of those would let you skip most of the plumbing I built by hand. I haven't used them for real yet, so I can't tell you how they hold up, but they're the first place I'd look now.&lt;/p&gt;

&lt;p&gt;I'm sharing my version for the cases where the managed path isn't a fit, and as a concrete example you can pull apart.&lt;/p&gt;

&lt;h2&gt;
  
  
  Take a look
&lt;/h2&gt;

&lt;p&gt;The code is up at &lt;a href="https://github.com/deeheber/headless-claude-on-aws" rel="noopener noreferrer"&gt;github.com/deeheber/headless-claude-on-aws&lt;/a&gt;. The README walks through deploying it, populating the secrets, and kicking off a run.&lt;/p&gt;

&lt;p&gt;If you build something like this, I'd love to hear how it went. Especially the parts that didn't work.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>claude</category>
      <category>ai</category>
      <category>devops</category>
    </item>
    <item>
      <title>Amazon Q Developer CLI ahora es Kiro CLI — ¿Qué cambió y por qué importa?</title>
      <dc:creator>Carlos Cortez 🇵🇪 [AWS Hero]</dc:creator>
      <pubDate>Thu, 21 May 2026 01:03:15 +0000</pubDate>
      <link>https://dev.to/aws-heroes/amazon-q-developer-cli-ahora-es-kiro-cli-que-cambio-y-por-que-importa-c76</link>
      <guid>https://dev.to/aws-heroes/amazon-q-developer-cli-ahora-es-kiro-cli-que-cambio-y-por-que-importa-c76</guid>
      <description>&lt;h1&gt;
  
  
  Amazon Q Developer CLI ahora es Kiro CLI — ¿Qué cambió y por qué importa?
&lt;/h1&gt;

&lt;p&gt;Si llevas un tiempo en el ecosistema AWS y usas herramientas de desarrollo con IA, probablemente ya notaste el cambio: &lt;strong&gt;Amazon Q Developer CLI&lt;/strong&gt; ya no existe como tal. Ahora se llama &lt;strong&gt;Kiro CLI&lt;/strong&gt;. Y no, no es solo un rebrand de nombre — es un cambio de filosofía completo.&lt;/p&gt;

&lt;p&gt;Vamos a explorar qué pasó, qué cambió realmente, y por qué creo que esto importa más de lo que parece.&lt;/p&gt;




&lt;h2&gt;
  
  
  Un poco de contexto: ¿qué era Amazon Q Developer CLI?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://aws.amazon.com/q/developer/" rel="noopener noreferrer"&gt;Amazon Q Developer&lt;/a&gt; era el asistente de IA de AWS para desarrolladores. Tenía una versión en el IDE (VS Code, JetBrains), una versión en la consola de AWS, y también una CLI que te permitía interactuar con tu entorno desde la terminal usando lenguaje natural.&lt;/p&gt;

&lt;p&gt;La idea era buena: preguntarle a un agente directamente desde tu terminal cosas como &lt;em&gt;"¿qué instancias EC2 tengo corriendo en us-east-1?"&lt;/em&gt; o &lt;em&gt;"genera un script para limpiar buckets S3 sin versioning"&lt;/em&gt;. Útil, pero limitado en su enfoque — era básicamente un chatbot en tu terminal.&lt;/p&gt;




&lt;h2&gt;
  
  
  Entonces, ¿qué es Kiro?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://kiro.dev" rel="noopener noreferrer"&gt;Kiro&lt;/a&gt; es un IDE agéntico construido sobre VS Code, lanzado por AWS. Pero lo que muchos no saben es que también tiene una CLI — &lt;strong&gt;Kiro CLI&lt;/strong&gt; — que reemplaza directamente a Amazon Q Developer CLI.&lt;/p&gt;

&lt;p&gt;Lo interesante es que Kiro no es solo "Q con otro nombre". El cambio refleja una evolución real en cómo AWS piensa las herramientas de desarrollo:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Amazon Q Developer CLI&lt;/th&gt;
&lt;th&gt;Kiro CLI&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Asistente conversacional en terminal&lt;/td&gt;
&lt;td&gt;Agente con contexto del proyecto&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Respuestas puntuales&lt;/td&gt;
&lt;td&gt;Spec-driven + MCP-driven + Steering-driven&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Foco en queries rápidas&lt;/td&gt;
&lt;td&gt;Foco en workflows completos&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sin memoria de proyecto&lt;/td&gt;
&lt;td&gt;Entiende tu arquitectura y convenciones&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Comando: &lt;code&gt;q chat&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Comando: &lt;code&gt;kiro-cli chat&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;La idea aquí es que Kiro no solo responde preguntas — razona sobre tu proyecto, lee tus steering files, se conecta a herramientas externas vía MCP, y actúa en consecuencia.&lt;/p&gt;




&lt;h2&gt;
  
  
  El cambio más importante: de "asistente" a "agente"
&lt;/h2&gt;

&lt;p&gt;En la práctica esto significa que Kiro CLI opera con capacidades que Amazon Q Developer CLI nunca tuvo de forma nativa:&lt;/p&gt;

&lt;h3&gt;
  
  
  Core Features de Kiro CLI
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Interactive Chat&lt;/strong&gt; — Conversaciones en lenguaje natural directamente en tu terminal con &lt;code&gt;kiro-cli chat&lt;/code&gt;. Puede leer y escribir archivos, ejecutar comandos, y razonar sobre tu código.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Custom Agents&lt;/strong&gt; — Puedes crear y desplegar agentes especializados para tus workflows específicos. No estás limitado al agente genérico.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;MCP Integration&lt;/strong&gt; — Conecta herramientas y fuentes de datos externas a través del &lt;a href="https://modelcontextprotocol.io" rel="noopener noreferrer"&gt;Model Context Protocol&lt;/a&gt;. Esto es enorme — puedes conectar Kiro CLI a servidores MCP de CloudWatch, MSK, OpenSearch, Okta, y muchos más.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Smart Hooks&lt;/strong&gt; — Automatiza workflows con hooks inteligentes que se ejecutan antes o después de comandos específicos.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Agent Steering&lt;/strong&gt; — Guía al agente con las mejores prácticas y preferencias de tu equipo usando steering files. Esto es lo que hace que Kiro entienda &lt;em&gt;tu&lt;/em&gt; contexto, no solo el contexto genérico.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Auto Complete&lt;/strong&gt; — Sugerencias inteligentes de comandos con contexto mientras escribes en la terminal.&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Antes con Amazon Q Developer CLI&lt;/span&gt;
q chat &lt;span class="s2"&gt;"lista mis funciones Lambda en us-east-1"&lt;/span&gt;

&lt;span class="c"&gt;# Ahora con Kiro CLI — con contexto de proyecto y MCP&lt;/span&gt;
kiro-cli chat
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; revisa las funciones Lambda del proyecto y sugiere optimizaciones basándote en las métricas de CloudWatch
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;La diferencia no es solo sintáctica. Kiro sabe qué proyecto es, tiene acceso a tus herramientas vía MCP, y puede actuar sobre eso.&lt;/p&gt;




&lt;h2&gt;
  
  
  ¿Cómo instalar Kiro CLI hoy?
&lt;/h2&gt;

&lt;p&gt;La instalación es directa:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# macOS / Linux&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://cli.kiro.dev/install | bash

&lt;span class="c"&gt;# Windows (PowerShell)&lt;/span&gt;
irm &lt;span class="s1"&gt;'https://cli.kiro.dev/install.ps1'&lt;/span&gt; | iex

&lt;span class="c"&gt;# Verificar instalación&lt;/span&gt;
kiro-cli &lt;span class="nt"&gt;--version&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Una vez instalado, empezar es así de simple:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Navega a tu proyecto&lt;/span&gt;
&lt;span class="nb"&gt;cd &lt;/span&gt;mi-proyecto

&lt;span class="c"&gt;# Inicia Kiro CLI&lt;/span&gt;
kiro-cli

&lt;span class="c"&gt;# O directamente al chat&lt;/span&gt;
kiro-cli chat
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Otros comandos útiles
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Traducir lenguaje natural a comandos bash&lt;/span&gt;
kiro-cli translate &lt;span class="s2"&gt;"muestra los últimos 10 logs de mi función Lambda"&lt;/span&gt;

&lt;span class="c"&gt;# Habilitar sugerencias inline (requiere zsh)&lt;/span&gt;
kiro-cli inline &lt;span class="nb"&gt;enable&lt;/span&gt;

&lt;span class="c"&gt;# Deshabilitar sugerencias inline&lt;/span&gt;
kiro-cli inline disable
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Lo interesante es que &lt;code&gt;kiro-cli translate&lt;/code&gt; convierte tu instrucción en el comando bash correspondiente sin ejecutarlo — tú decides si lo corres o no. Perfecto para aprender comandos complejos de AWS CLI.&lt;/p&gt;




&lt;h2&gt;
  
  
  Kiro CLI en CloudShell
&lt;/h2&gt;

&lt;p&gt;Si no quieres instalar nada localmente, Kiro CLI ya viene disponible en &lt;a href="https://docs.aws.amazon.com/cloudshell/latest/userguide/q-cli-features-in-cloudshell.html" rel="noopener noreferrer"&gt;AWS CloudShell&lt;/a&gt;. Solo abre CloudShell y ejecuta &lt;code&gt;kiro-cli&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Las sugerencias inline en CloudShell requieren Z shell:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Cambiar a zsh en CloudShell&lt;/span&gt;
zsh

&lt;span class="c"&gt;# Las sugerencias inline se habilitan automáticamente&lt;/span&gt;
&lt;span class="c"&gt;# Para deshabilitarlas:&lt;/span&gt;
kiro-cli inline disable
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Agent Steering: dale contexto persistente a Kiro CLI
&lt;/h2&gt;

&lt;p&gt;Esta es una de las features más importantes de Kiro CLI y la que marca la diferencia real con lo que teníamos en Amazon Q Developer CLI. Los &lt;strong&gt;steering files&lt;/strong&gt; son archivos markdown que le dan a Kiro conocimiento persistente sobre tu proyecto, tu stack, y las convenciones de tu equipo.&lt;/p&gt;

&lt;p&gt;La idea aquí es simple: en vez de re-explicar tu proyecto cada vez que abres una sesión, escribes un steering file una vez y Kiro lo lee automáticamente en cada interacción.&lt;/p&gt;

&lt;h3&gt;
  
  
  ¿Qué poner en un steering file?
&lt;/h3&gt;

&lt;p&gt;Un steering file es un &lt;code&gt;.md&lt;/code&gt; que vive en tu proyecto (típicamente en &lt;code&gt;.kiro/steering/&lt;/code&gt;) y puede contener:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Stack tecnológico&lt;/strong&gt; — qué lenguajes, frameworks y servicios usas&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Convenciones del equipo&lt;/strong&gt; — naming conventions, patrones de diseño, estructura de carpetas&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Contexto de infraestructura&lt;/strong&gt; — nombres de instancias, ubicación de logs, usuarios del sistema&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Requisitos de compliance&lt;/strong&gt; — estándares de seguridad, accesibilidad, auditoría&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reglas de negocio&lt;/strong&gt; — lógica específica de tu dominio que el agente debe respetar&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Ejemplo real: steering file para un proyecto serverless
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Project Context&lt;/span&gt;

&lt;span class="gu"&gt;## Stack&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Runtime: Python 3.12 on AWS Lambda
&lt;span class="p"&gt;-&lt;/span&gt; API: Amazon API Gateway (REST)
&lt;span class="p"&gt;-&lt;/span&gt; Database: Amazon DynamoDB (single-table design)
&lt;span class="p"&gt;-&lt;/span&gt; IaC: AWS CDK (Python)

&lt;span class="gu"&gt;## Conventions&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; All Lambda handlers go in &lt;span class="sb"&gt;`src/handlers/`&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Business logic goes in &lt;span class="sb"&gt;`src/services/`&lt;/span&gt; — never in handlers
&lt;span class="p"&gt;-&lt;/span&gt; Use structured logging with aws-lambda-powertools
&lt;span class="p"&gt;-&lt;/span&gt; DynamoDB access patterns use PK/SK with GSI1

&lt;span class="gu"&gt;## Security&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; All API endpoints require IAM authorization
&lt;span class="p"&gt;-&lt;/span&gt; No hardcoded credentials — use environment variables from Secrets Manager
&lt;span class="p"&gt;-&lt;/span&gt; Input validation on every handler entry point
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Lo que hace esto particularmente poderoso es que Kiro CLI usa este contexto en &lt;strong&gt;cada&lt;/strong&gt; interacción. Si le pides que genere un nuevo endpoint, va a seguir tus convenciones automáticamente — handlers en &lt;code&gt;src/handlers/&lt;/code&gt;, lógica en &lt;code&gt;src/services/&lt;/code&gt;, con powertools y validación de input. Sin que tengas que repetirlo.&lt;/p&gt;

&lt;h3&gt;
  
  
  Steering files en la práctica
&lt;/h3&gt;

&lt;p&gt;El blog de AWS sobre Oracle EBS con Kiro CLI muestra un caso real: usan steering files para darle a Kiro el conocimiento de su entorno Oracle — patrones de nombres de instancias, usuarios del OS, ubicación de logs y scripts. Así, cuando preguntan &lt;em&gt;"¿está sano el concurrent manager?"&lt;/em&gt;, Kiro ya sabe dónde buscar sin que se lo expliquen cada vez.&lt;/p&gt;

&lt;p&gt;Para equipos, esto es oro. Un nuevo developer se une, clona el repo, y Kiro CLI ya tiene todo el contexto del proyecto en los steering files. La curva de onboarding se reduce dramáticamente.&lt;/p&gt;




&lt;h2&gt;
  
  
  Spec-Driven Development: ¿disponible en Kiro CLI?
&lt;/h2&gt;

&lt;p&gt;Acá hay que ser claros porque es una pregunta que muchos se hacen. &lt;strong&gt;Spec-Driven Development es una feature del Kiro IDE, no de Kiro CLI.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Según la documentación oficial de Kiro, las Specs viven bajo la sección de documentación del IDE (&lt;code&gt;/docs/specs/&lt;/code&gt;), y no aparecen en la sidebar de la CLI. La CLI tiene: Chat, Custom Agents, MCP, Hooks, Steering, Autocomplete, y Headless — pero no Specs.&lt;/p&gt;

&lt;h3&gt;
  
  
  ¿Qué es Spec-Driven Development?
&lt;/h3&gt;

&lt;p&gt;Para los que no lo conocen, es el workflow estrella de Kiro IDE. Funciona así:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Le describes tu idea&lt;/strong&gt; al agente en lenguaje natural&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Kiro genera requirements&lt;/strong&gt; estructurados (en formato EARS)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Kiro crea un design document&lt;/strong&gt; con arquitectura, modelos de datos, APIs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Kiro produce un plan de implementación&lt;/strong&gt; con tareas concretas y ordenadas&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Kiro ejecuta cada tarea&lt;/strong&gt; — escribe código, tests, documentación&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Cada spec genera tres archivos clave: &lt;code&gt;requirements.md&lt;/code&gt;, &lt;code&gt;design.md&lt;/code&gt;, y &lt;code&gt;tasks.md&lt;/code&gt;. Hay dos tipos de specs: &lt;strong&gt;Feature Specs&lt;/strong&gt; (para funcionalidades nuevas) y &lt;strong&gt;Bugfix Specs&lt;/strong&gt; (para diagnosticar y corregir bugs).&lt;/p&gt;

&lt;h3&gt;
  
  
  ¿Por qué no está en la CLI?
&lt;/h3&gt;

&lt;p&gt;Mi lectura es que Spec-Driven Development requiere una experiencia visual que la terminal no puede ofrecer fácilmente — la navegación entre archivos de spec, la vista de progreso de tareas, y la interacción con el design document son inherentemente visuales. La CLI está optimizada para workflows más directos: chat, automatización, MCP, y headless.&lt;/p&gt;

&lt;h3&gt;
  
  
  ¿Qué usar entonces?
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Necesidad&lt;/th&gt;
&lt;th&gt;Herramienta&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Desarrollar features completas con specs&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Kiro IDE&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Chat agéntico desde la terminal&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Kiro CLI&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Automatización y CI/CD&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Kiro CLI&lt;/strong&gt; (headless)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Conectar herramientas externas vía MCP&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Kiro CLI&lt;/strong&gt; o &lt;strong&gt;Kiro IDE&lt;/strong&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Steering files para contexto de equipo&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Kiro CLI&lt;/strong&gt; o &lt;strong&gt;Kiro IDE&lt;/strong&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Troubleshooting rápido de AWS&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Kiro CLI&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Mi recomendación: usa ambos. Kiro IDE para desarrollo de features con specs, y Kiro CLI para todo lo que haces en la terminal — troubleshooting, automatización, operaciones, y CI/CD. Los steering files funcionan en ambos, así que tu contexto de proyecto se comparte.&lt;/p&gt;




&lt;h2&gt;
  
  
  MCP: lo que hace a Kiro CLI realmente poderoso
&lt;/h2&gt;

&lt;p&gt;El Model Context Protocol es un estándar abierto que permite a agentes de IA conectarse de forma segura con herramientas externas, fuentes de datos y servicios. En la práctica esto significa que puedes extender las capacidades de Kiro CLI conectándolo a servidores MCP especializados.&lt;/p&gt;

&lt;p&gt;Algunos ejemplos reales que ya existen:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Servidor MCP&lt;/th&gt;
&lt;th&gt;Qué hace&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;CloudWatch MCP&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Consulta métricas, logs y alarmas con lenguaje natural&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Amazon MSK MCP&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Administra clusters de Kafka — topics, configuraciones, health&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AWS Diagram MCP&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Genera diagramas de arquitectura AWS desde prompts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;OpenSearch MCP&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Busca índices, inspecciona estado del cluster, diagnósticos&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Okta MCP&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Gestión de identidades — usuarios, grupos, permisos&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AWS Documentation MCP&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Busca y lee documentación de AWS en contexto&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;La configuración de un servidor MCP se hace en &lt;code&gt;~/.kiro/settings/mcp.json&lt;/code&gt;. Una vez configurado, Kiro CLI tiene acceso a las herramientas del servidor y las usa automáticamente cuando son relevantes para tu pregunta.&lt;/p&gt;

&lt;p&gt;Lo que hace esto particularmente poderoso es que puedes combinar múltiples servidores MCP. Imagina preguntarle a Kiro CLI: &lt;em&gt;"¿por qué mi aplicación está lenta?"&lt;/em&gt; y que automáticamente consulte CloudWatch para métricas, OpenSearch para logs, y te dé un diagnóstico completo — todo desde tu terminal.&lt;/p&gt;




&lt;h2&gt;
  
  
  Los métodos de login — esto es clave
&lt;/h2&gt;

&lt;p&gt;Acá es donde la cosa se pone interesante de verdad, porque Kiro CLI hereda el modelo de autenticación de Amazon Q Developer pero con matices que importan. Hay cuatro formas de conectarte, y cada una te da acceso a cosas diferentes:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Builder ID (Free) — para empezar rápido
&lt;/h3&gt;

&lt;p&gt;Es la forma más simple. Te creas un &lt;a href="https://docs.aws.amazon.com/signin/latest/userguide/create-aws_builder_id.html" rel="noopener noreferrer"&gt;AWS Builder ID&lt;/a&gt; gratis (con tu email, Google, Apple, GitHub o Amazon) y listo, ya puedes usar Kiro CLI y el IDE.&lt;/p&gt;

&lt;p&gt;La limitación: tienes límites mensuales de uso y solo funciona en el IDE y la CLI. No tienes acceso a la consola de AWS ni a features avanzados. Pero para proyectos personales y exploración, es más que suficiente.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Builder ID + Pro — más límites, tu propia cuenta AWS
&lt;/h3&gt;

&lt;p&gt;Acá es donde empieza a ponerse bueno. Puedes hacer upgrade de tu Builder ID al tier Pro conectándolo a tu propia cuenta de AWS. Esto te da límites de uso mucho más altos.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Inicia chat con Kiro CLI&lt;/span&gt;
kiro-cli chat

&lt;span class="c"&gt;# Dentro del chat, escribe:&lt;/span&gt;
/subscribe
&lt;span class="c"&gt;# Esto abre la consola de AWS para confirmar la suscripción Pro&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;El punto clave es que con Builder ID Pro tienes límites más altos, pero no todas las features Pro. Algunas features avanzadas solo están disponibles vía IAM Identity Center.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. IAM Identity Center (Free) — para equipos y organizaciones
&lt;/h3&gt;

&lt;p&gt;Si tu empresa ya usa &lt;a href="https://aws.amazon.com/iam/identity-center/" rel="noopener noreferrer"&gt;IAM Identity Center&lt;/a&gt; (antes AWS SSO), puedes autenticarte con tu identidad corporativa. Esto te da acceso a la consola de AWS, apps y websites de AWS — algo que Builder ID no puede hacer.&lt;/p&gt;

&lt;p&gt;Ideal si tu admin ya configuró el Identity Center en la organización.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. IAM Identity Center + Pro — el combo completo 🔥
&lt;/h3&gt;

&lt;p&gt;Y acá está el gold standard. Tu admin te suscribe a Amazon Q Developer Pro vía IAM Identity Center, y tienes acceso a &lt;strong&gt;todo&lt;/strong&gt;: CLI, IDE, consola, apps de AWS, features avanzados, límites altos, y control empresarial.&lt;/p&gt;

&lt;p&gt;Lo que hace esto particularmente poderoso es que tu empresa tiene control total: puede suscribir usuarios en bulk, trackear uso, cancelar suscripciones, y tú como developer tienes la experiencia completa en todos los canales.&lt;/p&gt;

&lt;h3&gt;
  
  
  La guía rápida de acceso
&lt;/h3&gt;

&lt;h4&gt;
  
  
  🆓 Builder ID — Free
&lt;/h4&gt;

&lt;p&gt;✅ CLI · ✅ IDE · ❌ Consola AWS · ❌ Features Pro&lt;br&gt;
→ Para empezar rápido con proyectos personales&lt;/p&gt;
&lt;h4&gt;
  
  
  💰 Builder ID — Pro
&lt;/h4&gt;

&lt;p&gt;✅ CLI · ✅ IDE · ❌ Consola AWS · ⚠️ Features Pro parciales&lt;br&gt;
→ Límites más altos, pero no el suite completo&lt;/p&gt;
&lt;h4&gt;
  
  
  🏢 IAM Identity Center — Free
&lt;/h4&gt;

&lt;p&gt;❌ CLI · ❌ IDE · ✅ Consola AWS · ❌ Features Pro&lt;br&gt;
→ Solo consola, ideal si tu admin aún no activó Pro&lt;/p&gt;
&lt;h4&gt;
  
  
  🔥 IAM Identity Center — Pro
&lt;/h4&gt;

&lt;p&gt;✅ CLI · ✅ IDE · ✅ Consola AWS · ✅ Features Pro completos&lt;br&gt;
→ La experiencia completa — el gold standard&lt;/p&gt;

&lt;p&gt;Mi recomendación sincera: si eres developer individual, empieza con Builder ID Free y cuando sientas los límites, haz upgrade a Pro con tu cuenta AWS. Si estás en una empresa, pídele a tu admin que configure IAM Identity Center con Pro — es la experiencia más completa y además puedes usarlo desde Kiro IDE con toda la potencia.&lt;/p&gt;


&lt;h2&gt;
  
  
  Headless Mode: Kiro CLI en CI/CD
&lt;/h2&gt;

&lt;p&gt;Una capacidad nueva que no existía en Q Developer CLI es el &lt;strong&gt;modo headless&lt;/strong&gt; — puedes ejecutar prompts de forma no interactiva usando API keys. Esto abre la puerta a integrar Kiro CLI en pipelines de CI/CD.&lt;/p&gt;

&lt;p&gt;En la práctica esto significa que puedes automatizar tareas como:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Revisión de código automatizada en PRs&lt;/li&gt;
&lt;li&gt;Generación de documentación en cada merge&lt;/li&gt;
&lt;li&gt;Análisis de seguridad como paso del pipeline&lt;/li&gt;
&lt;li&gt;Generación de tests para código nuevo&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;
  
  
  ¿Debería migrar ya?
&lt;/h2&gt;

&lt;p&gt;Sí, y sin dudarlo. Amazon Q Developer CLI ya no recibe actualizaciones activas. Todo el desarrollo está en Kiro CLI.&lt;/p&gt;

&lt;p&gt;Pero más allá de la migración técnica, lo que me parece más valioso es el cambio de mentalidad que propone Kiro: dejar de usar la IA como un buscador glorificado y empezar a usarla como un agente que entiende tu contexto, se conecta a tus herramientas, y trabaja contigo en workflows reales.&lt;/p&gt;

&lt;p&gt;La migración en sí es simple:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1. Instalar Kiro CLI&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://cli.kiro.dev/install | bash

&lt;span class="c"&gt;# 2. Donde antes usabas 'q chat', ahora usas:&lt;/span&gt;
kiro-cli chat

&lt;span class="c"&gt;# 3. Donde usabas 'q translate', ahora:&lt;/span&gt;
kiro-cli translate &lt;span class="s2"&gt;"tu instrucción aquí"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  El takeaway principal
&lt;/h2&gt;

&lt;p&gt;El cambio de Amazon Q Developer CLI a Kiro CLI no es cosmético. Es una señal clara de hacia dónde va AWS con sus herramientas de desarrollo: &lt;strong&gt;agentes con contexto persistente, conectados a tus herramientas vía MCP, que entienden las convenciones de tu equipo vía steering files, y que pueden actuar — no solo responder.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Las capacidades clave que ganamos:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;kiro-cli chat&lt;/code&gt;&lt;/strong&gt; — Chat agéntico con capacidad de leer/escribir archivos y ejecutar comandos&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;kiro-cli translate&lt;/code&gt;&lt;/strong&gt; — Traduce lenguaje natural a bash&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;kiro-cli inline&lt;/code&gt;&lt;/strong&gt; — Sugerencias inteligentes mientras escribes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MCP Integration&lt;/strong&gt; — Conecta herramientas externas (CloudWatch, MSK, OpenSearch, etc.)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent Steering&lt;/strong&gt; — Dale contexto persistente a Kiro con las prácticas y convenciones de tu equipo&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Custom Agents&lt;/strong&gt; — Crea agentes especializados para tus workflows&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Smart Hooks&lt;/strong&gt; — Automatización pre/post comandos&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Headless Mode&lt;/strong&gt; — Integración con CI/CD vía API keys&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Y un punto importante: &lt;strong&gt;Spec-Driven Development es exclusivo del Kiro IDE&lt;/strong&gt;. Si quieres el workflow completo de specs (requirements → design → tasks → implementación), necesitas el IDE. La CLI es para chat, automatización, MCP, y operaciones desde la terminal. Ambos comparten steering files, así que tu contexto de proyecto funciona en los dos.&lt;/p&gt;

&lt;p&gt;Mi recomendación: instala Kiro CLI, crea un steering file para tu proyecto más activo, conecta un servidor MCP relevante para tu stack, y experimenta. La curva de aprendizaje es corta y el salto de productividad es real.&lt;/p&gt;




&lt;p&gt;Yo soy Carlos Cortez y esto es &lt;em&gt;Breaking the Cloud&lt;/em&gt; — nos vemos pronto.&lt;/p&gt;

&lt;p&gt;Sígueme en:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🔗 &lt;a href="https://www.linkedin.com/in/carloscortezcloud" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;🐦 &lt;a href="https://x.com/ccortezb" rel="noopener noreferrer"&gt;X / Twitter&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;💻 &lt;a href="https://github.com/ccortezb" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;📝 &lt;a href="https://dev.to/ccortezb"&gt;Dev.to&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;🦸 &lt;a href="https://builder.aws.com/community/@breakinthecloud" rel="noopener noreferrer"&gt;AWS Community&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;✍️ &lt;a href="https://ccortezb.medium.com" rel="noopener noreferrer"&gt;Medium&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>aws</category>
      <category>cli</category>
      <category>news</category>
      <category>spanish</category>
    </item>
    <item>
      <title>Strands Agents + AgentCore Runtime - a perfect match</title>
      <dc:creator>Matt Lewis</dc:creator>
      <pubDate>Wed, 20 May 2026 21:17:54 +0000</pubDate>
      <link>https://dev.to/aws-heroes/strands-agents-agentcore-runtime-a-perfect-match-3a51</link>
      <guid>https://dev.to/aws-heroes/strands-agents-agentcore-runtime-a-perfect-match-3a51</guid>
      <description>&lt;p&gt;This is the third in a series of posts documenting the architecture, implementation, and lessons learned from building the AWS Briefing Agent - a personalised AWS assistant deployed on &lt;code&gt;Amazon Bedrock AgentCore Runtime&lt;/code&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Part 1: &lt;a href="https://dev.to/aws-heroes/building-a-full-stack-ai-agent-on-amazon-bedrock-agentcore-2p"&gt;Building a Full-Stack AI Agent on Bedrock AgentCore&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Part 2: &lt;a href="https://dev.to/aws-heroes/data-ingestion-rss-feeds-knowledge-base-s3-vectors-and-metadata-filtering-4n8m"&gt;Data Ingestion: RSS Feeds, Knowledge Base, S3 Vectors, and Metadata Filtering&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Part 3: Strands Agents + AgentCore Runtime - a perfect match&lt;/li&gt;
&lt;li&gt;Part 4: Adding Memory to the Agent&lt;/li&gt;
&lt;li&gt;Part 5: Experimenting with API Gateway&lt;/li&gt;
&lt;li&gt;Part 6: Observability and Evaluations&lt;/li&gt;
&lt;li&gt;Part 7: Third Party Integrations - Identity, Gateway and Slack Notifications&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The initial implementation of the AWS Briefing Agent called the &lt;code&gt;AWS News Feed&lt;/code&gt; RSS feed on every invocation. After setting up an &lt;code&gt;Amazon Bedrock Knowledge Base&lt;/code&gt;, the next step was to refactor the code to take advantage of an agentic framework. The decision was made to adopt &lt;code&gt;Strands Agents&lt;/code&gt; SDK as an open source SDK that helps you build and run AI agents in just a few lines of code. In our case, switching to the Knowledge Base and adopting &lt;code&gt;Strands Agents&lt;/code&gt; SDK helped us to reduce the number of lines of code in our implementation logic by 75%.&lt;/p&gt;

&lt;h2&gt;
  
  
  Using Strands Agents SDK
&lt;/h2&gt;

&lt;p&gt;The core of the &lt;code&gt;Strands Agents&lt;/code&gt; code is straightforward and shown in the code snippet below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands.models&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BedrockModel&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands.agent.conversation_manager&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SlidingWindowConversationManager&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands_tools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;retrieve&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;agent.tools.slack_formatter.tool&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;format_slack_message&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BedrockModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;guardrail_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;GUARDRAIL_ID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;guardrail_version&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;GUARDRAIL_VERSION&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;guardrail_trace&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;enabled&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;_load_system_prompt&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;retrieve&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;format_slack_message&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;gateway_tools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;session_manager&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;session_manager&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;conversation_manager&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;SlidingWindowConversationManager&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;window_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;should_truncate_results&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;per_turn&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;callback_handler&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We start by importing a number of classes and functions from two packages (&lt;code&gt;strands-agents&lt;/code&gt; and &lt;code&gt;strands-agents-tools&lt;/code&gt;) and one local module. &lt;code&gt;Agent&lt;/code&gt; is the core class for the agent itself, &lt;code&gt;BedrockModel&lt;/code&gt; is the model provider, &lt;code&gt;SlidingWindowConversationManager&lt;/code&gt; controls how conversation history is trimmed, and &lt;code&gt;retrieve&lt;/code&gt; is a pre-built tool that is used to query a Bedrock Knowledge Base. The &lt;code&gt;format_slack_message&lt;/code&gt; is a local custom tool within this project - a Python function decorated with the &lt;code&gt;@tool&lt;/code&gt; annotation.&lt;/p&gt;

&lt;p&gt;We instantiate the &lt;code&gt;BedrockModel()&lt;/code&gt; without specifying a model_id. At this point, Strands uses its default model, which is current Claude Sonnet on Bedrock. We include details of a Bedrock Guardrail when we instantiate the model, purely to demonstrate the use of guardrails which we cover this later in the blog post.&lt;/p&gt;

&lt;p&gt;Finally, we create the agent by wiring together its core components.&lt;/p&gt;

&lt;h2&gt;
  
  
  Deploy to Amazon Bedrock AgentCore Runtime
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;AgentCore Runtime&lt;/code&gt; Python SDK provides a lightweight wrapper that helps to deploy your agent function as HTTP services&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Import the runtime
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;bedrock_agentcore.runtime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BedrockAgentCoreApp&lt;/span&gt;

&lt;span class="c1"&gt;# Initialise the app
&lt;/span&gt;&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BedrockAgentCoreApp&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Decorate the function
&lt;/span&gt;&lt;span class="nd"&gt;@app.entrypoint&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Entry point for AgentCore Runtime.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="bp"&gt;...&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;BedrockAgentCoreApp&lt;/code&gt; wraps your function in an HTTP server that listens om port 8080 with two endpoints:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;/invocations&lt;/code&gt; - a POST endpoint for agent interactions. This gets invoked when customers call the &lt;code&gt;InvokeAgentRuntime&lt;/code&gt; action with the payload in JSON format&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/ping&lt;/code&gt; - a GET endpoint for health checks to verify your agent is operational and ready to handle requests&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;code&gt;@app.entrypoint&lt;/code&gt; decorator registers your invoke function as the handler for incoming requests. When AgentCore Runtime receives a request, it deserialises the JSON body into payload, provides a context object (with session_id, request_headers, etc.), calls your function, and serialises the returned dict back as the HTTP response.&lt;/p&gt;

&lt;h2&gt;
  
  
  Using the Container Build
&lt;/h2&gt;

&lt;p&gt;When using the &lt;code&gt;@aws/agentcore&lt;/code&gt; CLI and running &lt;code&gt;agentcore deploy&lt;/code&gt;, the CLI needs to turn the Python source code into a runnable container image on &lt;code&gt;AgentCore Runtime&lt;/code&gt;. This is controlled by the &lt;code&gt;build&lt;/code&gt; field in the &lt;code&gt;agentcore.json&lt;/code&gt; file. The default setting is &lt;code&gt;CodeZip&lt;/code&gt;, in which the CLI zips up the Python source code, uploads it, and AgentCore resolves dependencies using &lt;code&gt;uv --no-build&lt;/code&gt;. This is fast but has a hard constraint, as every dependency must have a pre-built wheel. In our code, we have a package that only ships source distributions, which required us to switch to the &lt;code&gt;Container&lt;/code&gt; build setting. This also makes our build more production-ready.&lt;/p&gt;

&lt;p&gt;When you run &lt;code&gt;agentcore deploy&lt;/code&gt; with the &lt;code&gt;Container&lt;/code&gt; build type, the CLI synthesis a CloudFormation stack that includes a CodeBuild project, an ECR repository, the &lt;code&gt;AgentCore Runtime&lt;/code&gt; resource, and IAM roles. The CLI packages the &lt;code&gt;codeLocation&lt;/code&gt; directory (agent/) and uploads it to S3 as the CodeBuild source artefact. CodeBuild pulls the provided Dockerfile and builds the container image. You can see all the steps in the CodeBuild project below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuhlvxc1orm6y5bh2x0sz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuhlvxc1orm6y5bh2x0sz.png" alt="CodeBuild Project" width="800" height="366"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After the image builds successfully, CodeBuild tags it and pushes it to the ECR repository as shown below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwnthqxhirsjsm1g30ktf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwnthqxhirsjsm1g30ktf.png" alt="Amazon ECR Repository" width="799" height="419"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The stack updates the Runtime resource to point at the new ECR image URI. AgentCore pulls the image from ECR the next time it starts a container for an invocation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Built-In Conversation Managers
&lt;/h2&gt;

&lt;p&gt;In the &lt;code&gt;Strands Agents&lt;/code&gt; SDK, the user messages and agent responses are all added to the context. As the conversation grows within a session, this starting having a material impact on response times. We modified the default &lt;code&gt;SlidingWindowConversationManager&lt;/code&gt; manager:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;reducing the &lt;code&gt;windowSize&lt;/code&gt; from the default of 40 to 20. This sets the maximum number of messages to keep&lt;/li&gt;
&lt;li&gt;setting the &lt;code&gt;per_turn&lt;/code&gt; parameter to false. This runs the sliding window before every model call within the same invocation, rather than waiting until after the agent loop completes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This reduced the average response time from around 80 seconds down to 15 seconds.&lt;/p&gt;

&lt;h2&gt;
  
  
  Adding Bedrock Guardrails
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;Amazon Bedrock Guardrails&lt;/code&gt; are designed to help you safely build and deploy responsible generative AI applications with confidence. We decided to include a guardrail in the architecture, to understand where it fits in and what it can provide.&lt;/p&gt;

&lt;p&gt;The guardrail itself was defined in CDK with content filters (sexual, violence, hate, insults, misconduct and prompt attack), a topic policy (deny off-topic sports questions), and a managed profanity word list:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ----------------------------------------------------------------
# Bedrock Guardrail — content safety for the agent
# ----------------------------------------------------------------
&lt;/span&gt;&lt;span class="n"&gt;guardrail&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bedrock&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;CfnGuardrail&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;BriefingAgentGuardrail&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;briefing-agent-guardrail&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Content safety guardrail for the AWS Briefing Agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;blocked_input_messaging&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;I&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;m sorry, I can&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;t process that request. Please rephrase your question about AWS announcements.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;blocked_outputs_messaging&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;I&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;m sorry, I can&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;t provide that response. Let me try a different approach.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;content_policy_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;bedrock&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CfnGuardrail&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ContentPolicyConfigProperty&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;filters_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="n"&gt;bedrock&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CfnGuardrail&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ContentFilterConfigProperty&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SEXUAL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;input_strength&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;HIGH&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;output_strength&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;HIGH&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;bedrock&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CfnGuardrail&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ContentFilterConfigProperty&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;VIOLENCE&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;input_strength&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;HIGH&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;output_strength&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;HIGH&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="c1"&gt;# HATE, INSULTS, MISCONDUCT, PROMPT_ATTACK
&lt;/span&gt;        &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;topic_policy_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;bedrock&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CfnGuardrail&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;TopicPolicyConfigProperty&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;topics_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="n"&gt;bedrock&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CfnGuardrail&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;TopicConfigProperty&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Sports&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;definition&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Questions about sports scores, match results, player transfers, league standings, fixtures, or any sporting events.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DENY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;word_policy_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;bedrock&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CfnGuardrail&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;WordPolicyConfigProperty&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;managed_word_lists_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="n"&gt;bedrock&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CfnGuardrail&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ManagedWordsConfigProperty&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;PROFANITY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When the agent is invoked, the request first reaches the AgentCore Runtime and runs the handler code first. The guardrail itself is only applied when the handler makes the Bedrock inference call. Bedrock evaluates the input before running the model inference, and then inspects the output before returning it. We did encounter some interesting behaviour when implementing the guardrail.&lt;/p&gt;

&lt;h3&gt;
  
  
  IAM Permission Gap
&lt;/h3&gt;

&lt;p&gt;The first invocation after adding the guardrail failed with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;AccessDeniedException: User is not authorized to perform: bedrock:ApplyGuardrail
on resource: arn:aws:bedrock:eu-west-1.xxx
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The AgentCore execution role (auto-created by the @aws/agentcore-cdk construct) includes &lt;code&gt;bedrock:InvokeModel&lt;/code&gt; and &lt;code&gt;bedrock:InvokeModelWithResponseStream&lt;/code&gt;, but not &lt;code&gt;bedrock:ApplyGuardrail&lt;/code&gt;. The construct doesn’t know about guardrails — they’re a Bedrock feature, not an AgentCore feature. We ended up having to use the &lt;code&gt;aws iam put-role-policy&lt;/code&gt; CLI command to add the missing permission&lt;/p&gt;

&lt;h3&gt;
  
  
  Topic policies can false-positive on legitimate queries
&lt;/h3&gt;

&lt;p&gt;The initial topic policy denied "questions not related to AWS services, cloud computing, or technology". The intention was that it would be easy to demonstrate, and would ensure that the user input was relevant. However, when the user asked questions such as "what are the top announcements today", the classifier ended up deciding this was a blocked topic. In the end, to demonstrate how topic policies work, we changed it to explicitly deny sporting questions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Guardrail versions can be deleted by CDK updates
&lt;/h3&gt;

&lt;p&gt;When we updated the topic policy, we changed the version description for the guardrail. The CDK stack updated the guardrail version resource, so that CloudFormation deleted version 1 and created version 2. Unfortunately, the version number is also defined in the &lt;code&gt;agentcore.json&lt;/code&gt; file. This meant that the AgentCore Runtime container still had version 1 baked into its environment, which meant calls now failed with the following exception:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ValidationException: The guardrail identifier or version provided &lt;span class="k"&gt;in &lt;/span&gt;the request does not exist.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the end it was a case of having to update the version number in &lt;code&gt;agentcore.json&lt;/code&gt;, redeploy the agent, and start a new session.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>architecture</category>
      <category>aws</category>
    </item>
    <item>
      <title>Data Ingestion: RSS Feeds, Knowledge Base, S3 Vectors, and Metadata Filtering</title>
      <dc:creator>Matt Lewis</dc:creator>
      <pubDate>Wed, 20 May 2026 21:16:06 +0000</pubDate>
      <link>https://dev.to/aws-heroes/data-ingestion-rss-feeds-knowledge-base-s3-vectors-and-metadata-filtering-4n8m</link>
      <guid>https://dev.to/aws-heroes/data-ingestion-rss-feeds-knowledge-base-s3-vectors-and-metadata-filtering-4n8m</guid>
      <description>&lt;p&gt;This is the second in a series of posts documenting the architecture, implementation, and lessons learned from building the AWS Briefing Agent - a personalised AWS assistant deployed on &lt;code&gt;Amazon Bedrock AgentCore Runtime&lt;/code&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Part 1: &lt;a href="https://dev.to/aws-heroes/building-a-full-stack-ai-agent-on-amazon-bedrock-agentcore-2p"&gt;Building a Full-Stack AI Agent on Bedrock AgentCore&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Part 2: Data Ingestion: RSS Feeds, Knowledge Base, S3 Vectors, and Metadata Filtering&lt;/li&gt;
&lt;li&gt;Part 3: &lt;a href="https://dev.to/aws-heroes/strands-agents-agentcore-runtime-a-perfect-match-3a51"&gt;Strands Agents + AgentCore Runtime - a perfect match&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Part 4: Adding Memory to the Agent&lt;/li&gt;
&lt;li&gt;Part 5: Experimenting with API Gateway&lt;/li&gt;
&lt;li&gt;Part 6: Observability and Evaluations&lt;/li&gt;
&lt;li&gt;Part 7: Third Party Integrations - Identity, Gateway and Slack Notifications&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When I started building the AWS Briefing Agent, the first version queried the &lt;code&gt;AWS What's New&lt;/code&gt; RSS feed on every invocation. This worked in terms of showing the agent could return tailored information back to the client. However, it was costly and wasteful, with the same data fetched repeatedly, which added latency to every invocation. The RSS feed also only covers recent information, and it was likely we would want to start searching for releases that had been launched in the past 6 months or more. The next step therefore, was to separate the retrieval by the agent from the ingestion. &lt;/p&gt;

&lt;h2&gt;
  
  
  Amazon Bedrock Knowledge Base
&lt;/h2&gt;

&lt;p&gt;One of the key design goals was to allow the agent to match a natural language query "what's new in Bedrock this week?" against a large corpus of documents to return the most semantically similar results. This is where &lt;code&gt;Amazon Bedrock Knowledge Base&lt;/code&gt; comes into its own. It allows the agent to use RAG (Retrieval-Augmented Generation). By querying the Knowledge Base, we can retrieve relevant documents at query time, and then inject them into the prompt as context. The LLM then generates a response from this retrieved information which we know to be factual.&lt;/p&gt;

&lt;p&gt;The python CDK code that creates the Knowledge Base is shown below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;knowledge_base&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bedrock&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;CfnKnowledgeBase&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AnnouncementKnowledgeBase&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;aws-briefing-agent-announcements&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="bp"&gt;...&lt;/span&gt;
    &lt;span class="n"&gt;knowledge_base_configuration&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;bedrock&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CfnKnowledgeBase&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;KnowledgeBaseConfigurationProperty&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;VECTOR&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;vector_knowledge_base_configuration&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;bedrock&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CfnKnowledgeBase&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;VectorKnowledgeBaseConfigurationProperty&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;embedding_model_arn&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;arn:aws:bedrock:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;region&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;::foundation-model/amazon.titan-embed-text-v2:0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;storage_configuration&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;bedrock&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CfnKnowledgeBase&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;StorageConfigurationProperty&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;S3_VECTORS&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;s3_vectors_configuration&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;bedrock&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CfnKnowledgeBase&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;S3VectorsConfigurationProperty&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;index_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;announcements&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;vector_bucket_arn&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;arn:aws:s3vectors:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;region&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;account&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;:bucket/briefing-agent-vectors&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This declares the embeddings model to be used as &lt;code&gt;amazon.titan-embed-text-v2:0&lt;/code&gt; and the vector store as being of type &lt;code&gt;S3_VECTORS&lt;/code&gt;. There is no code required to handle aspects such as embeddings. Instead, Bedrock manages all of this for us.&lt;/p&gt;

&lt;h2&gt;
  
  
  Amazon S3 Vectors
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;Amazon Bedrock Knowledge Bases&lt;/code&gt; support several vector stores. A vector store is the retrieval engine that makes RAG work. It stores documents as numerical embeddings (vectors) that are generated by an embeddings model. At query time, the user's question is embedded, and the vector store finds documents whose embeddings are closest in meaning. &lt;/p&gt;

&lt;p&gt;The prototype uses &lt;code&gt;Amazon S3 Vectors&lt;/code&gt; as the underlying vector store. &lt;code&gt;S3 Vectors&lt;/code&gt; provides cost-effective, elastic, and durable vector storage at up to 90% lower costs for uploading, storing, and querying vectors than alternatives such as &lt;code&gt;OpenSearch Serverless&lt;/code&gt;. There is no infrastructure to manage, and it still provides a sub-second query latency which is acceptable for this use case.&lt;/p&gt;

&lt;h2&gt;
  
  
  Scheduling the Ingestion
&lt;/h2&gt;

&lt;p&gt;The ingestion pipeline is run every 6 hours using &lt;code&gt;Amazon EventBridge Scheduler&lt;/code&gt;. This service provides capabilities such as built-in retry policies, time zone support, and dead-letter queues. The schedule triggers an &lt;code&gt;AWS Lambda&lt;/code&gt; function that carries out the required processing. This includes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Lists existing document hashes in S3&lt;/li&gt;
&lt;li&gt;Fetches the AWS What’s New RSS feed (~100 announcements)&lt;/li&gt;
&lt;li&gt;Fetches 13 AWS blog RSS feeds (aws, machine-learning, compute, security,
database, containers, devops, networking, storage, infrastructure-and-automation,
developer, big-data, iot)&lt;/li&gt;
&lt;li&gt;Fetches the AWS Security Bulletins RSS feed&lt;/li&gt;
&lt;li&gt;For each new blog post, fetches the canonical URL and extracts the full article body
using a stdlib HTML parser&lt;/li&gt;
&lt;li&gt;Parses publication dates into YYYYMMDD integers&lt;/li&gt;
&lt;li&gt;Writes .txt and .metadata.json files per new item to S3&lt;/li&gt;
&lt;li&gt;Triggers a Bedrock KB ingestion job&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Deduplication and Incremental Writes
&lt;/h2&gt;

&lt;p&gt;When the ingestion pipeline runs, most of the content in the various RSS feeds is not new. It was important to find a way to prevent re-fetching and re-writing hundreds of announcements every 6 hours.&lt;/p&gt;

&lt;p&gt;To support this, we created an MD5 hash of the blog posts URL, truncated to 12 hex characters. This hash is used as the S3 filename. The sample code snippet is shown below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;write_to_s3&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;existing_keys&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;existing&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;existing_keys&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;url_hash&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;hashlib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;md5&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;link&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;()).&lt;/span&gt;&lt;span class="nf"&gt;hexdigest&lt;/span&gt;&lt;span class="p"&gt;()[:&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;url_hash&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;existing&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;continue&lt;/span&gt; &lt;span class="c1"&gt;# Already in S3, skip
&lt;/span&gt;        &lt;span class="c1"&gt;# ... write doc + metadata files
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At startup, &lt;code&gt;get_existing_keys()&lt;/code&gt; lists all the &lt;code&gt;.txt&lt;/code&gt; files in S3 and extracts the hash from each filename into a set. When processing the blog posts, the Lambda functions computes the URL hash and checks to see if it is already in the set. If it already exists, then it has been ingested in a previous run, and there is no need to re-fetch the page. If the hash does not exist, then the function fetches the page, extracts the content, and writes to S3. The hash gives a stable, deterministic filename derived from the URL. The same URL always produces the same hash.&lt;/p&gt;

&lt;h2&gt;
  
  
  Chunking Strategy
&lt;/h2&gt;

&lt;p&gt;The chunking strategy is set on the Data Source resource in the CDK stack as shown below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;data_source&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bedrock&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;CfnDataSource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AnnouncementDataSource&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;aws-announcements-s3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;knowledge_base_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;knowledge_base&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;attr_knowledge_base_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;data_source_configuration&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;bedrock&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CfnDataSource&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DataSourceConfigurationProperty&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;S3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;s3_configuration&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;bedrock&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CfnDataSource&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;S3DataSourceConfigurationProperty&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;bucket_arn&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;data_bucket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bucket_arn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;vector_ingestion_configuration&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;bedrock&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CfnDataSource&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;VectorIngestionConfigurationProperty&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;chunking_configuration&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;bedrock&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CfnDataSource&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ChunkingConfigurationProperty&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;chunking_strategy&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SEMANTIC&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;semantic_chunking_configuration&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;bedrock&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CfnDataSource&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;SemanticChunkingConfigurationProperty&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;breakpoint_percentile_threshold&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;92&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;buffer_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;600&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We utilise a &lt;code&gt;SEMANTIC&lt;/code&gt; chunking strategy. This uses the embedding model itself to decide where to split. The following three parameters control this behaviour:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;breakpoint_percentile_threshold=92&lt;/strong&gt; - controls the percentile threshold that will result in a split. A higher threshold requires sentences to be more distinguishable to split the document into different chunks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;max_tokens=600&lt;/strong&gt; - the maximum number of tokens that should be included in a single chunk, while honoring sentence boundaries.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;buffer_size=1&lt;/strong&gt; - for a given sentence, the buffer size defines the number of surrounding sentences to be added for embeddings creation. A larger buffer size might capture more context but can also introduce noise, while a smaller buffer size might miss important context but ensures more precise chunking.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Filtering by Date
&lt;/h2&gt;

&lt;p&gt;One of the goals in writing the agent was that a user could ask to constrain information by how recent it is e.g. "what is new in the past 7 days?". &lt;/p&gt;

&lt;p&gt;To help achieve this, at ingestion time for each document, we create an associated &lt;code&gt;metadata.json&lt;/code&gt; sidecar file that attaches structured, filterable attributes to a document so the agent can narrow search results without relying only on semantic similarity. An example companion file is shown below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"metadataAttributes"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"published_date"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;20260415&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"service"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"amazon-bedrock"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"category"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"artificial-intelligence"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"source_type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"announcement"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;During the Knowledge Base sync, Bedrock reads this sidecar and attaches those attributes to every vector chunk generated from that document. At query time, the agent can combine semantic search with metadata filters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"What's new in Bedrock this week?" → vector similarity for "Bedrock" + greaterThanOrEquals filter on published_date&lt;/li&gt;
&lt;li&gt;"Show me security bulletins" → vector similarity + equals filter on source_type: "security-bulletin"&lt;/li&gt;
&lt;li&gt;"Lambda announcements from the last month" → vector similarity + filters on both service and published_date&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without the metadata file, the agent would get the most semantically similar results regardless of date or service — so a question about "this week" might return announcements from 3 months ago that happen to be textually similar. The metadata filters let the agent constrain results to the correct time window or service before ranking by relevance.&lt;/p&gt;

&lt;p&gt;The naming convention (.metadata.json) is a Bedrock KB convention — it automatically associates the sidecar with its parent document during ingestion. No code links them; the filename pattern is enough.&lt;/p&gt;

&lt;p&gt;Bedrock Knowledge Base metadata supports four types: STRING, NUMBER, BOOLEAN and STRING_LIST. There is no native data type. The comparison operators (greaterThan, greaterThanOrEquals, lessThan, lessThanOrEquals) only work with NUMBER. Our original implementation stored &lt;code&gt;published_date&lt;/code&gt; as a string ("2026-05-14"). When the agent tried to filter, we got back the following exception:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ValidationException: The filter value &lt;span class="nb"&gt;type &lt;/span&gt;provided isn&lt;span class="s1"&gt;'t supported
for the given operation: GREATER_THAN_OR_EQUALS
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The fix was to store dates as &lt;code&gt;YYYYMMDD&lt;/code&gt; numbers (so using "20260514" instead of "2026-05-14"). We also inject today's date into the system prompt at runtime so the LLM can easily calculate relative dates.&lt;/p&gt;

&lt;p&gt;Note that &lt;code&gt;Amazon S3 Vectors&lt;/code&gt; has a strict 2 KB limit on filterable metadata per vector. We found the Bedrock Knowledge Base internal metadata keys (&lt;code&gt;AMAZON_BEDROCK_TEXT&lt;/code&gt; and &lt;code&gt;AMAZON_BEDROCK_METADATA&lt;/code&gt;) were set as filterable by default, which caused frequent &lt;code&gt;ValidationException&lt;/code&gt; errors. The fix was mark both of these keys as non-filterable when creating the vector index:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;vector_index&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;s3vectors&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;CfnIndex&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AnnouncementVectorIndex&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;index_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;announcements&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;vector_bucket_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;vector_bucket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;vector_bucket_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;dimension&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Titan Embed Text v2
&lt;/span&gt;    &lt;span class="n"&gt;distance_metric&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cosine&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;data_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;float32&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;metadata_configuration&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;s3vectors&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CfnIndex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;MetadataConfigurationProperty&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;non_filterable_metadata_keys&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AMAZON_BEDROCK_TEXT&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AMAZON_BEDROCK_METADATA&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This meant the only filterable metadata is contained in the &lt;code&gt;.metadata.json&lt;/code&gt; fields, which are the only fields we filter on.&lt;/p&gt;

&lt;p&gt;The next post covers how we used an agentic framework (Strands Agents SDK) in combination with AgentCore to really start bringing the briefing agent to life.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>aws</category>
      <category>rag</category>
    </item>
    <item>
      <title>Building a Full-Stack AI Agent on Amazon Bedrock AgentCore</title>
      <dc:creator>Matt Lewis</dc:creator>
      <pubDate>Wed, 20 May 2026 21:14:23 +0000</pubDate>
      <link>https://dev.to/aws-heroes/building-a-full-stack-ai-agent-on-amazon-bedrock-agentcore-2p</link>
      <guid>https://dev.to/aws-heroes/building-a-full-stack-ai-agent-on-amazon-bedrock-agentcore-2p</guid>
      <description>&lt;p&gt;This is the first in a series of posts documenting the architecture, implementation, and lessons learned from building the AWS Briefing Agent - a personalised AWS assistant deployed on &lt;code&gt;Amazon Bedrock AgentCore Runtime&lt;/code&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Part 1: Building a Full-Stack AI Agent on Bedrock AgentCore&lt;/li&gt;
&lt;li&gt;Part 2: &lt;a href="https://dev.to/aws-heroes/data-ingestion-rss-feeds-knowledge-base-s3-vectors-and-metadata-filtering-4n8m"&gt;Data Ingestion: RSS Feeds, Knowledge Base, S3 Vectors, and Metadata Filtering&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Part 3: &lt;a href="https://dev.to/aws-heroes/building-a-full-stack-ai-agent-on-amazon-bedrock-agentcore-2p"&gt;Strands Agents + AgentCore Runtime - a perfect match&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Part 4: Adding Memory to the Agent&lt;/li&gt;
&lt;li&gt;Part 5: Experimenting with API Gateway&lt;/li&gt;
&lt;li&gt;Part 6: Observability and Evaluations&lt;/li&gt;
&lt;li&gt;Part 7: Third Party Integrations - Identity, Gateway and Slack Notifications&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why build an agent?
&lt;/h2&gt;

&lt;p&gt;The last few years have seen a rapid shift from Generative to Agentic AI. Most of us will remember our first experience with ChatGPT where we entered a prompt and got a response back. This was impressive at the time, but was reliant on a user typing a prompt and reacting to the response. We then saw the emergence of early AI agents that could break down tasks into smaller steps and execute them independently. Over the past year, this has evolved into fully autonomous multi-agent systems capable of completing complex tasks with minimal or even no human supervision. &lt;/p&gt;

&lt;p&gt;This shift is accelerating quickly. Gartner predicts that by 2028, more than a third of all enterprise software apps will include Agentic AI, and at least 15% of day-to-day work decisions will be made autonomously by AI agents. For organisations, the question is no longer whether agents will become part of enterprise systems, but how to build them securely, reliably and operate them at scale. From an AWS perspective, &lt;code&gt;Amazon Bedrock AgentCore&lt;/code&gt; provides a way to help enterprises achieve this goal. &lt;/p&gt;

&lt;p&gt;I decided to build an agent utilising AgentCore and its supporting capabilities and which served a purpose ... helping me keep up to date with all the latest announcements from AWS. This agent brings together Memory, Observability, Gateway, Identity, Evaluations and Registry alongside AgentCore Runtime. It allows the agent to personalise briefings just for me from 13 different RSS feeds including What's New, Blog Posts and Security Bulletins. I can get a daily update, as well as automatically post any briefings I'm really interested in to a Slack channel. And I learnt a lot in the process. This blog series covers my experience in building out this agent.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why AgentCore Runtime?
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;Amazon Bedrock AgentCore&lt;/code&gt; is an AWS service that has been designed specifically for the task of hosting agents. A common saying I keep on hearing is that &lt;code&gt;Bedrock AgentCore&lt;/code&gt; is to agentic applications what &lt;code&gt;AWS Lambda&lt;/code&gt; is to event driven applications.&lt;/p&gt;

&lt;p&gt;At the heart is &lt;code&gt;AgentCore Runtime&lt;/code&gt;, which provides the secure runtime for executing the agent code. &lt;code&gt;AgentCore Runtime&lt;/code&gt; provides session-based isolation, where every session is assigned a dedicated Firecracker microVM with isolated CPU, memory and filesystem resources (the same lightweight virtualisation technology that underpins &lt;code&gt;AWS Lambda&lt;/code&gt; and &lt;code&gt;AWS Fargate&lt;/code&gt;). When the session finishes, the LLM's state information is copied to long-term memory and the entire microVM is destroyed. There is no shared state between sessions, which prevents any cross-session data leakage.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;AgentCore Runtime&lt;/code&gt; is framework-agnostic and supports all popular frameworks such as Strands Agents, LangGraph and CrewAI. It also works with any LLM, such as models offered by &lt;code&gt;Amazon Bedrock&lt;/code&gt;, &lt;code&gt;Anthropic Claude&lt;/code&gt;, &lt;code&gt;Google Gemini&lt;/code&gt; and &lt;code&gt;OpenAI&lt;/code&gt; or even hosted on-premises. It supports long sessions up to 8 hours, which means it can handle complex multi-step tasks or time-consuming background processes. Unlike traditional compute services that charge for pre-allocated resources, &lt;code&gt;AgentCore Runtime&lt;/code&gt; uses consumption-based pricing where you only pay for active CPU and memory usage. With this, I/O wait and idle time is free, and you're only charged for actual resource consumption calculated at per-second increments. The runtime automatically scales from zero to thousands of concurrent sessions on demand, with no capacity planning needed, and includes reliability features like checkpointing to recover gracefully from interruptions.&lt;/p&gt;

&lt;h2&gt;
  
  
  AWS Briefing Agent Architecture
&lt;/h2&gt;

&lt;p&gt;A high-level architecture overview of the AWS Briefing Agent is shown below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe9yd8notj7suy0uak6zn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe9yd8notj7suy0uak6zn.png" alt="Briefing Agent Architecture" width="800" height="472"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;AWS Briefing Agent Client&lt;/strong&gt; is a &lt;code&gt;next.js&lt;/code&gt; static site hosted on AWS Amplify Hosting. It integrates directly with &lt;code&gt;Amazon Cognito&lt;/code&gt; using the &lt;code&gt;amazon-cognito-identity-js&lt;/code&gt; SDK, implementing a full sign-in, sign-up and email verification flow.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;AWS Briefing Agent&lt;/strong&gt; itself is a Python application built with the &lt;code&gt;Strands Agents&lt;/code&gt; SDK and deployed to &lt;code&gt;AgentCore Runtime&lt;/code&gt; as a Docker container. The &lt;code&gt;@aws/agentcore&lt;/code&gt; CLI handles the full deployment lifecycle. When you run &lt;code&gt;agentcore deploy&lt;/code&gt;, the CLI triggers &lt;code&gt;AWS CodeBuild&lt;/code&gt; to build the Docker image (ARM64), pushes it to &lt;code&gt;Amazon ECR&lt;/code&gt;, and deploys it to &lt;code&gt;AgentCore Runtime&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;AgentCore Memory&lt;/strong&gt; provides persistent user knowledge across sessions using two built-in memory strategies. The SEMANTIC memory strategy extracts factual information and knowledge from conversations that have taken place e.g. that a user works with Lambda and EKS. The USER_PREFERENCE memory strategy identifies and extracts user preferences from conversations e.g. that the user prefers technical deep dives. The agent retrieves relevant memory records at the start of each invocation and injects them as context, enabling personalised briefings from the first message of a new session.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;AgentCore Observability&lt;/strong&gt; is used to instrument all Bedrock API calls, tool invocations and memory operations. This is carried out entirely by setting &lt;code&gt;enableOtel: true&lt;/code&gt; in the runtime config and using the opentelemetry-instrument wrapper command. Spans show up in &lt;code&gt;CloudWatch Transaction Search&lt;/code&gt; and the &lt;code&gt;CloudWatch GenAI Observability dashboard&lt;/code&gt; is populated with the sessions and traces, and provides the ability to drill into individual invocations.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;AgentCore Evaluations&lt;/strong&gt; is configured to run online quality assessments against agent responses using built-in evaluators for Helpfulness, Goal Success Rate, and Correctness. These are shown in the front-end to give an indication on how well the agent is performing for each user.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Bedrock Knowledge Base&lt;/strong&gt; is created and backed by &lt;code&gt;Amazon S3 Vectors&lt;/code&gt; that stores all announcements, blog posts and security bulletins. An ingestion &lt;code&gt;Lambda&lt;/code&gt; runs every 6 hours that writes each item as a .txt file alongside a metadata.json file to the S3 bucket, before triggering a &lt;code&gt;Knowledge Base&lt;/code&gt; sync. The agent queries the KB via the Strands retrieve tool with metadata filters for date ranges and service names, enabling questions like "what's new in Bedrock this week?"&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;AgentCore Gateway&lt;/strong&gt; exposes a managed MCP (Model Context Protocol) endpoint that the agent connects to at runtime for tool discovery. The Slack integration is defined as an OpenAPI spec pointing at the Slack &lt;code&gt;chat.postMessage&lt;/code&gt; API, and is registered as a Gateway target. The agent discovers available tools dynamically via the MCP protocol. The Gateway handles authentication and credential injection for this integration with Slack, attaching the stored bot token as a Bearer header on outbound Slack API calls.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;AgentCore Identity&lt;/strong&gt; stores the Slack bot token as an API key credential in its token vault (encrypted at rest via Secrets Manager). When the agent calls the tool to send a briefing to Slack, &lt;code&gt;AgentCore Identity&lt;/code&gt; retrieves the bot token and injects it into the outbound request automatically. The agent code never sees or handles the token directly.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;AgentCore Registry&lt;/strong&gt; is a governed catalog for agents, MCP servers, tools, skills, and custom resources. Teams can publish resources, control access through approval workflows, and enable both humans and AI agents to discover tools using semantic and keyword search. Once the Slack integration was working, the briefing agent and the Slack tool where registered in the &lt;code&gt;AgentCore Registry&lt;/code&gt;. This makes the tool discoverable by other agents in the organisation.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  AWS Briefing Agent in Action
&lt;/h2&gt;

&lt;p&gt;We create a new user and login to the home screen for the AWS Briefing Agent front end. The first time we use the agent, we are asked to provide information about our interests and the type of briefing style we are interested in. These get added to memory, so that the agent can personalise its responses:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc4jrd2jm3fczr4jlpg0j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc4jrd2jm3fczr4jlpg0j.png" alt="AWS Briefing Agent Home Page" width="800" height="415"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can provide the details of the services we are most interested to the agent. At this point, the agent will pull back the top announcements that it has retrieved from the Knowledge Base, and display them in a briefing summary.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F76xk7bpnkpqfpcf475kk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F76xk7bpnkpqfpcf475kk.png" alt="AWS Briefing Agent Summary" width="800" height="420"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We have also integrated with Slack through Gateway. This means we can ask the Briefing Agent to post the details to our Slack channel:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjar4d6oelrx7kbmgpx97.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjar4d6oelrx7kbmgpx97.png" alt="AWS Briefing Agent Send to Slack" width="800" height="418"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This means that when we go to our Slack channel, we can see a new message with our briefing, alongside all the links we can click to take us to the original blog posts and announcement articles.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F21vtrzzmyc8ugsi4w153.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F21vtrzzmyc8ugsi4w153.png" alt="Slack Message" width="800" height="442"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the next post we cover design decisions made to ingest the data into a Bedrock Knowledge Base to support the agent&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>aws</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Building AI Agents with Spring AI and Amazon Bedrock AgentCore - Part 4 Provide MCP tools for Conference application via AgentCore Gateway</title>
      <dc:creator>Vadym Kazulkin</dc:creator>
      <pubDate>Mon, 18 May 2026 14:09:12 +0000</pubDate>
      <link>https://dev.to/aws-heroes/building-ai-agents-with-spring-ai-and-amazon-bedrock-agentcore-part-4-provide-mcp-tools-for-2odf</link>
      <guid>https://dev.to/aws-heroes/building-ai-agents-with-spring-ai-and-amazon-bedrock-agentcore-part-4-provide-mcp-tools-for-2odf</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;In &lt;a href="https://dev.to/aws-heroes/building-ai-agents-with-spring-ai-and-amazon-bedrock-agentcore-part-2-deploy-conference-search-2bo8"&gt;part 2&lt;/a&gt;, we explained how to deploy and run our conference search application on the Amazon Bedrock AgentCore Runtime as the MCP server. In this article, we'll develop the (MCP-) client, capable of talking to our application running on AgentCore Runtime. Later, in &lt;a href="https://dev.to/aws-heroes/building-ai-agents-with-spring-ai-and-amazon-bedrock-agentcore-part-3-develop-local-mcp-client-560a"&gt;part 3&lt;/a&gt;, we developed the (MCP-) client, capable of talking to our application running on AgentCore Runtime. In this article, we'll look at another alternative to AgentCore Runtime to host MCP servers on AgentCore - AgentCore Gateway. &lt;/p&gt;

&lt;h2&gt;
  
  
  Provide the MCP Tools for the Conference application via AgentCore Gateway
&lt;/h2&gt;

&lt;p&gt;Let's imagine a hypothetical situation: we not only want to search for the conferences, but also create, search, and apply for the talks for them. With this, our conference application now supports not only the attendee role but also the speakers. This is the reason why I added functionality to support conference search by the open call for papers criteria, see part 2. This is required for the conference speakers to determine whether it's still possible to apply for the conference with their talks. &lt;/p&gt;

&lt;p&gt;When searching for conferences, we didn't have a public API, which is why we created MCP. On the other hand, for creating, searching, and applying the talks for the conferences, we indeed have a public API. Let's assume this API is hosted on the Amazon API Gateway. But it could also be any external application that exposes an OpenAPI specification. How to implement such a use case? Of course, we can use &lt;a href="https://docs.aws.amazon.com/bedrockagentcore/latest/devguide/gateway.html" rel="noopener noreferrer"&gt;Amazon Bedrock AgentCore Gateway&lt;/a&gt; to securely connect our API to the AgentCore Gateway. The AgentCore Gateway can expose API functionality as MCP tools. But with this, we'll need to authenticate and hold the connection to multiple sources: AgentCore Runtime and Gateway. Without a centralized approach, customers face significant challenges: discovering and sharing tools across organizations becomes fragmented, managing authentication across multiple MCP servers grows increasingly complex, and maintaining separate gateway instances for each server quickly becomes unmanageable. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://aws.amazon.com/de/blogs/machine-learning/transform-your-mcp-architecture-unite-mcp-servers-through-agentcore-gateway/" rel="noopener noreferrer"&gt;The centralized approach&lt;/a&gt;, which exposes all the tools from the central (MCP server) endpoint, would be a much better solution for our use case. Luckily, AgentCore Gateway helps to solve these challenges by treating existing MCP servers as native targets. This gives us a single point of control for routing, authentication, and tool management. It makes it as simple to integrate MCP servers as to add other targets to the gateway. AgentCore made it possible by supporting multiple targets. Those are, as of now: OpenAPI, Smithy, Amazon API Gateway, AWS Lambda, MCP Servers, and Integrations: &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwtx2vqmvatg9evkaldku.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwtx2vqmvatg9evkaldku.png" alt=" " width="800" height="167"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conference Talks and Applications Demo
&lt;/h2&gt;

&lt;p&gt;For creating, searching, and applying the talks for the conferences, I implemented a small &lt;a href="https://github.com/Vadym79/amazon-bedrock-agentcore-spring-ai/tree/main/conference-talks-and-applications-app" rel="noopener noreferrer"&gt;conference-talks-and-applications-demo&lt;/a&gt;: &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo27stzmtpcrkfji5ktf7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo27stzmtpcrkfji5ktf7.png" alt=" " width="800" height="245"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I currently don't use any database to store the talks and conference applications for simplicity reasons.  My goal is only to demonstrate the approach.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt; I maintain a static list of the talks in the &lt;a href="https://github.com/Vadym79/amazon-bedrock-agentcore-spring-ai/blob/main/conference-talks-and-applications-app/src/main/java/software/amazonaws/example/conference/handler/GetConferenceTalksByTitleSubstring.java" rel="noopener noreferrer"&gt;GetConferenceTalksByTitleSubstring&lt;/a&gt; class. The search consists of looking for the provided substring of the title.&lt;/li&gt;
&lt;li&gt; When creating a new talk, I generate its random ID between 1 and 100 in &lt;a href="https://github.com/Vadym79/amazon-bedrock-agentcore-spring-ai/blob/main/conference-talks-and-applications-app/src/main/java/software/amazonaws/example/conference/handler/CreateConferenceTalk.java" rel="noopener noreferrer"&gt;CreateConferenceTalk&lt;/a&gt; class and return the talk with ID, title, and description.&lt;/li&gt;
&lt;li&gt;When applying for a talk for a specific conference, I simply acknowledge that the application is created in &lt;a href="https://github.com/Vadym79/amazon-bedrock-agentcore-spring-ai/blob/main/conference-talks-and-applications-app/src/main/java/software/amazonaws/example/conference/handler/CreateConferenceApplication.java" rel="noopener noreferrer"&gt;CreateConferenceTalk&lt;/a&gt; class.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I prefer to use AWS SAM as IaC for pure Serverless applications. Unfortunately, AWS SAM doesn't provide any IaC for Amazon Bedrock AgentCore yet.  Also, SAM has some limitations, as it's, for example, not possible to create the response codes for each API. And those response codes are required by the OpenAPI specification to be present. That's why I created &lt;a href="https://github.com/Vadym79/amazon-bedrock-agentcore-spring-ai/blob/main/conference-talks-and-applications-app/ConferenceTalksAndApplicationsAppAPI-OpenAPISpec.yaml" rel="noopener noreferrer"&gt;OpenAPI spec&lt;/a&gt; on my own for it. We can refer to this specification when defining the API like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;MyApi&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;Type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AWS::Serverless::Api&lt;/span&gt;
  &lt;span class="na"&gt;Properties&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;StageName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;!Ref&lt;/span&gt; &lt;span class="s"&gt;Stage&lt;/span&gt;
    &lt;span class="na"&gt;DefinitionBody&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="s"&gt;Fn::Transform&lt;/span&gt;
           &lt;span class="s"&gt;Name&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AWS::Include&lt;/span&gt;
           &lt;span class="s"&gt;Parameters&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;
             &lt;span class="na"&gt;Location&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ConferenceTalksAndApplicationsAppAPI-OpenAPISpec.yaml&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We also secured our API with an API key, whose value is by definition passed as the HTTP header parameter "x-api-key". This will play a role when we configure the outbound authentication of the AgentCore Gateway API Gateway target:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;MyApiKey&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; 
  &lt;span class="na"&gt;Type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AWS::ApiGateway::ApiKey&lt;/span&gt;
  &lt;span class="s"&gt;....&lt;/span&gt;
  &lt;span class="na"&gt;Properties&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; 
    &lt;span class="na"&gt;Name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ConferenceTalksAndApplicationsAppAPIKey"&lt;/span&gt;
    &lt;span class="na"&gt;Description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ConferenceTalksAndApplicationsApp&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;API&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Key"&lt;/span&gt;
    &lt;span class="na"&gt;Enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
    &lt;span class="na"&gt;GenerateDistinctId&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
    &lt;span class="na"&gt;Value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;a6ZbcDgjkQW10BN56ASR25&lt;/span&gt;
    &lt;span class="s"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We also defined an API stage with the name &lt;em&gt;prod&lt;/em&gt;. Now, we can deploy this application by executing &lt;code&gt;sam deploy -g&lt;/code&gt;, and we will see the individual URL in the response. For example, &lt;em&gt;&lt;a href="https://k370s19lk3.execute-api.us-east-1.amazonaws.com/prod" rel="noopener noreferrer"&gt;https://k370s19lk3.execute-api.us-east-1.amazonaws.com/prod&lt;/a&gt;&lt;/em&gt;.  We'll need the REST API ID, which is in our case k370s19lk3, later when creating the IaC for the AgentCore Gateway.&lt;/p&gt;

&lt;h2&gt;
  
  
  Create AgentCore Gateway with different targets
&lt;/h2&gt;

&lt;p&gt;In &lt;a href="https://dev.to/aws-heroes/building-ai-agents-with-spring-ai-and-amazon-bedrock-agentcore-part-2-deploy-conference-search-2bo8"&gt;part 2&lt;/a&gt;, we started to create the &lt;a href="https://github.com/Vadym79/amazon-bedrock-agentcore-spring-ai/tree/main/spring-ai-1.1-conference-app-bedrock-agentcore-cdk" rel="noopener noreferrer"&gt;IaC for the Conference (Search) application&lt;/a&gt;. It consisted mainly of the AgentCore Runtime with the MCP protocol and everything needed for that, like the Cognito User (Client) Pool. We used CDK for Java for it. We'll now call this application the Conference application, as we are extending its functionality beyond the search. Our goal is now to create AgentCore Gateway with 2 targets:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;existing AgentCore Runtime with MCP protocol for the conference search (MCP) tools&lt;/li&gt;
&lt;li&gt;conference talks and applications demo deployed on Amazon Gateway API to expose all its APIs as (MCP) tools.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can find the full source code in the &lt;a href="https://github.com/Vadym79/amazon-bedrock-agentcore-spring-ai/blob/main/spring-ai-1.1-conference-app-bedrock-agentcore-cdk/src/main/java/dev/vkazulkin/agentcore/gateway/GatewayTargetStack.java" rel="noopener noreferrer"&gt;GatewayTargetStack&lt;/a&gt; class. &lt;/p&gt;

&lt;p&gt;Let's go step-by-step through it. We first create the AgentCore Gateway itself:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;gateway&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Gateway&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;Builder&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;create&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"Gateway-123"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
      &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;gatewayName&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;appName&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;replace&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"_"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"-"&lt;/span&gt;&lt;span class="o"&gt;)+&lt;/span&gt; &lt;span class="s"&gt;"-gateway"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
      &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;authorizerConfiguration&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;CustomJwtAuthorizer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;Builder&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;create&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;allowedClients&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
          &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;UserClientPoolStack&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;
               &lt;span class="n"&gt;userPoolClient&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getUserPoolClientId&lt;/span&gt;&lt;span class="o"&gt;()))&lt;/span&gt;
      &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;discoveryUrl&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;UserClientPoolStack&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;COGNITO_DISCOVERY_URL&lt;/span&gt;&lt;span class="o"&gt;).&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt;
            &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;RuntimeWithMCPStack&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
            &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"AgenCore Runtime with MCP protocol for running conference search app"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
           &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The most interesting part is configuring the custom JWT authorizer as an inbound authentication. Here we reuse the Cognito User (Client) Pool created in part 2. We set the same user client pool ID and discovery URL. We also reuse the same AWS IAM role that we used to create AgentCore  Runtime in part 2. Please also read the &lt;a href="https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/policy-getting-started.html" rel="noopener noreferrer"&gt;Getting started with Policy in AgentCore&lt;/a&gt; in addition to the resources from part 2 on how to create one.&lt;/p&gt;

&lt;p&gt;Now, let's create the AgentCore Gateway target of our MCP Server running on AgentCore Runtime:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;GatewayTarget&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;Builder&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;create&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"MCP-Target-123"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;           
     &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;targetConfiguration&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;McpServerTargetConfiguration&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;create&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;endpoint&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;         
     &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;credentialProviderConfigurations&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;oauthCredentialProviderConfigs&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
         &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;gatewayTargetName&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"mcp-target"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
         &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"AgentCore Runtime MCP Server Target "&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
         &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;gateway&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;gateway&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
         &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We set &lt;em&gt;McpServerTargetConfiguration&lt;/em&gt;, which defines that the Gateway target is the MCP Server running on AgentCore Runtime. Also, we set the target name and description, and provide the AgentCore Gateway to which this target belongs.  We need to set the endpoint URL, which always follows the same schema:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;endpoint&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"https://bedrock-agentcore."&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;region&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; 
        &lt;span class="s"&gt;".amazonaws.com/runtimes/"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
        &lt;span class="nc"&gt;RuntimeWithMCPStack&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;runtime&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getAgentRuntimeId&lt;/span&gt;&lt;span class="o"&gt;()+&lt;/span&gt;
       &lt;span class="s"&gt;"/invocations? 
       qualifier=DEFAULT&amp;amp;accountId="&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="nc"&gt;Stack&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;).&lt;/span&gt;&lt;span class="na"&gt;getAccount&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We obtain the runtime ID property from the created AgentCore Runtime in the &lt;em&gt;RuntimeWithMCPStack&lt;/em&gt; stack. The next part is to configure the outbound authentication. This means to configure how the Agentcore Gateway MCP target authenticates with the AgentCore Runtime with the MCP protocol. For this, we need to use AgentCore Identity.&lt;br&gt;
As described in the following &lt;a href="https://github.com/aws-cloudformation/cloudformation-coverage-roadmap/issues/23" rel="noopener noreferrer"&gt;issue&lt;/a&gt;, it's currently not possible to create the AgentCore Identity with CloudFormation. That's why CDK also can't provide this functionality. That's why we need to create it manually and then provide the configuration for this stack. Let's secure it with the existing OAuth Client. Let's go to AgentCore Identity and click on "Add Outbound Auth" -&amp;gt; "Add OAuth Client". Then select "Custom Provider" -&amp;gt; "Discovery URL" :&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnrsvb1oprnutpatb4el2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnrsvb1oprnutpatb4el2.png" alt=" " width="800" height="377"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can reuse the Cognito User Pool Client ID, Client Secret, and Discovery URL from &lt;a href="https://dev.to/aws-heroes/building-ai-agents-with-spring-ai-and-amazon-bedrock-agentcore-part-2-deploy-conference-search-2bo8"&gt;part 2&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;After we created the AgentCore Identity, let's grab its ARN:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fncju8yxda8r98p3n05ft.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fncju8yxda8r98p3n05ft.png" alt=" " width="800" height="208"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Client Secret will be automatically stored as a Secret in the AWS Secrets Manager. Let's also grab Secret ARN:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbf555ge962hqi15p4bhu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbf555ge962hqi15p4bhu.png" alt=" " width="800" height="244"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now, let's configure both in the &lt;a href="https://github.com/Vadym79/amazon-bedrock-agentcore-spring-ai/blob/main/spring-ai-1.1-conference-app-bedrock-agentcore-cdk/cdk.json" rel="noopener noreferrer"&gt;cdk.json&lt;/a&gt; :&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"app"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"mvn -e -q compile exec:java"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"context"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"agentcoreIdentityOutboundOAuthArn"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"arn:aws:bedrock-agentcore:us-east-1:{AWS_ACCOUNT_ID}:token-vault/default/oauth2credentialprovider/resource-provider-oauth-gateway"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"oAuthSecretArn"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"arn:aws:secretsmanager:us-east-1:{AWS_ACCOUNT_ID}:secret:bedrock-agentcore-identity!default/oauth2/resource-provider-oauth-gateway-ba3b089d-toYfaV"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;....&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Please replace both values with your individual ARNs. I explained in part 2 how we handle the AWS Account ID. Now, let's create and configure the credential provider:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// CloudFormation, see the issue https://github.com/aws-cloudformation/cloudformation-coverage-roadmap/issues/2391&lt;/span&gt;
  &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;oAuthProviderArn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ConventionalDefaults&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getContextVariableValueWithReplacedAccountId&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"agentcoreIdentityOutboundOAuthArn"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
  &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;oAuthSecretArn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ConventionalDefaults&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getContextVariableValueWithReplacedAccountId&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"oAuthSecretArn"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

  &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;oauthCredentialProviderConfigs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;GatewayCredentialProvider&lt;/span&gt;
      &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;fromOauthIdentityArn&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;OAuthConfiguration&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
          &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;providerArn&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;oAuthProviderArn&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
          &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;secretArn&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;oAuthSecretArn&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
          &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;scopes&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt;
          &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;()));&lt;/span&gt;

  &lt;span class="nc"&gt;GatewayTarget&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;Builder&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;create&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"MCP-Target-123"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;           
  &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;credentialProviderConfigurations&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;oauthCredentialProviderConfigs&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
  &lt;span class="o"&gt;...&lt;/span&gt;
 &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We grab the AgentCore Identity and Secret ARNs and use them to create an OAuth Credential Provider. We then set it when creating the AgentCore Target credential provider configuration.&lt;/p&gt;

&lt;p&gt;Now we are done with creating the AgentCore MCP Target.  The next step is to create an Amazon API Gateway target. Please also read the article &lt;a href="https://docs.aws.amazon.com/de_de/bedrock-agentcore/latest/devguide/gateway-target-api-gateway.html" rel="noopener noreferrer"&gt;AgentCore Gateway Amazon API Gateway stages&lt;/a&gt; to gain an understanding of how AgentCore Gateway obtains the OpenAPI spec from the Amazon Gateway stage.&lt;/p&gt;

&lt;p&gt;First of all, let's define the API stage name in the &lt;a href="https://github.com/Vadym79/amazon-bedrock-agentcore-spring-ai/blob/main/spring-ai-1.1-conference-app-bedrock-agentcore-cdk/cdk.json" rel="noopener noreferrer"&gt;cdk.json&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"app"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"mvn -e -q compile exec:java"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"context"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"restApiStageName"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"prod"&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We'll pass the restApiId via the console parameter. We created it above when we deployed the conference talks and applications demo. Similar to AWS Account ID, which is public, we don't want to configure it in cdk.json:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;  &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;restApiId&lt;/span&gt;&lt;span class="o"&gt;=(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getNode&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;tryGetContext&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"restApiId"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

  &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;restApiStageName&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;ConventionalDefaults&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getContextVariableValue&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"restApiStageName"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

  &lt;span class="nc"&gt;GatewayTarget&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;Builder&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;create&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"APIGATEWAY-Target-123"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;         
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;targetConfiguration&lt;/span&gt;&lt;span class="o"&gt;((&lt;/span&gt;&lt;span class="nc"&gt;ApiGatewayTargetConfiguration&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;Builder&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;create&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;restApi&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;RestApi&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;fromRestApiId&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"APIGATEWAY-ID"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;restApiId&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
         &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;stage&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;restApiStageName&lt;/span&gt;&lt;span class="o"&gt;).&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;()))&lt;/span&gt;
            &lt;span class="o"&gt;...&lt;/span&gt;
           &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;gatewayTargetName&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"apigateway-target"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
           &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Amazon ApiGateway Target "&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
           &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;gateway&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;gateway&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
           &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, we create the AgentCore Gateway Target as an Amazon API Gateway Target, set the target name and description. We also provide the REST API ID, stage, and AgentCore Gateway to which this target belongs.&lt;/p&gt;

&lt;p&gt;We can define the tool filters. With that, we can shrink what Amazon API Gateway APIs will be exposed as MCP tools:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;GatewayTarget&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;Builder&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;create&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"APIGATEWAY-Target-123"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;           
  &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;targetConfiguration&lt;/span&gt;&lt;span class="o"&gt;((&lt;/span&gt;&lt;span class="nc"&gt;ApiGatewayTargetConfiguration&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;Builder&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;create&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
     &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;apiGatewayToolConfiguration&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;ApiGatewayToolConfiguration&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
       &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;toolFilters&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
        &lt;span class="nc"&gt;ApiGatewayToolFilter&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
            &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;filterPath&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"/talks/{titleSubstring}"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;                                
                &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;methods&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;ApiGatewayHttpMethod&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;GET&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;                 
                &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;(),&lt;/span&gt;
        &lt;span class="nc"&gt;ApiGatewayToolFilter&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
                &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;filterPath&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"/apply"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
                &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;methods&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;ApiGatewayHttpMethod&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;POST&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
                &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;(),&lt;/span&gt;
        &lt;span class="nc"&gt;ApiGatewayToolFilter&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
                &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;filterPath&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"/talks"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;                    
                &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;methods&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;ApiGatewayHttpMethod&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;POST&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
                &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;()))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In our example, we expose all 3 APIs (/apply, /talks, //talks/{titleSubstring}) as MCP tools.&lt;/p&gt;

&lt;p&gt;Next, let's use the tool override to give the MCP tools the proper names and descriptions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;GatewayTarget&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;Builder&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;create&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"APIGATEWAY-Target-123"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;           
  &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;targetConfiguration&lt;/span&gt;&lt;span class="o"&gt;((&lt;/span&gt;&lt;span class="nc"&gt;ApiGatewayTargetConfiguration&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;Builder&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;create&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
     &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;apiGatewayToolConfiguration&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;ApiGatewayToolConfiguration&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
      &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;toolOverrides&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
          &lt;span class="nc"&gt;ApiGatewayToolOverride&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
            &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;ApiGatewayHttpMethod&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;POST&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
            &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"apply-to-conferences-w-conference-id-talk-id"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
            &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"/apply"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
            &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"apply to the conference with conference Id and talk Id"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
            &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;(),&lt;/span&gt; 
         &lt;span class="nc"&gt;ApiGatewayToolOverride&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
           &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;ApiGatewayHttpMethod&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;POST&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
           &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"create-new-talk"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
           &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"/talks"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
           &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"create a new talk with talk Id, title and description"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
           &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;(),&lt;/span&gt;                    
         &lt;span class="nc"&gt;ApiGatewayToolOverride&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
           &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;ApiGatewayHttpMethod&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;GET&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
           &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"get-talks-by-title-substring"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
           &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"/talks/{titleSubstring}"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
           &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"get talks by their title substring"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
           &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;()))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With that, LLM can easily find the right tool for the job.&lt;/p&gt;

&lt;p&gt;The last part is to define how AgentCore Gateway handles the outbound authentication to the Amazon API Gateway. As described above and in the following &lt;a href="https://github.com/aws-cloudformation/cloudformation-coverage-roadmap/issues/23" rel="noopener noreferrer"&gt;issue&lt;/a&gt;, it's currently not possible to create the AgentCore Identity with CloudFormation. That's why CDK also can't provide this functionality. That's why we need to create it manually and then provide the configuration for this stack. Let's secure this Target with the API Key, as it is how we secured our Amazon Gateway API. Let's go to AgentCore Identity and click on "Add Outbound Auth" -&amp;gt; "Add API Key" :&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fis2os77ekoke7z1d0cdh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fis2os77ekoke7z1d0cdh.png" alt=" " width="744" height="444"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Please put the same API Key that we used to secure our API. We defined it in the  &lt;a href="https://github.com/Vadym79/amazon-bedrock-agentcore-spring-ai/blob/main/conference-talks-and-applications-app/template.yaml" rel="noopener noreferrer"&gt;SAM template&lt;/a&gt;.&lt;br&gt;&lt;br&gt;
After we created the AgentCore Identity, let's grab its ARN:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9i3n44bxl7e81xevky0y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9i3n44bxl7e81xevky0y.png" alt=" " width="800" height="255"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Client Secret will be automatically stored as a Secret in the AWS Secrets Manager. Let's also grab Secret ARN:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0npa4hw9j3qf2ni3o3zb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0npa4hw9j3qf2ni3o3zb.png" alt=" " width="800" height="243"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now, let's configure both in the &lt;a href="https://github.com/Vadym79/amazon-bedrock-agentcore-spring-ai/blob/main/spring-ai-1.1-conference-app-bedrock-agentcore-cdk/cdk.json" rel="noopener noreferrer"&gt;cdk.json&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"app"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"mvn -e -q compile exec:java"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"context"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"agentcoreIdentityOutboundApiKeyArn"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"arn:aws:bedrock-agentcore:us-east-1:{AWS_ACCOUNT_ID}:token-vault/default/apikeycredentialprovider/resource-provider-api-key-gateway"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"apiKeySecretArn"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"arn:aws:secretsmanager:us-east-1:{AWS_ACCOUNT_ID}:secret:bedrock-agentcore-identity!default/apikey/resource-provider-api-key-gateway-02d581b0-L9scmD"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="err"&gt;....&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Please replace both values with your individual ARNs. I explained in part 2 how we handle the AWS Account ID. Now, let's create and configure the credential provider:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;apiKeyProviderArn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ConventionalDefaults&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getContextVariableValueWithReplacedAccountId&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"agentcoreIdentityOutboundApiKeyArn"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;apiKeySecretArn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ConventionalDefaults&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getContextVariableValueWithReplacedAccountId&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"apiKeySecretArn"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;apiKeyProviderConfigs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;GatewayCredentialProvider&lt;/span&gt;          
     &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;fromApiKeyIdentityArn&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;ApiKeyCredentialProviderProps&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
               &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;providerArn&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;apiKeyProviderArn&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
               &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;secretArn&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;apiKeySecretArn&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
               &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;credentialLocation&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;ApiKeyCredentialLocation&lt;/span&gt;                    
                   &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;header&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;ApiKeyAdditionalConfiguration&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
                     &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;credentialParameterName&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"x-api-key"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
                     &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;credentialPrefix&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;" "&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
                     &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;()))&lt;/span&gt;
               &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;()));&lt;/span&gt;

&lt;span class="nc"&gt;GatewayTarget&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;Builder&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;create&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"APIGATEWAY-Target-123"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;           
  &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;targetConfiguration&lt;/span&gt;&lt;span class="o"&gt;((&lt;/span&gt;&lt;span class="nc"&gt;ApiGatewayTargetConfiguration&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;Builder&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;create&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
  &lt;span class="o"&gt;...&lt;/span&gt;
  &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;credentialProviderConfigurations&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;apiKeyProviderConfigs&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
  &lt;span class="o"&gt;...&lt;/span&gt;
  &lt;span class="n"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We grab the AgentCore Identity and Secret ARNs and use them to create an API Key Credential Provider. Then we define to set the credentials within the HTTP header with the name &lt;em&gt;x-api-key&lt;/em&gt;. This is how we secured the Amazon API Gateway. Another option that AgentCore Gateway supports is to set them as query parameters. We then set them when creating the AgentCore Target credential provider configuration.&lt;/p&gt;

&lt;p&gt;To deploy the AgentCore Gateway, please invoke &lt;code&gt;cdk deploy spring-ai-conference-search-agentcore-gateway-with-mcp-server-target-stack -c awsAccountId={YOUR_AWS_ACCOUNT_ID} -c restApiId={YOUR_API_ID}&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;After having successfully executed the AgentCore Gateway deployment, we'll see our Gateway in the console:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff0fnro4uqhadcnrtsu84.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff0fnro4uqhadcnrtsu84.png" alt=" " width="800" height="165"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We need to grab the Gateway URL, which ends with &lt;em&gt;/mcp&lt;/em&gt;.  We also see both Gateway targets we created:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7bgts2vqw5d9fzhmi4nz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7bgts2vqw5d9fzhmi4nz.png" alt=" " width="800" height="243"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This AgentCore exposes 7 MCP tools in total:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt; 4 tools for the conference search provided by the MCP server from part 2 and deployed on AgentCore Runtime.&lt;/li&gt;
&lt;li&gt; 3 tools to create a talk, search for existing talks, and apply for the conference with the talk. This 3 tools are provided through the Amazon API Gateway we deployed in this article.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now, let's extend our Conference Application MCP client that we developed in &lt;a href="https://dev.to/aws-heroes/building-ai-agents-with-spring-ai-and-amazon-bedrock-agentcore-part-3-develop-local-mcp-client-560a"&gt;part 3&lt;/a&gt;, so it can use this AgentCore Gateway MCP endpoint.&lt;/p&gt;

&lt;p&gt;The important remaining topic is designing the IAM role and permissions so that AgentCore Gateway can handle inbound and outbound authentication and communicate with the Amazon API Gateway. I'll refer you to the articles, which cover those topics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt; &lt;a href="https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/gateway-inbound-auth.html#gateway-inbound-auth-iam" rel="noopener noreferrer"&gt;inbound authentication&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt; &lt;a href="https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/gateway-outbound-auth.html#gateway-outbound-auth-oauth" rel="noopener noreferrer"&gt;outbound authorization with an OAuth client&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt; &lt;a href="https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/gateway-outbound-auth.html#gateway-outbound-auth-api-key" rel="noopener noreferrer"&gt;outbound authorization with an API Key&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt; &lt;a href="https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/gateway-target-api-gateway.html#gateway-target-api-gateway-outbound" rel="noopener noreferrer"&gt;outbound authorization methods for an API Gateway API&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Extend our local Conference Application MCP client
&lt;/h2&gt;

&lt;p&gt;In part 3, we developed a generic &lt;a href="https://github.com/Vadym79/amazon-bedrock-agentcore-spring-ai/tree/main/spring-ai-1.1-conference-app-agent-local" rel="noopener noreferrer"&gt;local MCP client&lt;/a&gt; capable of talking to each MCP server. I decided to extend it to be able to configure the AgentCore Gateway endpoint. This gives us the following options:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;by configuring the &lt;em&gt;amazon.bedrock.agentcore.runtime.id&lt;/em&gt; property in the &lt;a href="https://github.com/Vadym79/amazon-bedrock-agentcore-spring-ai/blob/main/spring-ai-1.1-conference-app-agent-local/src/main/resources/application.properties" rel="noopener noreferrer"&gt;application.properties&lt;/a&gt; to be not a blank string, we'll still connect to the MCP server running on AgentCore Runtime. It exposes only 4 MCP tools for the conference search.&lt;/li&gt;
&lt;li&gt;by configuring the &lt;em&gt;amazon.bedrock.agentcore.gateway.url&lt;/em&gt; property in the &lt;a href="https://github.com/Vadym79/amazon-bedrock-agentcore-spring-ai/blob/main/spring-ai-1.1-conference-app-agent-local/src/main/resources/application.properties" rel="noopener noreferrer"&gt;application.properties&lt;/a&gt; to be not a blank string, we'll connect to the AgentCore Gateway created previously, which exposes all 7 MCP tools. This is how we'll use it to show what is possible with that. Please make sure that &lt;em&gt;amazon.bedrock.agentcore.runtime.id=&lt;/em&gt; is set to an empty string.&lt;/li&gt;
&lt;li&gt;by configuring both properties, &lt;em&gt;amazon.bedrock.agentcore.runtime.id&lt;/em&gt; take precedence. This is how I implemented the logic in the &lt;a href="https://github.com/Vadym79/amazon-bedrock-agentcore-spring-ai/blob/main/spring-ai-1.1-conference-app-agent-local/src/main/java/dev/vkazulkin/agent/controller/SpringAIAgentController.java" rel="noopener noreferrer"&gt;SpringAIAgentController&lt;/a&gt; class:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="nf"&gt;getMCPServerEndpoint&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
   &lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="o"&gt;(!&lt;/span&gt;&lt;span class="no"&gt;AGENTCORE_RUNTIME_ID&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;isBlank&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s"&gt;"https://bedrock-agentcore."&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;awsRegion&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s"&gt;".amazonaws.com/runtimes/"&lt;/span&gt;
       &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="no"&gt;AGENTCORE_RUNTIME_ID&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s"&gt;"/invocations?qualifier=DEFAULT&amp;amp;accountId="&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getAccountId&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(!&lt;/span&gt;&lt;span class="no"&gt;AGENTCORE_GATEWAY_URL&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;isBlank&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
       &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;AGENTCORE_GATEWAY_URL&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;RuntimeException&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;" no AgentCore Runtime Id or AgentCore Gateway URL defined"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can change this logic if you wish.&lt;/p&gt;

&lt;p&gt;Now we can use CURL or &lt;a href="https://httpie.io/docs/cli/installation" rel="noopener noreferrer"&gt;HTTPie&lt;/a&gt; to send some prompts. For example:&lt;/p&gt;

&lt;p&gt;"Please provide me with the list of conferences, including their IDs, with Java topics happening in 2027, with the call for papers open today. Also, provide me with the list of my talks with this topic in the title. Finally, for each conference and talk retrieved, apply individually for the conference".&lt;/p&gt;

&lt;p&gt;Here is an example of the request with HTTPie:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;http GET http://localhost:8080/conference?prompt="Please provide me with the list of conferences, including their IDs, with Java topics happening in 2027, with the call for papers open today. Also, provide me with the list of my talks with this topic in the title. Finally, for each conference and talk retrieved, apply individually for the conference." Content-Type:text/plain&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Here is the correct LLM response: &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fahrxf31uh3ctynvjgvs4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fahrxf31uh3ctynvjgvs4.png" alt=" " width="800" height="417"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let's try another prompt:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;http GET http://localhost:8080/conference?prompt="Please create a talk with a cool title (max 60 characters long) and description (max 300 characters long) about using Spring AI on the Amazon Bedrock AgentCore service. Then provide me with the list of conferences, including their IDs, with Java topics happening in 2026 and 2027, with the call for papers open today. Finally, for each conference, apply individually for it with the talk just created." Content-Type:text/plain&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Here is the correct LLM response again: &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4kuaovv9kzlbkc5os1it.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4kuaovv9kzlbkc5os1it.png" alt=" " width="800" height="404"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Cool, we created AgentCore Gateway, which gives us centralized access to the MCP tools that we need or the agent needs to accomplish the goal.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In this article, we looked at how to provide the MCP Tools for the Conference application via AgentCore Gateway in a centralized way.&lt;/p&gt;

&lt;p&gt;As we saw in this and previous articles, the local MCP client for the Conference application, to talk to AgentCore Runtime or Gateway, became quite big. If we have many customers using such a client, changing and operating it can become quite challenging. That's why, in the next article, we look at how to deploy and run our MCP client on AgentCore Runtime.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you like my content, please follow me on &lt;a href="https://github.com/Vadym79" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; and give my repositories a star!&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Please also check out my &lt;a href="https://vkazulkin.com" rel="noopener noreferrer"&gt;website&lt;/a&gt; for more technical content and upcoming public speaking activities.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>java</category>
      <category>springai</category>
      <category>bedrockagentcore</category>
    </item>
    <item>
      <title>One model is a guess. Three that agree is a plan.</title>
      <dc:creator>Anton Babenko</dc:creator>
      <pubDate>Mon, 18 May 2026 08:04:46 +0000</pubDate>
      <link>https://dev.to/aws-heroes/one-model-is-a-guess-three-that-agree-is-a-plan-1po3</link>
      <guid>https://dev.to/aws-heroes/one-model-is-a-guess-three-that-agree-is-a-plan-1po3</guid>
      <description>&lt;p&gt;&lt;em&gt;Why I shipped multi-model consensus as a plugin, plus two quieter tools that keep agents honest.&lt;/em&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Updated (28.5.2026): &lt;code&gt;/consensus&lt;/code&gt; is now a 3-stage loop. Stage 2 adds a blind cross-review where each external rates the others' answers, anonymized, before Claude adjudicates. Pattern from &lt;a href="https://github.com/karpathy/llm-council" rel="noopener noreferrer"&gt;karpathy/llm-council&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Updated (26.5.2026): Grok (xAI) joins GPT and Gemini as the third external provider, Gemini 3 is the default via Google's Antigravity CLI, two new experts (Researcher and Debugger) bring the count to seven, reviews are severity-graded, and &lt;code&gt;/consensus&lt;/code&gt; now forces Claude to commit a blind verdict in the transcript before dispatching - arbiter-mediated, not pure democracy.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Ask one model to plan something hard - a migration, a refactor, a cutover - and you get a fluent, confident answer. Fluency is not correctness. A single model is articulate and alone, and being alone is the problem: nothing in the loop disagrees with it, so it rationalizes its first guess into a plan.&lt;/p&gt;

&lt;p&gt;The expensive failures with coding agents are almost never syntax. They are plans that read well and were wrong: the wrong abstraction, the missed blast radius, the migration step that bricks state. You find out three hours into execution, not at review time.&lt;/p&gt;

&lt;p&gt;I have been running the fix for the last few months - on Terraform modules, and on the everyday work of running &lt;a href="https://compliance.tf" rel="noopener noreferrer"&gt;compliance.tf&lt;/a&gt;, not only on code. It is not a bigger model. It is making models disagree on purpose and then forcing them to resolve it. That is what &lt;code&gt;consensus&lt;/code&gt; does, and it is the reason the &lt;a href="https://github.com/antonbabenko/agent-plugins" rel="noopener noreferrer"&gt;&lt;code&gt;agent-plugins&lt;/code&gt;&lt;/a&gt; repo exists. Two other tools ship with it; they are narrower, and I will get to them.&lt;/p&gt;

&lt;h2&gt;
  
  
  One model is a guess
&lt;/h2&gt;

&lt;p&gt;A single model samples one distribution with no adversary in the room. Two independent models rarely make the same mistake on a plan. Where they diverge is, almost exactly, the risky part of the plan - the assumption nobody checked. A consensus loop turns that disagreement from noise into a signal you can act on.&lt;/p&gt;

&lt;h2&gt;
  
  
  What &lt;code&gt;/consensus&lt;/code&gt; actually does
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjldhv7ayzq2e016btiwd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjldhv7ayzq2e016btiwd.png" alt="All input is important" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;/consensus&lt;/code&gt; runs GPT (via Codex), Gemini 3 (via Antigravity), and Grok (via the xAI API) against the same artifact, with Claude as the arbiter. Each round has three stages.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stage 1 - parallel verdicts.&lt;/strong&gt; Claude posts its own verdict (APPROVE / REQUEST CHANGES / REJECT) into the transcript first, blind, before any external sees the work. The pre-commitment sits there in writing, so Claude's judgment cannot drift later to match what the others say. Then GPT, Gemini, and Grok review the artifact in parallel, each in a fresh thread, single-shot. None sees another's review.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stage 2 - blind cross-review.&lt;/strong&gt; Each external then rates the OTHER externals' answers, identity stripped best-effort. Votes of "not viable" become candidate critical issues the arbiter has to weigh. This catches the case where Stage 1 looks like agreement but is really three reviewers each rationalizing past the same hole. Pattern adapted from &lt;a href="https://github.com/karpathy/llm-council" rel="noopener noreferrer"&gt;karpathy/llm-council&lt;/a&gt;. Stage 2 fires every round 1, and after that only when Stage 1 disagreed or the previous Stage 2 surfaced an accepted issue.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stage 3 - arbiter adjudication.&lt;/strong&gt; Claude reconciles the Stage 1 verdicts, the Stage 2 candidate issues, and its own blind verdict. Every objection is accepted, dismissed with a recorded reason, or deferred. Claude revises the artifact and the loop runs again, up to five rounds. It converges only when every responding external approves and Claude's pre-committed verdict agrees, with reasons on both sides where it walked back. If the group cannot agree, it says so plainly instead of faking it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/antonbabenko/claude-delegator/blob/master/assets/consensus-flow.png" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzslov8lmvxleh0jegoms.png" alt="/consensus 3-stage flow" width="800" height="336"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Click for the detailed diagram with bias guards and per-model flow.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Independence is not only about which model. It is sharper when each reviewer wears a different hat. &lt;code&gt;claude-delegator&lt;/code&gt; ships seven expert profiles - Architect, Plan Reviewer, Scope Analyst, Code Reviewer, Security Analyst, Researcher, and Debugger.&lt;/p&gt;

&lt;p&gt;Combine the axes. A Security Analyst on Gemini and an Architect on GPT fight about different things than one model reviewing twice. Different profiles catch different &lt;em&gt;categories&lt;/em&gt; of mistake.&lt;/p&gt;

&lt;p&gt;For a migration plan I run the Plan Reviewer broadly and add a Security Analyst pass on top. "Is this safe to run" and "is this the right shape" get argued by separate reviewers, not averaged into one bland verdict.&lt;/p&gt;

&lt;p&gt;There is a second kind of contamination worth naming: me. In &lt;code&gt;consensus&lt;/code&gt;, every round sends the reviewers the same artifact text, cold, in a fresh thread. They never see my triage, my running verdict, or how I framed the previous round - only the artifact and bounded round metadata. My judgment is applied after they report, not baked into what they receive.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;ask-*&lt;/code&gt; commands are the opposite by design: the expert gets exactly the prompt I assembled from the conversation - what I chose to hand it. Fast for a second opinion, but the input is mine, not independent. &lt;code&gt;consensus&lt;/code&gt; keeps the input independent and pays for it in extra rounds.&lt;/p&gt;

&lt;p&gt;It does not have to be a plan. The loop runs on anything you can put in text - a design, a runbook, a decision memo, a spec. Plans are simply where I reach for it most, and where it converges fastest. The looser and fuzzier the input, the more rounds it takes to agree, so non-plan runs tend to run longer - worth it when the answer matters, overkill for a quick lookup. For that, single-shot &lt;code&gt;ask-gpt&lt;/code&gt;, &lt;code&gt;ask-gemini&lt;/code&gt;, &lt;code&gt;ask-grok&lt;/code&gt;, and &lt;code&gt;ask-all&lt;/code&gt; (which fans out to all three in parallel) are right there. &lt;code&gt;consensus&lt;/code&gt; is for when it has to be right.&lt;/p&gt;

&lt;p&gt;Nothing about this is Terraform, or even code. That generality is why it is the headline.&lt;/p&gt;

&lt;h2&gt;
  
  
  The quieter two
&lt;/h2&gt;

&lt;p&gt;Same release, narrower scope:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/antonbabenko/agent-plugins" rel="noopener noreferrer"&gt;code-intelligence&lt;/a&gt;&lt;/strong&gt; - a language-agnostic skill. Agents grab text &lt;code&gt;grep&lt;/code&gt; when they should ask the language server, and silently swap tools when one is missing, then report "found all references." This encodes search precedence: the language server (LSP) for symbols, &lt;code&gt;rg&lt;/code&gt;/&lt;code&gt;ripgrep&lt;/code&gt; for exact text, an embedding/semantic grep (such as &lt;a href="https://github.com/mixedbread-ai/mgrep" rel="noopener noreferrer"&gt;&lt;code&gt;mgrep&lt;/code&gt;&lt;/a&gt;) for fuzzy discovery - and a hard rule to disclose any substitution on the first line of the reply. You learn the moment coverage drops.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/antonbabenko/terraform-skill" rel="noopener noreferrer"&gt;terraform-skill&lt;/a&gt;&lt;/strong&gt; - routes a Terraform/OpenTofu request to its real failure mode (identity churn, blast radius, state corruption) before emitting HCL. It is &lt;code&gt;terraform-ls&lt;/code&gt; aware: it knows the language server has no rename provider, so it runs the safe manual reference workflow instead of a blind find-replace. It is approaching 2,000 GitHub stars - the part of this post I am quietly proud of.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those two are discipline for the agent's hands. Consensus is discipline for its judgment, and judgment generalizes further.&lt;/p&gt;

&lt;h2&gt;
  
  
  A note on claude-delegator
&lt;/h2&gt;

&lt;p&gt;I did not invent the delegation layer. &lt;a href="https://github.com/antonbabenko/claude-delegator" rel="noopener noreferrer"&gt;&lt;code&gt;claude-delegator&lt;/code&gt;&lt;/a&gt; is a fork of &lt;a href="https://github.com/jarrodwatts/claude-delegator" rel="noopener noreferrer"&gt;Jarrod Watts' original&lt;/a&gt; (MIT, upstream currently quiet), fully based on his design - I kept the structure and the license.&lt;/p&gt;

&lt;p&gt;What I added is what months of daily use exposed: a Gemini bridge that wraps Google's Antigravity CLI (&lt;code&gt;agy&lt;/code&gt;) with &lt;code&gt;auto-gemini-3&lt;/code&gt; as the routing default and recovers an answer the CLI flushed to disk after a soft timeout instead of failing the call; a fresh Grok bridge over the xAI API that is advisory-only but reads attached files via the xAI Files API (with TTL-based cleanup); two more experts (Researcher and Debugger) on top of the original five; severity-graded reviews so three parallel reports merge cleanly; and a hardened &lt;code&gt;/consensus&lt;/code&gt; loop where Claude pre-commits a blind verdict before any external sees the artifact, with a Stage 2 blind cross-review on top (adapted from karpathy/llm-council). Plus the bundled &lt;code&gt;ask-gpt&lt;/code&gt; / &lt;code&gt;ask-gemini&lt;/code&gt; / &lt;code&gt;ask-grok&lt;/code&gt; / &lt;code&gt;ask-all&lt;/code&gt; / &lt;code&gt;consensus&lt;/code&gt; commands, so the workflow ships with the plugin instead of living in my dotfiles. The seven expert prompts borrow from &lt;code&gt;oh-my-openagent&lt;/code&gt; and &lt;code&gt;PAL&lt;/code&gt;; both credited in the README.&lt;/p&gt;

&lt;p&gt;Credit for the foundation is his; the bug-fixing scar tissue is mine.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why a plugin, not a blog post
&lt;/h2&gt;

&lt;p&gt;One honest caveat, because I have hit it myself: skills are model-triggered, which makes them soft. Packaging this as a plugin improves reuse and discoverability. It does not guarantee the agent obeys every time - hard enforcement (a real pre-tool gate) is a separate, still-open problem.&lt;/p&gt;

&lt;p&gt;I keep finding and fixing bugs in all three. That constant repair is the only reason I trust them enough to write this - and I would rather say so than oversell the fix.&lt;/p&gt;

&lt;h2&gt;
  
  
  The takeaway
&lt;/h2&gt;

&lt;p&gt;Stop accepting the first confident plan from one model. Make them argue, and only move when they stop. The release is at &lt;a href="https://github.com/antonbabenko/agent-plugins" rel="noopener noreferrer"&gt;github.com/antonbabenko/agent-plugins&lt;/a&gt;; &lt;code&gt;consensus&lt;/code&gt; ships in &lt;a href="https://github.com/antonbabenko/claude-delegator" rel="noopener noreferrer"&gt;&lt;code&gt;claude-delegator&lt;/code&gt;&lt;/a&gt; from the same marketplace. The cheapest review is the one that happens before you execute.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>productivity</category>
      <category>agents</category>
    </item>
    <item>
      <title>You don't 3D print a house. You print your tools.</title>
      <dc:creator>Luca Bianchi</dc:creator>
      <pubDate>Fri, 15 May 2026 12:17:19 +0000</pubDate>
      <link>https://dev.to/aws-heroes/you-dont-3d-print-a-house-you-print-your-tools-2h00</link>
      <guid>https://dev.to/aws-heroes/you-dont-3d-print-a-house-you-print-your-tools-2h00</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Vibe coding is to engineering what 3D printing is to making, and that's exactly why it matters.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;There's a recurring debate in our industry about whether vibe coding will replace serious software engineering. Both camps are framing the question wrong. The right reference is the desktop 3D printer.&lt;/p&gt;

&lt;p&gt;When 3D printing went mainstream a decade ago, the same two camps showed up. One predicted print-on-demand cars and houses; the other dismissed it as a toy. What happened was stranger. 3D printing didn't replace manufacturing. It collapsed the cost of bespoke tooling. A specific bracket for a specific shelf in a specific corner of your specific workshop used to be a project. Now it's a Sunday afternoon. Nobody prints a load-bearing wall. Everyone prints jigs, fixtures, replacement knobs, and tools tailored to the exact job at hand.&lt;/p&gt;

&lt;p&gt;That's the mental model for vibe coding.&lt;/p&gt;

&lt;h3&gt;
  
  
  The analogy, precisely
&lt;/h3&gt;

&lt;p&gt;Production systems engineering still works the way it did. Distributed transactions, security boundaries, multi-region failover, and regulatory-grade audit trails- none of that becomes "vibe-able." The constraints are the same: correctness, throughput, observability, blast radius. If anything, the bar has gone up because the cost of writing plausible-looking wrong code has dropped to zero.&lt;/p&gt;

&lt;p&gt;But there's a category of software that used to sit in a no-man's-land: too specific to be worth packaging as a product, too tedious to write by hand for a single use, too critical to skip entirely. Internal scripts that should have safety rails. CLIs you run twice a year. Migration tools tied to the exact shape of your stack. These were the software brackets and jigs, necessary, valuable, and almost always either skipped or built poorly because nobody had the budget for them.&lt;/p&gt;

&lt;p&gt;That's where vibe coding shines. It's a workshop tool that brings industrial-grade results to one-off problems. The cost of bespoke, well-built tooling has collapsed to the point where it makes economic sense to build it for a single use.&lt;/p&gt;

&lt;h3&gt;
  
  
  The instance: a Route 53 migration
&lt;/h3&gt;

&lt;p&gt;Last week, I needed to move a Route 53-hosted zone from one AWS account to another. Standard enterprise hygiene, wrong account ownership, billing consolidation, the usual story. The problem itself is straightforward if you know Route 53: you can't transfer a hosted zone directly between accounts. You list the records in the source, create a new zone in the destination, replay the records into it, then cut over the registrar's NS delegation.&lt;/p&gt;

&lt;p&gt;Each step has small traps. The apex NS records and the SOA record are auto-generated by AWS and will be rejected on import. Pagination on ListResourceRecordSets uses a three-field cursor: name, type, and set identifier, not a simple token. The ChangeResourceRecordSets API has a hard cap of 1000 changes per call, but it gives much better error messages if you batch smaller changes. Private zones require VPC re-association and are a separate problem. None of these is hard. They're just sharp edges that someone running this once is statistically guaranteed to hit.&lt;/p&gt;

&lt;p&gt;Pre-2023, my options were three. Run it manually through the console, slow, fat-finger-prone, no audit trail. Write a one-off Bash script with AWS CLI calls, faster, but every safety check I want to add is another hour. Build a proper internal tool, justified for a team running this monthly, hard to justify for a one-time job.&lt;/p&gt;

&lt;p&gt;The 3D-printer-for-tools answer is option four: build the proper tool anyway, because building it is no longer expensive.&lt;/p&gt;

&lt;h3&gt;
  
  
  Spec first, then vibe
&lt;/h3&gt;

&lt;p&gt;This is the part people get wrong about vibe coding. Because the model writes fast, vague specs produce a lot of confidently wrong code very quickly. The discipline shifts from typing to specifying.&lt;/p&gt;

&lt;p&gt;The dialogue that produced this tool started with the problem, and implementation followed. I described what I was trying to do: move a hosted zone safely between accounts, with the registrar transfer as a separate concern. The conversation forced a series of decisions before any code existed.&lt;/p&gt;

&lt;p&gt;Scope: public hosted zones only. Private zones are moved to v2 because cross-account VPC association is a different problem with different failure modes, and conflating them in v1 dilutes the design.&lt;/p&gt;

&lt;p&gt;Trust model: never mutate anything until the operator has confirmed which account they're talking to. STS GetCallerIdentity runs on both source and destination credentials at startup; the account IDs and caller ARNs are shown in plain text, and the operator confirms before the tool proceeds.&lt;/p&gt;

&lt;p&gt;Credential surface: named AWS profiles and environment variables, nothing else. No baked-in keys, no custom config files. The credential chain is the SDK's; the tool just picks where to source from.&lt;/p&gt;

&lt;p&gt;Reversibility: the tool stops short of the irreversible step. It replicates the zone, records it, then prints the new name servers and stops. Updating the registrar's NS delegation is a manual final step, deliberately, because that's the cutover moment, and a human should be the one who pulls that lever.&lt;/p&gt;

&lt;p&gt;Failure modes: which records get skipped (apex NS, SOA), what batch size to use (100, not 1000, clearer error messages outweigh the marginal call count), how pagination is handled (full marker-based loops, not "first page is probably fine").&lt;/p&gt;

&lt;p&gt;These decisions were made in prose, before any TypeScript existed. The model is excellent at translating that prose into code; it is much less reliable at making these decisions for you. Spec-driven vibe coding means the operator writes the spec, the model writes the code, and the operator reviews both for fidelity.&lt;/p&gt;

&lt;h3&gt;
  
  
  Route53 Migration Tool
&lt;/h3&gt;

&lt;p&gt;The result is a CLI called &lt;a href="https://github.com/aletheia/route53-aws-to-aws-transfer" rel="noopener noreferrer"&gt;route53-aws-to-aws-transfer&lt;/a&gt;, written in TypeScript with the AWS SDK v3. The structure mirrors the spec. A credentials module resolves either a named profile or environment variables and runs an STS identity check, returning the validated account ID and caller ARN so the CLI layer can show them to the operator for confirmation. Two independent credential resolutions happen, one for the source account, one for the destination, because conflating them is the most likely operator error and the easiest to prevent at the boundary.&lt;/p&gt;

&lt;p&gt;A Route53 module wraps the SDK calls the migration actually needs: paginated zone listing filtered to public zones, paginated record-set listing with the three-field cursor that the API requires and the SDK doesn't abstract, zone creation with a unique caller reference, and the change-set builder that explicitly drops apex NS and SOA records before batching the rest into UPSERT calls.&lt;/p&gt;

&lt;p&gt;An orchestration module sequences these against the operator's confirmed inputs and emits structured progress. The CLI layer uses @inquirer/prompts for the interactive flow, chalk for the highlighting that draws the eye to account IDs and name-server lists, and ora for the spinners that make long pagination loops feel like progress rather than a hang.&lt;/p&gt;

&lt;p&gt;The whole tool is around 400 lines of TypeScript. It does one thing. It does it with the safety rails I'd expect from an internal platform team's tooling. It will probably run three times in its life, and that's fine, because the cost of building it correctly was lower than the cost of running the migration carelessly even once.&lt;/p&gt;

&lt;h3&gt;
  
  
  What this means if you're running engineering
&lt;/h3&gt;

&lt;p&gt;The economic shift this represents is small in the aggregate but large in the aggregate. The list of things that were previously "not worth building properly" is enormous: data migration scripts, one-off ETL jobs, internal admin CLIs, environment-bootstrap tools, audit-report generators, ad-hoc dashboards, throwaway integrations between two SaaS products you happen to use. Every team has dozens. Most are currently either absent, and the work is being done by hand, or present in a form that's basically a liability: Bash, no tests, no logging, run from someone's laptop.&lt;/p&gt;

&lt;p&gt;If you're a CTO or a tech lead, the practical question is what quality bar you hold for built-once tooling. My answer for my own team is the same as our production bar, minus the scale concerns. STS validation, structured error handling, idempotency where the underlying API allows it, and no silent failures. The model can hold that bar if you specify it. It absolutely won't hold if you don't.&lt;/p&gt;

&lt;p&gt;This is also where vibe coding stops being a private hobby and starts being a team practice. The artifacts are small enough to review properly. The specs are short enough to write down. The economics work even for tooling a single engineer will use once, which means there's no longer an excuse for the un-toolable middle. That category just collapsed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Where to go from here?
&lt;/h3&gt;

&lt;p&gt;What desktop 3D printing did for the workshop, vibe coding is doing for software. Production engineering is still production engineering, with all the disciplines that it requires. What returns is a layer we lost when software industrialized: the ability to make exactly the tool you need, for exactly the job in front of you, at a quality level you would have respected even from a professional. Vibe-code the Route 53 migration tool you needed this morning. The alternative is doing the migration without it. The workshop is back. Print your tools.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>vibecoding</category>
    </item>
    <item>
      <title>How I Monitor AI Agents: CloudWatch for Infra, Arize Phoenix for Traces and OpenTelemetry, LLM-as-Judge for Quality</title>
      <dc:creator>Carlos Cortez 🇵🇪 [AWS Hero]</dc:creator>
      <pubDate>Thu, 14 May 2026 00:56:26 +0000</pubDate>
      <link>https://dev.to/aws-heroes/how-i-monitor-ai-agents-cloudwatch-for-infra-arize-phoenix-for-traces-and-opentelemetry-4iam</link>
      <guid>https://dev.to/aws-heroes/how-i-monitor-ai-agents-cloudwatch-for-infra-arize-phoenix-for-traces-and-opentelemetry-4iam</guid>
      <description>&lt;h1&gt;
  
  
  How I Monitor My AI Agents: CloudWatch for Infra, Arize Phoenix for Traces, LLM-as-Judge for Quality
&lt;/h1&gt;

&lt;p&gt;AI agents are not regular software. They reason, they call tools, they make decisions — and they can fail in ways that a simple health check will never catch. The response was technically successful, but was it actually helpful? The agent called the right tool, but did it interpret the result correctly? Traditional monitoring doesn't answer these questions.&lt;/p&gt;

&lt;p&gt;That's why I built a three-layer observability stack for my AI agents, and today I'm walking you through exactly how it works.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwly8qpmqjant0349okd6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwly8qpmqjant0349okd6.png" alt=" " width="800" height="520"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;📓 &lt;strong&gt;Full working notebook&lt;/strong&gt;: All the code in this post is validated and executable in the companion &lt;a href="https://github.com/breakingthecloud/observability-ai-agents-phoenix-otel-strands/blob/main/observability-ai-agents.ipynb" rel="noopener noreferrer"&gt;Jupyter notebook&lt;/a&gt; — including setup, tracing, evals, and cleanup. here as well: &lt;a href="https://github.com/breakingthecloud/observability-ai-agents-phoenix-otel-strands" rel="noopener noreferrer"&gt;https://github.com/breakingthecloud/observability-ai-agents-phoenix-otel-strands&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The Problem with Monitoring AI Agents
&lt;/h2&gt;

&lt;p&gt;Here's the thing: when your agent answers "I don't have weather data for Paris" — is that a failure? Technically no, the agent ran fine. But from a user perspective, it's a miss. Traditional monitoring would show 200 OK, low latency, zero errors. Everything looks green. But the user didn't get what they needed.&lt;/p&gt;

&lt;p&gt;You need three layers of observability to actually understand what's happening:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User Query → Strands Agent → Tool Calls → Bedrock (Claude)
     ↓              ↓              ↓
  Phoenix      CloudWatch      Phoenix Evals
 (AI traces)  (infra metrics)  (quality scores)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;What it answers&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AI Traces&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Arize Phoenix&lt;/td&gt;
&lt;td&gt;What did the agent think? Which tools did it call? What was the full LLM input/output?&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Infrastructure&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Amazon CloudWatch&lt;/td&gt;
&lt;td&gt;Is the system healthy? How fast? How much is it costing me?&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Quality Evals&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Phoenix + LLM-as-Judge&lt;/td&gt;
&lt;td&gt;Was the response actually good? Helpful? Accurate?&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  The Stack
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Strands Agents SDK&lt;/strong&gt; — AWS's open-source framework for building agents&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Amazon Bedrock&lt;/strong&gt; — Claude Sonnet 4.6 as the foundation model&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Arize Phoenix&lt;/strong&gt; — Open-source AI observability, runs locally, zero accounts needed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Amazon CloudWatch&lt;/strong&gt; — Metrics, alarms, dashboards&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenTelemetry&lt;/strong&gt; — The glue that connects everything&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What's interesting is that Phoenix runs entirely on your machine — &lt;code&gt;localhost:6006&lt;/code&gt;. No cloud accounts, no API keys for the observability layer. You get a full tracing UI for free.&lt;/p&gt;

&lt;h2&gt;
  
  
  Layer 1: AI Traces with Arize Phoenix
&lt;/h2&gt;

&lt;p&gt;The first thing you need is visibility into what your agent is actually doing. Not just "it responded in 2 seconds" but the full reasoning chain: what the LLM received, what it decided, which tools it called, and what it returned.&lt;/p&gt;

&lt;h3&gt;
  
  
  Setting Up the Tracing Pipeline
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft6qn2jkeb5nyy1pxh8zr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft6qn2jkeb5nyy1pxh8zr.png" alt=" " width="800" height="520"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Three steps: launch Phoenix, configure OpenTelemetry, instrument Bedrock.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;phoenix&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;px&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;opentelemetry&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;trace&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;trace_api&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;opentelemetry.sdk&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;trace&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;trace_sdk&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;opentelemetry.sdk.trace.export&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SimpleSpanProcessor&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;opentelemetry.exporter.otlp.proto.http.trace_exporter&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OTLPSpanExporter&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openinference.instrumentation.bedrock&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BedrockInstrumentor&lt;/span&gt;

&lt;span class="c1"&gt;# 1. Launch Phoenix locally
&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;px&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;launch_app&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# UI at http://localhost:6006
&lt;/span&gt;
&lt;span class="c1"&gt;# 2. Configure OTel to send traces to Phoenix
&lt;/span&gt;&lt;span class="n"&gt;tracer_provider&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;trace_sdk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;TracerProvider&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;tracer_provider&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_span_processor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nc"&gt;SimpleSpanProcessor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;OTLPSpanExporter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;endpoint&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:6006/v1/traces&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;trace_api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_tracer_provider&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tracer_provider&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 3. Auto-instrument all Bedrock API calls
&lt;/span&gt;&lt;span class="nc"&gt;BedrockInstrumentor&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;instrument&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tracer_provider&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tracer_provider&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. Every Bedrock call your agent makes is now traced automatically. No decorators on your business logic, no manual span creation. OpenInference handles it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Building the Agent
&lt;/h3&gt;

&lt;p&gt;The agent itself is straightforward with Strands:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands.models.bedrock&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BedrockModel&lt;/span&gt;

&lt;span class="n"&gt;session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Session&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;profile_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cc&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;region_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;us-east-1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_weather&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Get current weather for a city.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;weather_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Lima&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;☀️ 22°C, clear skies&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;New York&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;🌧️ 15°C, rainy&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Tokyo&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;⛅ 18°C, partly cloudy&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;weather_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Weather data not available for &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BedrockModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;us.anthropic.claude-sonnet-4-6&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;boto_session&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;get_weather&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful weather assistant. Use the get_weather tool.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now when you run &lt;code&gt;agent("What's the weather in Lima and Tokyo?")&lt;/code&gt;, Phoenix captures the entire trace tree: the agent span, the LLM calls, the tool invocations, the final response. All visible in the UI at &lt;code&gt;localhost:6006&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Exploring Traces Programmatically
&lt;/h3&gt;

&lt;p&gt;You don't have to use the UI. Phoenix exposes everything as DataFrames:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;phoenix.client&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Client&lt;/span&gt;

&lt;span class="n"&gt;traces_df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;spans&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_spans_dataframe&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;traces_df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;latency_ms&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;traces_df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;end_time&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;traces_df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;start_time&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="n"&gt;dt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;total_seconds&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Total spans captured: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;traces_df&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;traces_df&lt;/span&gt;&lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;span_kind&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;latency_ms&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status_code&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]].&lt;/span&gt;&lt;span class="nf"&gt;head&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gives you every span — agent, LLM, tool — with timing, status, and the full input/output attributes. Perfect for building custom analytics or feeding into your own dashboards.&lt;/p&gt;

&lt;h2&gt;
  
  
  Layer 2: Infrastructure Monitoring with CloudWatch
&lt;/h2&gt;

&lt;p&gt;Phoenix tells you what the agent is thinking. CloudWatch tells you if the system is healthy. Different questions, both critical.&lt;/p&gt;

&lt;h3&gt;
  
  
  The AgentMonitor Class
&lt;/h3&gt;

&lt;p&gt;I built a simple wrapper that publishes four metrics per agent invocation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;cloudwatch&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cloudwatch&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;region_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;us-east-1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;AgentMonitor&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;namespace&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AI/Agents&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;namespace&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;namespace&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cw&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cloudwatch&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;track&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;agent_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;latency_ms&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
              &lt;span class="n"&gt;success&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool_calls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;metrics&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MetricName&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Latency&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Value&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;latency_ms&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Unit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Milliseconds&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MetricName&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TokensUsed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Value&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Unit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MetricName&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Success&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Value&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;success&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Unit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MetricName&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ToolCalls&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Value&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tool_calls&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Unit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;dims&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AgentName&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Value&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;agent_name&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Dimensions&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dims&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;put_metric_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Namespace&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;namespace&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;MetricData&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Usage is clean — wrap your agent call:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;monitor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AgentMonitor&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s the weather in Lima?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;latency&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;
    &lt;span class="n"&gt;monitor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;track&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;weather-agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;latency&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;150&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;success&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool_calls&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;latency&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;
    &lt;span class="n"&gt;monitor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;track&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;weather-agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;latency&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;success&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool_calls&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Smart Alarms
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5wt9ufgw53dk0qd4bjpi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5wt9ufgw53dk0qd4bjpi.png" alt=" " width="800" height="520"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Two alarms that catch the most common issues:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Alert when response time is consistently high
&lt;/span&gt;&lt;span class="n"&gt;cloudwatch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;put_metric_alarm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;AlarmName&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Agent-High-Latency&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;MetricName&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Latency&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Namespace&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AI/Agents&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;Statistic&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Average&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Period&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;EvaluationPeriods&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;Threshold&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;10000.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# 10 seconds
&lt;/span&gt;    &lt;span class="n"&gt;ComparisonOperator&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GreaterThanThreshold&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;Dimensions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AgentName&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Value&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;weather-agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Alert when error rate exceeds 5%
&lt;/span&gt;&lt;span class="n"&gt;cloudwatch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;put_metric_alarm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;AlarmName&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Agent-High-Error-Rate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;MetricName&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Success&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Namespace&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AI/Agents&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;Statistic&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Average&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Period&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;EvaluationPeriods&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;Threshold&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.95&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ComparisonOperator&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;LessThanThreshold&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;Dimensions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AgentName&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Value&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;weather-agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The concept is straightforward: latency catches performance degradation, error rate catches reliability issues. These two alarms alone will catch 80% of production problems.&lt;/p&gt;

&lt;h2&gt;
  
  
  Layer 3: LLM-as-Judge Evals
&lt;/h2&gt;

&lt;p&gt;This is the layer most people skip — and it's the most important one. Your agent can be fast, reliable, and still give terrible answers. You need automated quality evaluation.&lt;/p&gt;

&lt;p&gt;The idea: use another LLM to judge the quality of your agent's responses. It's not perfect, but it's infinitely better than no evaluation at all.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fynqxpgtmaqlxrgz2m49y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fynqxpgtmaqlxrgz2m49y.png" alt=" " width="800" height="520"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Setting Up the Evaluator
&lt;/h3&gt;

&lt;p&gt;Phoenix evals v3 uses a provider-based LLM wrapper. For Bedrock, it goes through litellm:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;phoenix.evals&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LLM&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;create_evaluator&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;evaluate_dataframe&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;phoenix.client&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Client&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;

&lt;span class="c1"&gt;# Get LLM spans from Phoenix
&lt;/span&gt;&lt;span class="n"&gt;spans_df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;spans&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_spans_dataframe&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;llm_spans&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;spans_df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;spans_df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;span_kind&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;LLM&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;copy&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Build eval dataframe
&lt;/span&gt;&lt;span class="n"&gt;eval_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;llm_spans&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;attributes.input.value&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;fillna&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;output&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;llm_spans&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;attributes.output.value&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;fillna&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="n"&gt;eval_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;eval_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;eval_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;output&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;reset_index&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;drop&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Create the judge
&lt;/span&gt;&lt;span class="n"&gt;eval_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LLM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;provider&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bedrock&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;us.anthropic.claude-sonnet-4-6&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@create_evaluator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;helpfulness&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llm&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;helpfulness&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Rate how helpful the agent response is on a scale of 0 to 1.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Rate the helpfulness of this AI response on a scale of 0.0 to 1.0.&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;User asked: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AI responded: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Return ONLY a number between 0.0 and 1.0.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;eval_model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt;

&lt;span class="c1"&gt;# Run evaluation
&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;evaluate_dataframe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dataframe&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;eval_data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;evaluators&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;helpfulness&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The cool part here is the &lt;code&gt;@create_evaluator&lt;/code&gt; decorator — it turns a simple function into a full evaluator that Phoenix understands. You can create as many as you need: helpfulness, accuracy, safety, tone, whatever matters for your use case.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pushing Scores Back to Phoenix
&lt;/h3&gt;

&lt;p&gt;The evaluation results are useful in a DataFrame, but they're even more useful when attached to the actual traces in Phoenix:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;_json&lt;/span&gt;

&lt;span class="n"&gt;score_col&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;_score&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;scores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;score_col&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;apply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;_json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;value&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;annotations&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;span_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;llm_spans&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;context.span_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;)],&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;score&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;label&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;apply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;good&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.7&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;needs_review&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;explanation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Helpfulness score: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;spans&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log_span_annotations_dataframe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;dataframe&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;annotations&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;annotation_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;helpfulness&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;annotator_kind&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;LLM&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now when you open Phoenix UI and click on any LLM span, you see the helpfulness score right there in the Annotations tab. Traces + quality scores in one place.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Costs
&lt;/h2&gt;

&lt;p&gt;One question I always get: what does this cost to run?&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Where&lt;/th&gt;
&lt;th&gt;Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Phoenix&lt;/td&gt;
&lt;td&gt;Your machine (localhost)&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bedrock (agent calls)&lt;/td&gt;
&lt;td&gt;AWS, pay-per-request&lt;/td&gt;
&lt;td&gt;~$0.003 per query&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bedrock (eval judge)&lt;/td&gt;
&lt;td&gt;AWS, pay-per-request&lt;/td&gt;
&lt;td&gt;~$0.003 per eval&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CloudWatch alarms&lt;/td&gt;
&lt;td&gt;AWS&lt;/td&gt;
&lt;td&gt;~$0.20/month&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CloudWatch custom metrics&lt;/td&gt;
&lt;td&gt;AWS&lt;/td&gt;
&lt;td&gt;~$0.30/month&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For development and testing, you're looking at less than $1/month for the AWS side. Phoenix is completely free and local.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Main Takeaway
&lt;/h2&gt;

&lt;p&gt;Observability for AI agents requires thinking in three layers:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Traces&lt;/strong&gt; (Phoenix) — What is the agent doing? What's the full reasoning chain?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Infra metrics&lt;/strong&gt; (CloudWatch) — Is the system healthy? Fast? Within budget?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Quality evals&lt;/strong&gt; (LLM-as-Judge) — Are the responses actually good?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Most teams only do layer 2. Some add layer 1. Almost nobody does layer 3 — and that's where the real insights are. A fast, reliable agent that gives bad answers is worse than a slow one that gives good answers, because you won't even know there's a problem.&lt;/p&gt;

&lt;p&gt;My advice: start with Phoenix traces (it's free and local), add CloudWatch for the basics (latency, errors, tokens), and then build at least one LLM-as-Judge evaluator for whatever quality dimension matters most to your users. You can set this up in an afternoon and it will save you weeks of debugging blind.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Connect with me:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://www.linkedin.com/in/carloscortezcloud" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; - Let's discuss AI observability and agent architectures&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://x.com/ccortezb" rel="noopener noreferrer"&gt;X/Twitter&lt;/a&gt; - Follow for AWS, GenAI, and agentic AI updates&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/ccortezb" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; - Check out the full notebook and more&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/ccortezb"&gt;Dev.to&lt;/a&gt; - More technical deep-dives&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://builder.aws.com/community/@breakinthecloud" rel="noopener noreferrer"&gt;AWS Community&lt;/a&gt; - Join the conversation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I'm Carlos Cortez, this is &lt;em&gt;Breaking the Cloud&lt;/em&gt;, and today we made our agents observable. See you in the next one!&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>llm</category>
      <category>monitoring</category>
    </item>
    <item>
      <title>Building AI Agents with Spring AI and Amazon Bedrock AgentCore - Part 3 Develop local MCP client for Conference application</title>
      <dc:creator>Vadym Kazulkin</dc:creator>
      <pubDate>Mon, 11 May 2026 15:03:58 +0000</pubDate>
      <link>https://dev.to/aws-heroes/building-ai-agents-with-spring-ai-and-amazon-bedrock-agentcore-part-3-develop-local-mcp-client-560a</link>
      <guid>https://dev.to/aws-heroes/building-ai-agents-with-spring-ai-and-amazon-bedrock-agentcore-part-3-develop-local-mcp-client-560a</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;In &lt;a href="https://dev.to/aws-heroes/building-ai-agents-with-spring-ai-and-amazon-bedrock-agentcore-part-2-deploy-conference-search-2bo8"&gt;part 2&lt;/a&gt;, we explained how to deploy and run our conference search application on the Amazon Bedrock AgentCore Runtime as the MCP server. In this article, we'll develop the (MCP-) client, capable of talking to our application running on AgentCore Runtime.&lt;/p&gt;

&lt;h2&gt;
  
  
  Develop local MCP client for Conference application
&lt;/h2&gt;

&lt;p&gt;You can find the source code of the MCP client in my &lt;a href="https://github.com/Vadym79/amazon-bedrock-agentcore-spring-ai/tree/main/spring-ai-1.1-conference-app-agent-local" rel="noopener noreferrer"&gt;spring-ai-1.1-conference-app-agent-local&lt;/a&gt; repository.&lt;/p&gt;

&lt;p&gt;Let's go step-by-step through it.&lt;/p&gt;

&lt;p&gt;First, in &lt;a href="https://github.com/Vadym79/amazon-bedrock-agentcore-spring-ai/blob/main/spring-ai-1.1-conference-app-agent-local/pom.xml" rel="noopener noreferrer"&gt;pom.xml&lt;/a&gt;, we include,  among others, those dependencies: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;spring-ai-bom - to include the general Spring AI functionality.&lt;/li&gt;
&lt;li&gt;spring-boot-starter-web - as we develop the MCP client as a web application.&lt;/li&gt;
&lt;li&gt;spring-ai-starter-model-bedrock-converse -as we use foundational models on Amazon Bedrock.&lt;/li&gt;
&lt;li&gt;spring-ai-starter-mcp-client-webflux - to develop an &lt;a href="https://docs.spring.io/spring-ai/reference/api/mcp/mcp-client-boot-starter-docs.html" rel="noopener noreferrer"&gt;asynchronous Spring AI MCP Client&lt;/a&gt;. We can use spring-ai-starter-mcp-client to develop a synchronous one.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://github.com/Vadym79/amazon-bedrock-agentcore-spring-ai/blob/main/spring-ai-1.1-conference-app-agent-local/src/main/java/dev/vkazulkin/SpringAIConferenceLocalMCPClient.java" rel="noopener noreferrer"&gt;SpringAIConferenceLocalMCPClient&lt;/a&gt; class is the main entry point to our application.&lt;/p&gt;

&lt;p&gt;Second, in &lt;a href="https://github.com/Vadym79/amazon-bedrock-agentcore-spring-ai/blob/main/spring-ai-1.1-conference-app-agent-local/src/main/resources/application.properties" rel="noopener noreferrer"&gt;application.properties&lt;/a&gt;, we define some properties. Those are Spring AI-related:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="py"&gt;spring.ai.bedrock.aws.region&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;us-east-1&lt;/span&gt;
&lt;span class="py"&gt;spring.ai.bedrock.aws.timeout&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;10m&lt;/span&gt;
&lt;span class="py"&gt;spring.ai.bedrock.converse.chat.options.max-tokens&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;100&lt;/span&gt;
&lt;span class="py"&gt;spring.ai.bedrock.converse.chat.options.model&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;amazon.nova-lite-v1:0&lt;/span&gt;
&lt;span class="py"&gt;spring.ai.mcp.client.type&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;ASYNC&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We define the region where we host our application, and the timeout when talking to the Amazon Bedrock models. Then we also set the default Amazon Bedrock to use and a maximal number of tokens, and the MCP client type to ASYNC. We can also set SYNC instead, but we need to use another Spring AI MCP client dependency as described above.&lt;/p&gt;

&lt;p&gt;We also include some application-related properties:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="py"&gt;cognito.user.pool.name&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;UserPoolForAgentCoreMCP&lt;/span&gt;
&lt;span class="py"&gt;cognito.user.pool.client.name&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;UserPoolClientWithUserAndPasswordForAgentCoreMCP&lt;/span&gt;
&lt;span class="py"&gt;cognito.auth.token.resource.server.id&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;AgentCoreResourceServerId&lt;/span&gt;
&lt;span class="py"&gt;amazon.bedrock.agentcore.runtime.id&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;spring_ai_conference_search_agentcore_runtime-6dnMIL9455&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These are individual properties, whose values we need to set from the deployment of the Conference search MCP server. We described the configuration, creation process, and those properties of the MCP server in &lt;a href="https://dev.to/aws-heroes/building-ai-agents-with-spring-ai-and-amazon-bedrock-agentcore-part-2-deploy-conference-search-2bo8"&gt;part 2&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Please ignore other properties like &lt;em&gt;amazon.bedrock.agentcore.gateway.url&lt;/em&gt; as we will need them when we extend our application in the next articles.&lt;/p&gt;

&lt;p&gt;The whole application logic is in the &lt;a href="https://github.com/Vadym79/amazon-bedrock-agentcore-spring-ai/blob/main/spring-ai-1.1-conference-app-agent-local/src/main/java/dev/vkazulkin/agent/controller/SpringAIAgentController.java" rel="noopener noreferrer"&gt;SpringAIAgentController&lt;/a&gt; class. &lt;/p&gt;

&lt;p&gt;We inject the values of individual properties and build AWS service clients (STS and Cognito). This is how we create the ChatClient, which is the main interface of Spring AI to talk to the LLMs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nf"&gt;SpringAIAgentController&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;ChatClient&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;Builder&lt;/span&gt; &lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;ChatMemory&lt;/span&gt; &lt;span class="n"&gt;chatMemory&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nd"&gt;@Value&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"${aws.region}"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;awsRegion&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
   &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;options&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ToolCallingChatOptions&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"us.anthropic.claude-sonnet-4-6"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;maxTokens&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2000&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;chatClient&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;defaultOptions&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;options&lt;/span&gt;&lt;span class="o"&gt;).&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We show here that we can optionally build &lt;em&gt;ToolCallingChatOptions&lt;/em&gt; and override the default model name and the maximum number of tokens defined in application.properties. Then, we build the &lt;em&gt;ChatClient&lt;/em&gt;, and can optionally set &lt;em&gt;ToolCallingChatOptions&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Below is how the code for the method looks, which will receive the prompt from the user:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@GetMapping&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"/conference"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;consumes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"text/plain"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;Flux&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;conferenceSearch&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nd"&gt;@RequestParam&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
  &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;token&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;getAuthTokenViaHttpClient&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
  &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;McpClient&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;async&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;getMcpClientTransport&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="o"&gt;)).&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
  &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;initialize&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
  &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;toolsResult&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;listTools&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt; 
  &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;toolsResult&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;block&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt; 
    &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;info&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"tool found "&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt; 
  &lt;span class="o"&gt;}&lt;/span&gt;

  &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;asyncMcpToolCallbackProvider&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AsyncMcpToolCallbackProvider&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
     &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;mcpClients&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
     &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;


  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;chatClient&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;user&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;  
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;DateTimeTools&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;toolCallbacks&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;asyncMcpToolCallbackProvider&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getToolCallbacks&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;stream&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's break this code down and explain it. First, we need to obtain the JWT token:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;token&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;getAuthTokenViaHttpClient&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This, in turn, uses a bunch of Amazon Cognito services to achieve this goal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="nf"&gt;getAuthTokenViaHttpClient&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
  &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;userPool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;getUserPool&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
  &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;userPoolClient&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;getUserPoolClient&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;userPool&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
  &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;userPoolClientType&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;describeUserPoolClient&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;userPoolClient&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
  &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;userPoolId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;userPool&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
  &lt;span class="n"&gt;userPoolId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;userPoolId&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;replace&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"_"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt;&lt;span class="o"&gt;).&lt;/span&gt;&lt;span class="na"&gt;toLowerCase&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
  &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"https://"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;userPoolId&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s"&gt;".auth."&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nc"&gt;Region&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;US_EAST_1&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s"&gt;".amazoncognito.com/oauth2/token"&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

  &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="no"&gt;SCOPE_STRING&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;RESOURCE_SERVER_ID&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s"&gt;"/*"&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
  &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;entity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"grant_type=client_credentials&amp;amp;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s"&gt;"client_id="&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;userPoolClientType&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;clientId&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s"&gt;"&amp;amp;"&lt;/span&gt;
    &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s"&gt;"client_secret="&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;userPoolClientType&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;clientSecret&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s"&gt;"&amp;amp;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s"&gt;"scope="&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="no"&gt;SCOPE_STRING&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;httpClient&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;HttpClients&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;createDefault&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
     &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;httpPost&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ClassicRequestBuilder&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;post&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
       &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setHeader&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Content-Type"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"application/x-www-form-urlencoded"&lt;/span&gt;&lt;span class="o"&gt;).&lt;/span&gt;&lt;span class="na"&gt;setEntity&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;entity&lt;/span&gt;&lt;span class="o"&gt;).&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
     &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;httpClient&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;execute&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;httpPost&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;AuthTokenResponseHandler&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;       
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, we use the configuration of the user (client ) names and the resource server ID from &lt;a href="https://github.com/Vadym79/amazon-bedrock-agentcore-spring-ai/blob/main/spring-ai-1.1-conference-app-agent-local/src/main/resources/application.properties" rel="noopener noreferrer"&gt;application.properties&lt;/a&gt; to obtain the user (client) pool. Then we construct the URL and the body (entity) of the HTTP request to obtain the authentication token. After it, we execute this request and obtain the token from the response:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;AuthTokenResponseHandler&lt;/span&gt; &lt;span class="kd"&gt;implements&lt;/span&gt; &lt;span class="nc"&gt;HttpClientResponseHandler&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
&lt;span class="nd"&gt;@Override&lt;/span&gt;
  &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="nf"&gt;handleResponse&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;ClassicHttpResponse&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="kd"&gt;throws&lt;/span&gt; &lt;span class="nc"&gt;HttpException&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;IOException&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;inputStream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getEntity&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;getContent&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
    &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;responseString&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;inputStream&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;readAllBytes&lt;/span&gt;&lt;span class="o"&gt;(),&lt;/span&gt; &lt;span class="nc"&gt;StandardCharsets&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;UTF_8&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;responseMap&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;objectMapper&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;readValue&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;responseString&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;TypeReference&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Map&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Object&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;()&lt;/span&gt; &lt;span class="o"&gt;{});&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="n"&gt;responseMap&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;get&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"access_token"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
   &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After we have obtained the token, we're ready to create the (asynchronous as configured) MCP client:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;  &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;McpClient&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;async&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;getMcpClientTransport&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="o"&gt;)).&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's describe what happens when we invoke the &lt;em&gt;getMcpClientTransport&lt;/em&gt; method:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="nc"&gt;McpClientTransport&lt;/span&gt; &lt;span class="nf"&gt;getMcpClientTransport&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;        
  &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="no"&gt;MCP_SERVER_ENDPOINT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getMCPServerEndpoint&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
  &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;headerValue&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"Bearer "&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
  &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;webClientBuilder&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;WebClient&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
     &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;defaultHeader&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Authorization"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;headerValue&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
     &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;defaultHeader&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"accept"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;&lt;span class="s"&gt;"application/json, text/event-stream"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
     &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;defaultHeader&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Content-Type"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;&lt;span class="s"&gt;"application/json"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
     &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;WebClientStreamableHttpTransport&lt;/span&gt;
       &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;webClientBuilder&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
       &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;endpoint&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="no"&gt;MCP_SERVER_ENDPOINT&lt;/span&gt;&lt;span class="o"&gt;).&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We first construct the &lt;em&gt;MCP_SERVER_ENDPOINT&lt;/em&gt; URL from the in application.properties configured AgentCore Runtime ID. In the next article, I'll add the use case to also add the AgentCore Gateway URL. Then, we create the &lt;em&gt;WebClientBuilder&lt;/em&gt; by passing some HTTP headers, including the bearer token. After it, we create &lt;em&gt;WebClientStreamableHttpTransport&lt;/em&gt; and set the web client builder and the MCP server endpoint. It's important to use the HTTP Streamable web client because AgentCore Runtime (and Gateway) only supports it.&lt;/p&gt;

&lt;p&gt;Now we are ready to initialize our MCP client and obtain the list of tools from it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;initialize&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;toolsResult&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;listTools&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt; 
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;toolsResult&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;block&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt; 
    &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;info&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"tool found "&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt; 
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We get all 4 tools that our Conference Search application from &lt;a href="https://dev.to/aws-heroes/building-ai-agents-with-spring-ai-and-amazon-bedrock-agentcore-part-2-deploy-conference-search-2bo8"&gt;part 2&lt;/a&gt; exposes, which we deployed on AgentCore Runtime.&lt;/p&gt;

&lt;p&gt;Next, we need to create the list of tool callbacks from the MCP Client to pass to the &lt;em&gt;ChatClient&lt;/em&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;asyncMcpToolCallbackProvider&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AsyncMcpToolCallbackProvider&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
     &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;mcpClients&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
     &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you don't need all the tools, you can filter them and, for example, only leave those tools whose name contains &lt;em&gt;Conference_Search_Tool_By_Topic&lt;/em&gt; as a substring, as shown below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;asyncMcpToolCallbackProvider&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; 
&lt;span class="nc"&gt;AsyncMcpToolCallbackProvider&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;mcpClients&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
  &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;toolFilter&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;McpToolFilter&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;              
       &lt;span class="nd"&gt;@Override&lt;/span&gt; &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;boolean&lt;/span&gt; &lt;span class="nf"&gt;test&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;McpConnectionInfo&lt;/span&gt; &lt;span class="n"&gt;info&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Tool&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt; 
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;toLowerCase&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
                &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;contains&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Conference_Search_Tool_By_Topic"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt; 
             &lt;span class="o"&gt;}&lt;/span&gt; 
         &lt;span class="o"&gt;}&lt;/span&gt;
      &lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, to enable search prompts such as "Please provide me with the list of conferences including their IDs, with Java topic happening in 2027, with call for papers open today", we need to obtain the current date. LLM doesn't know the current date, and for this, I wrote a small tool with the name &lt;a href="https://github.com/Vadym79/amazon-bedrock-agentcore-spring-ai/blob/main/spring-ai-1.1-conference-app-agent-local/src/main/java/dev/vkazulkin/agent/tools/DateTimeTools.java" rel="noopener noreferrer"&gt;DateTimeTools&lt;/a&gt; :&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@Tool&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"Get the current date "&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="nf"&gt;getLocalDate&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;LocalDate&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;now&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;toString&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;       
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It contains only one tool to get the current date. Then, we pass this local tool to the &lt;em&gt;ChatClient&lt;/em&gt; by invoking the &lt;em&gt;tools&lt;/em&gt; method. We also pass the tool callback list from the &lt;em&gt;AsyncMcpToolCallbackProvider&lt;/em&gt; by invoking the &lt;em&gt;toolCallbacks&lt;/em&gt; method. The last step is to use the &lt;em&gt;ChatClient&lt;/em&gt; with the given prompt and tool (callbacks) to produce an answer to the prompt. This answer will be streamed back to the user:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;chatClient&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;user&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;  
   &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;DateTimeTools&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt;
   &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;toolCallbacks&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;asyncMcpToolCallbackProvider&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getToolCallbacks&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt;
   &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;stream&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
   &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's build our application with &lt;code&gt;mvn clean package&lt;/code&gt; and start it with: &lt;code&gt;mvn spring-boot:run&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Now we can use CURL or &lt;a href="https://httpie.io/docs/cli/installation" rel="noopener noreferrer"&gt;HTTPie&lt;/a&gt; to send some prompts. For example:&lt;/p&gt;

&lt;p&gt;"Please provide me with the list of conferences, including their IDs, with Java topics happening in 2027".&lt;/p&gt;

&lt;p&gt;Here is an example of the request with HTTPie:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;http GET http://localhost:8080/conference?prompt="Please provide me with the list of conferences, including their IDs, with Java topics happening in 2027" Content-Type:text/plain&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Here is the correct LLM response: &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmg21z0unwb6bsrctb6nl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmg21z0unwb6bsrctb6nl.png" alt=" " width="800" height="148"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As you can see from the description and logs, LLM used the tool &lt;em&gt;Conference_Search_Tool_By_Topic_And_Date&lt;/em&gt; from the MCP server to produce the answer. Let's try another prompt:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;http GET http://localhost:8080/conference?prompt="Please provide me with the list of conferences, including their IDs, with Java topics happening in 2026 and 2027, with the call for papers open today" Content-Type:text/plain&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Here is the correct LLM response again: &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2obqjj9bxe7dya6lt2qq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2obqjj9bxe7dya6lt2qq.png" alt=" " width="800" height="232"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As you can see from the description and logs, LLM used the tools to produce the answer. &lt;em&gt;Conference_Search_Tool_By_Topic_Date_CFP_Open&lt;/em&gt; from the MCP server and the local tool &lt;em&gt;Get_The_Current_Date&lt;/em&gt; to produce the answer.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In this article, we developed the (MCP-) client, capable of talking to our application running on AgentCore Runtime. In the next article, we'll look at another alternative to AgentCore Runtime to host MCP servers on AgentCore - AgentCore Gateway. We'll also compare both alternatives. In one of the next articles, I'll show you how to deploy and run this MCP client on the AgentCore Runtime as well, using the HTTP protocol. It's not always appropriate to work with the client locally.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you like my content, please follow me on &lt;a href="https://github.com/Vadym79" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; and give my repositories a star!&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Please also check out my &lt;a href="https://vkazulkin.com" rel="noopener noreferrer"&gt;website&lt;/a&gt; for more technical content and upcoming public speaking activities.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>java</category>
      <category>springai</category>
      <category>bedrockagentcore</category>
    </item>
  </channel>
</rss>
