<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: AWS Heroes</title>
    <description>The latest articles on DEV Community by AWS Heroes (@aws-heroes).</description>
    <link>https://dev.to/aws-heroes</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F2491%2Ff0c1a659-c959-42cd-bb12-cd25909dd9db.png</url>
      <title>DEV Community: AWS Heroes</title>
      <link>https://dev.to/aws-heroes</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/aws-heroes"/>
    <language>en</language>
    <item>
      <title>Understanding AWS Blocks as a CDK Developer</title>
      <dc:creator>Kenta Goto</dc:creator>
      <pubDate>Wed, 17 Jun 2026 10:45:55 +0000</pubDate>
      <link>https://dev.to/aws-heroes/understanding-aws-blocks-as-a-cdk-developer-5gpo</link>
      <guid>https://dev.to/aws-heroes/understanding-aws-blocks-as-a-cdk-developer-5gpo</guid>
      <description>&lt;h2&gt;
  
  
  Understanding AWS Blocks as a CDK Developer
&lt;/h2&gt;

&lt;p&gt;I tried out &lt;a href="https://docs.aws.amazon.com/blocks/latest/devguide/what-is-blocks.html" rel="noopener noreferrer"&gt;AWS Blocks&lt;/a&gt;, announced as a preview on June 17, 2026, from the perspective of someone who normally writes AWS CDK.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Is AWS Blocks?
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/blocks/latest/devguide/what-is-blocks.html" rel="noopener noreferrer"&gt;AWS Blocks&lt;/a&gt; is a toolkit for quickly building the backend of a full-stack app. It was announced as a preview on June 17, 2026, and its source code is published as open source at &lt;a href="https://github.com/aws-devtools-labs/aws-blocks" rel="noopener noreferrer"&gt;aws-devtools-labs/aws-blocks&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;At its center is a component called a &lt;strong&gt;Building Block&lt;/strong&gt;. It is a unit of functionality such as a data store or authentication, and a single Building Block plays the following three roles at the same time.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;A CDK Construct&lt;/strong&gt; (the infrastructure definition at deploy time)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A runtime implementation&lt;/strong&gt; (AWS SDK calls running on Lambda)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A local mock&lt;/strong&gt; (runs with &lt;code&gt;npm run dev&lt;/code&gt;, no AWS environment needed)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In other words, the same Building Block description acts as a mock during local development, as a CDK Construct at deploy time, and as AWS SDK calls in production. As a result, a single piece of code can run locally on your machine without an AWS account, and can be deployed to AWS as is.&lt;/p&gt;

&lt;p&gt;The Building Blocks provided include &lt;code&gt;KVStore&lt;/code&gt; and &lt;code&gt;DistributedTable&lt;/code&gt; (data), &lt;code&gt;AuthBasic&lt;/code&gt; (authentication), &lt;code&gt;Realtime&lt;/code&gt; (real-time communication), &lt;code&gt;FileBucket&lt;/code&gt; (files), and &lt;code&gt;Database&lt;/code&gt; (SQL).&lt;/p&gt;

&lt;p&gt;Developers write both the infrastructure (such as data) and the API logic that uses it together in &lt;code&gt;aws-blocks/index.ts&lt;/code&gt; (which can also be split into multiple files).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// aws-blocks/index.ts&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Scope&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ApiNamespace&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;KVStore&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@aws-blocks/blocks&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;scope&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Scope&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;my-app&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;cache&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;KVStore&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;scope&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;cache&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;   &lt;span class="c1"&gt;// ← at deploy time, this becomes a DynamoDB table (infrastructure)&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;api&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ApiNamespace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;scope&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;api&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;getValue&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="c1"&gt;// ← API logic that uses that table&lt;/span&gt;
&lt;span class="p"&gt;}));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The infrastructure definition &lt;code&gt;new KVStore(...)&lt;/code&gt; and the API logic that calls it sit side by side in the same file. If CDK is about "declaring infrastructure as code (Infrastructure as Code)," then AWS Blocks is framed as "deriving infrastructure from application code (Infrastructure from Code, IfC)."&lt;/p&gt;

&lt;p&gt;The frontend simply calls this &lt;code&gt;api&lt;/code&gt; directly, like &lt;code&gt;import { api } from 'aws-blocks'&lt;/code&gt;, so that &lt;strong&gt;the frontend and backend share types without any code-generation step in between&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;For type integration, type-safe clients are provided not only for web frameworks such as Next, Nuxt, React, and Vue, but also for native mobile targets such as Swift, Kotlin, and Dart/Flutter. In TypeScript you can load them directly via &lt;code&gt;import&lt;/code&gt;, whereas for native targets the client code is generated at build time from a spec called &lt;code&gt;blocks.spec.json&lt;/code&gt;, and the backend is called over JSON-RPC.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Same import Resolves to Different Files Depending on Context
&lt;/h3&gt;

&lt;p&gt;This is probably the most distinctive feature of AWS Blocks.&lt;/p&gt;

&lt;p&gt;Both the &lt;code&gt;import { api } from 'aws-blocks'&lt;/code&gt; you write on the frontend and the &lt;code&gt;new DistributedTable(...)&lt;/code&gt; you call inside the backend's &lt;code&gt;aws-blocks/index.ts&lt;/code&gt; look like ordinary TypeScript. Yet this &lt;code&gt;'aws-blocks'&lt;/code&gt; (and each Building Block) &lt;strong&gt;resolves to a completely different file depending on the context in which it runs&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This "resolves to something different depending on context" behavior is stated explicitly in the official README. It explains that the same &lt;code&gt;new KVStore(scope, 'todos')&lt;/code&gt; becomes a local store during development, an Amazon DynamoDB table at deploy time, and an SDK call in production (without changing a single line of code). Below, I'll trace how this works from the implementation.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Context&lt;/th&gt;
&lt;th&gt;Resolves to&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Local development (&lt;code&gt;npm run dev&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;Mock implementation&lt;/td&gt;
&lt;td&gt;Stores to disk (&lt;code&gt;.bb-data/&lt;/code&gt;). No AWS environment needed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CDK synth&lt;/td&gt;
&lt;td&gt;Infrastructure (CDK Construct)&lt;/td&gt;
&lt;td&gt;Defines DynamoDB tables, Lambdas, etc.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Lambda runtime&lt;/td&gt;
&lt;td&gt;Application implementation&lt;/td&gt;
&lt;td&gt;Calls the AWS SDK&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h4&gt;
  
  
  Mechanism: conditional exports + a global variable
&lt;/h4&gt;

&lt;p&gt;The mechanism consists of two elements.&lt;/p&gt;

&lt;p&gt;The first is Node.js conditional exports (the &lt;code&gt;exports&lt;/code&gt; field in &lt;code&gt;package.json&lt;/code&gt;). Looking at the &lt;code&gt;package.json&lt;/code&gt; of each Building Block, &lt;code&gt;exports&lt;/code&gt; is defined as follows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json-doc"&gt;&lt;code&gt;&lt;span class="c1"&gt;// package.json of @aws-blocks/bb-distributed-table (exports excerpt)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"exports"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"."&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"browser"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"./dist/index.browser.js"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"cdk"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"types"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"./dist/index.cdk.d.ts"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"default"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"./dist/index.cdk.js"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"aws-runtime"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"./dist/index.aws.js"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"types"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"./dist/index.mock.d.ts"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"default"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"./dist/index.mock.js"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;exports&lt;/code&gt; is a mechanism that assigns a different file per condition; when an import is resolved, the file matching an active condition is loaded. &lt;code&gt;browser&lt;/code&gt; / &lt;code&gt;types&lt;/code&gt; / &lt;code&gt;default&lt;/code&gt; are standard Node.js conditions, whereas &lt;code&gt;cdk&lt;/code&gt; and &lt;code&gt;aws-runtime&lt;/code&gt; are conditions that AWS Blocks defines on its own, and they are not active unless explicitly specified at startup.&lt;/p&gt;

&lt;p&gt;And the reason the resolution target changes per context is that each command passes "which condition to activate" at startup.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;cdk synth&lt;/code&gt; runs with &lt;code&gt;NODE_OPTIONS=--conditions=cdk&lt;/code&gt;, so it resolves to &lt;code&gt;cdk&lt;/code&gt; (= Construct); the Lambda bundle passes &lt;code&gt;--conditions: aws-runtime&lt;/code&gt; to esbuild, so it resolves to &lt;code&gt;aws-runtime&lt;/code&gt; (= runtime). &lt;code&gt;npm run dev&lt;/code&gt; specifies no condition, so it resolves to &lt;code&gt;default&lt;/code&gt; (= mock), and for type checking TypeScript always looks at &lt;code&gt;types&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The second is the hand-off at synth time. Before dynamically importing the &lt;code&gt;aws-blocks/index.ts&lt;/code&gt; that the developer implemented, the CDK layer holds a reference to the stack instance currently being synthesized in a global variable. The &lt;code&gt;Scope&lt;/code&gt; and each Building Block created inside &lt;code&gt;index.ts&lt;/code&gt; are not given a parent stack explicitly, so they refer to this global variable to resolve which stack they belong to.&lt;/p&gt;

&lt;p&gt;NOTE: &lt;a href="https://github.com/aws-devtools-labs/aws-blocks/blob/fc099a173eb39e97ebe7290409abf8eb18f3b1ce/packages/core/src/cdk/blocks-backend.ts#L211-L248" rel="noopener noreferrer"&gt;&lt;code&gt;packages/core/src/cdk/blocks-backend.ts&lt;/code&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// packages/core/src/cdk/blocks-backend.ts (excerpt; the infrastructure-building parts are omitted)&lt;/span&gt;
&lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="nf"&gt;constructor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;scope&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Construct&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;props&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;BlocksBackendProps&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;super&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;scope&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="c1"&gt;// Expose self to Building Blocks at CDK time&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;globalThis&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="kr"&gt;any&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;CURRENT_BLOCKS_STACK&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;   &lt;span class="c1"&gt;// ← expose the current stack globally&lt;/span&gt;
  &lt;span class="c1"&gt;// ...&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;scope&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Construct&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;props&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;BlocksBackendProps&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;assertCdkConditionActive&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;backend&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;BlocksBackend&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;scope&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;props&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;mod&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;import&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;props&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;backendCDKPath&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;?stack=&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;   &lt;span class="c1"&gt;// ← load index.ts&lt;/span&gt;
  &lt;span class="c1"&gt;// ...&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;backend&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In short, when &lt;code&gt;new DistributedTable(...)&lt;/code&gt; runs inside the &lt;code&gt;index.ts&lt;/code&gt; the developer implemented, that Construct gets the current stack from this global variable, creates a DynamoDB table, and grants the stack's Lambda read/write permissions on the table. This is how &lt;strong&gt;writing application code itself becomes the infrastructure definition&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Setting Up a Project
&lt;/h3&gt;

&lt;p&gt;An AWS Blocks app starts from npm's create command (Node.js 22 or later / npm 10 or later is required).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm create @aws-blocks/blocks-app@latest my-app
&lt;span class="nb"&gt;cd &lt;/span&gt;my-app
npm &lt;span class="nb"&gt;install
&lt;/span&gt;npm run dev
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;npm run dev&lt;/code&gt; starts a local server at &lt;code&gt;http://localhost:3000&lt;/code&gt;, and all Building Blocks run as mock implementations (no AWS account or credentials needed). You can also choose a template, like &lt;code&gt;--template react&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The generated directory looks roughly like this (the frontend contents vary by template, but the structure of &lt;code&gt;aws-blocks/&lt;/code&gt; is the same).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;my-app/
├── aws-blocks/            # the entire backend
│   ├── index.ts           #   backend definition (API, data, auth) ← what developers mainly write
│   ├── index.cdk.ts       #   CDK entry (assembles the stack with BlocksStack.create)
│   ├── index.handler.ts   #   entry for the Lambda handler at deploy time
│   ├── client.js          #   typed client for the frontend (auto-generated)
│   └── scripts/           #   helper scripts such as the dev server
├── src/                   # frontend (React / Next, etc., depending on the template)
├── cdk.json
└── package.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;aws-blocks/index.ts&lt;/code&gt; we've seen so far is the body of the backend definition, and &lt;code&gt;aws-blocks/index.cdk.ts&lt;/code&gt; is the entry point as CDK.&lt;/p&gt;

&lt;p&gt;Basically, what developers implement is &lt;code&gt;aws-blocks/index.ts&lt;/code&gt;, and there's no need to touch &lt;code&gt;aws-blocks/index.cdk.ts&lt;/code&gt;. However, as described later, you can also customize the CDK configuration by touching this file.&lt;/p&gt;

&lt;h3&gt;
  
  
  Implementing a Sample
&lt;/h3&gt;

&lt;p&gt;Here's what it looks like in practice. Using &lt;code&gt;AuthBasic&lt;/code&gt;, &lt;code&gt;DistributedTable&lt;/code&gt;, and &lt;code&gt;ApiNamespace&lt;/code&gt;, I've implemented a simple Todo app.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// aws-blocks/index.ts&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;ApiNamespace&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;Scope&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;AuthBasic&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;DistributedTable&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@aws-blocks/blocks&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;zod&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;scope&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Scope&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;todo-app&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;auth&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;AuthBasic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;scope&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;auth&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;passwordPolicy&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;minLength&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;authApi&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;auth&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createApi&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="c1"&gt;// The Zod schema serves as runtime validation + the TypeScript type + the DynamoDB table shape, all at once&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;todoSchema&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;object&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;    &lt;span class="c1"&gt;// partition key (isolated per user)&lt;/span&gt;
  &lt;span class="na"&gt;todoId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;    &lt;span class="c1"&gt;// sort key&lt;/span&gt;
  &lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
  &lt;span class="na"&gt;completed&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;boolean&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;todos&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;DistributedTable&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;scope&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;todos&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;schema&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;todoSchema&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;partitionKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;userId&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;sortKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;todoId&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;api&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ApiNamespace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;scope&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;api&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;createTodo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;user&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;auth&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;requireAuth&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;   &lt;span class="c1"&gt;// ← this makes the method require authentication&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;todo&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;username&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;todoId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;toString&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;36&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
      &lt;span class="nx"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;completed&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;todos&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;put&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;todo&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;todo&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;

  &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;listTodos&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;user&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;auth&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;requireAuth&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nb"&gt;Array&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fromAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="nx"&gt;todos&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;where&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;equals&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;username&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;}));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This sample implementation shows the following characteristics as well.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Authentication is specified explicitly per method:&lt;/strong&gt; each method is a public RPC endpoint with no authentication by default. Call &lt;code&gt;auth.requireAuth(context)&lt;/code&gt; at the beginning of any method you want to require authentication.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reusing the Zod schema:&lt;/strong&gt; a single Zod schema can cover runtime validation, the TS type, and the DynamoDB table definition.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Also, the frontend can call the API definition implemented here in a typed way, like &lt;code&gt;import { api } from 'aws-blocks'&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Deploying
&lt;/h3&gt;

&lt;p&gt;Once it works locally, you can deploy the same code to AWS as is.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm run sandbox   &lt;span class="c"&gt;# to a per-developer temporary environment (backend on AWS, frontend served locally)&lt;/span&gt;
npm run deploy    &lt;span class="c"&gt;# full production deploy (including frontend hosting)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;npm run sandbox&lt;/code&gt; is a temporary stack for quickly checking behavior, and &lt;code&gt;npm run deploy&lt;/code&gt; is a production deploy that includes frontend hosting. Both run CDK internally (synth → deploy with &lt;code&gt;--conditions=cdk&lt;/code&gt;, plus generating client code with &lt;code&gt;--conditions=aws-runtime&lt;/code&gt;), and the first time, just like ordinary CDK, you need AWS credentials and &lt;code&gt;cdk bootstrap&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Whereas &lt;code&gt;npm run dev&lt;/code&gt; used local mocks to realize the app's behavior, here the app is deployed to a real environment without changing a single line of code. The earlier mechanism where "the same import resolves to different files depending on context" comes in handy here.&lt;/p&gt;

&lt;h3&gt;
  
  
  AWS Blocks from a CDK Perspective
&lt;/h3&gt;

&lt;p&gt;A distinctive feature of AWS Blocks is that you can implement with it even without being familiar with AWS or CDK, but as a CDK user you may sometimes want to extend the CDK definition. AWS Blocks supports those use cases too.&lt;/p&gt;

&lt;h4&gt;
  
  
  You Can Extend the CDK Definition
&lt;/h4&gt;

&lt;p&gt;&lt;code&gt;aws-blocks/index.cdk.ts&lt;/code&gt; is the CDK layer, where &lt;code&gt;BlocksStack&lt;/code&gt; is invoked. Here you can also define additional resources you want to create and pass them to &lt;code&gt;blocksStack.handler&lt;/code&gt; (the Lambda's &lt;code&gt;NodejsFunction&lt;/code&gt;).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// aws-blocks/index.cdk.ts&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;sqs&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;aws-cdk-lib/aws-sqs&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// ...&lt;/span&gt;
&lt;span class="c1"&gt;// ...&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;blocksStack&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;BlocksStack&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;my-app&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;backendHandlerPath&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;__dirname&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;index.handler.ts&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="na"&gt;backendCDKPath&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;__dirname&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;index.ts&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// Create raw CDK resources in the same stack as Blocks, and pass permissions and environment variables to the Lambda&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;jobsQueue&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;sqs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Queue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;blocksStack&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;JobsQueue&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="nx"&gt;jobsQueue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;grantSendMessages&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;blocksStack&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nx"&gt;blocksStack&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addEnvironment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;JOBS_QUEUE_URL&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;jobsQueue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;queueUrl&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;blocksStack.handler&lt;/code&gt; exposes methods like &lt;code&gt;addToRolePolicy&lt;/code&gt; (adding IAM) and &lt;code&gt;addEnvironment&lt;/code&gt; (injecting environment variables), and you can also grant the Lambda IAM permissions on your own CDK resources using &lt;code&gt;grant~&lt;/code&gt; and similar methods.&lt;/p&gt;

&lt;h4&gt;
  
  
  Checking the Generated CloudFormation with cdk synth
&lt;/h4&gt;

&lt;p&gt;Since AWS Blocks is ultimately AWS CDK under the hood, you can of course run synth too.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx cdk synth
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Among the files generated when you set up the project is &lt;code&gt;cdk.json&lt;/code&gt;, whose app command is &lt;code&gt;npx tsx -C cdk aws-blocks/index.cdk.ts&lt;/code&gt;. The point is this &lt;code&gt;-C cdk&lt;/code&gt; (= &lt;code&gt;--conditions=cdk&lt;/code&gt;), which makes each Building Block resolve to a CDK Construct.&lt;/p&gt;

&lt;p&gt;Conversely, if you run it without &lt;code&gt;--conditions=cdk&lt;/code&gt;, the Building Blocks would resolve as mocks, so AWS Blocks stops you with an explicit &lt;code&gt;Missing --conditions=cdk&lt;/code&gt; error.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Error: Missing --conditions=cdk: Building Blocks will silently load mock implementations instead of CDK constructs.

Fix: Set NODE_OPTIONS="--conditions=cdk" before running CDK synth:
  NODE_OPTIONS="--conditions=cdk" npx cdk synth
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Integration Patterns
&lt;/h4&gt;

&lt;p&gt;The &lt;a href="https://github.com/aws-devtools-labs/aws-blocks/blob/fc099a173eb39e97ebe7290409abf8eb18f3b1ce/docs/guides/extending-with-existing-aws-resources.md" rel="noopener noreferrer"&gt;documentation&lt;/a&gt; presents four patterns for integrating with existing infrastructure.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;CDK-in-Blocks:&lt;/strong&gt; connect raw CDK resources to the Blocks Lambda and use them (for resources that have no corresponding Building Block)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;fromExisting&lt;/code&gt;:&lt;/strong&gt; wrap already-deployed AWS resources with a Building Block and use them (for resources that have a corresponding Building Block)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Custom Block:&lt;/strong&gt; build your own Building Block&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vendorize:&lt;/strong&gt; pull the Block's code into your own repository&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let's look at a concrete example. SQS has no corresponding Building Block, so you connect it as raw CDK with CDK-in-Blocks. For instance, if you want to "send a message from the Blocks API to an existing SQS queue created by another stack," you reference the existing queue from &lt;code&gt;index.cdk.ts&lt;/code&gt; and pass permissions and environment variables to the Lambda.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// aws-blocks/index.cdk.ts — reference an existing queue owned by another stack and wire it to the Lambda&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;sqs&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;aws-cdk-lib/aws-sqs&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;externalQueue&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;sqs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fromQueueArn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;blocksStack&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ExternalQueue&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;arn:aws:sqs:ap-northeast-1:123456789012:my-queue&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nx"&gt;externalQueue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;grantSendMessages&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;blocksStack&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nx"&gt;blocksStack&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addEnvironment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;EXTERNAL_QUEUE_URL&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;externalQueue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;queueUrl&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On the other hand, for resources that have a corresponding Building Block (existing DynamoDB tables, S3 buckets, RDS, Cognito, and so on), wrapping them with &lt;code&gt;fromExisting&lt;/code&gt; is a better fit. Just by passing the existing resource name to something like &lt;code&gt;table:&lt;/code&gt; when creating the Building Block, you can use the typed API and the local mock as is, and the Building Block grants IAM permissions automatically too. You write this on the &lt;code&gt;index.ts&lt;/code&gt; (backend definition) side.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// aws-blocks/index.ts — wrap an existing DynamoDB table with DistributedTable&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;todos&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;DistributedTable&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;scope&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;todos&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;schema&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;todoSchema&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;partitionKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;userId&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;sortKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;todoId&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;table&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;DistributedTable&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fromExisting&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;my-existing-todos-table&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="c1"&gt;// ← pass the existing table name&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  IAM Is Automatic and Least-Privilege per Block
&lt;/h4&gt;

&lt;p&gt;When you instantiate a Building Block, the Lambda is automatically given &lt;strong&gt;permissions limited to that resource&lt;/strong&gt;. For example, with &lt;code&gt;KVStore&lt;/code&gt;, read/write permissions on the generated DynamoDB table (and its indexes) are granted scoped to that table's ARN (internally equivalent to CDK's &lt;code&gt;grantReadWriteData&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;That is, it doesn't take the broad approach of granting &lt;code&gt;dynamodb:*&lt;/code&gt; on all tables, and it can't touch other tables or other resources. The reason least privilege is applied without you hand-writing IAM policies is that the Building Block makes good use of CDK's &lt;code&gt;grant~&lt;/code&gt; behind the scenes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Constraints You Should Know
&lt;/h3&gt;

&lt;h4&gt;
  
  
  The Architecture Is a Lambda-lith (All Methods in a Single Lambda)
&lt;/h4&gt;

&lt;p&gt;When I actually ran &lt;code&gt;cdk synth&lt;/code&gt; and checked the contents, &lt;strong&gt;all the API methods coexisted in a single Lambda function&lt;/strong&gt; (memory 2048 MB / timeout 900 seconds). It's a configuration where API Gateway's proxy integration routes everything to a single Lambda (a so-called Lambda-lith).&lt;/p&gt;

&lt;p&gt;This comes with the following constraints.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You can't separate scaling, permission isolation, or deployment units per method&lt;/li&gt;
&lt;li&gt;Bundle size and cold starts grow in proportion to the size of the API&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As of now, it's not suited to use cases where you want to split functions in a microservices style. By contrast, for a prototype or a small-to-medium API, a single function works perfectly well, so this won't be a problem as long as you understand it as a design trade-off. That said, this configuration reflects how it's built at this point in time, and there seems to be room for it to change in future updates.&lt;/p&gt;

&lt;h4&gt;
  
  
  Side Effects of &lt;code&gt;new&lt;/code&gt;
&lt;/h4&gt;

&lt;p&gt;&lt;code&gt;new DistributedTable(...)&lt;/code&gt; and &lt;code&gt;new KVStore(...)&lt;/code&gt; are not just object creation; behind the scenes they are converted into CDK resource definitions. AWS Blocks is built on a philosophy that blurs the boundary between app and infrastructure even more than AWS CDK does, so those accustomed to recent, modern app development may feel some discomfort here.&lt;/p&gt;

&lt;p&gt;For example, suppose you &lt;code&gt;new&lt;/code&gt; a resource you access in your business logic within that logic, and then, in order to access the same resource from another endpoint, you also &lt;code&gt;new&lt;/code&gt; that resource definition in a different file. The same resource definition then ends up being &lt;code&gt;new&lt;/code&gt;-ed in the CDK code, and an Already Exists error occurs at the CDK layer. It's also worth keeping in the back of your mind that casually increasing instances as if they were ordinary classes unintentionally increases your infrastructure, and casually deleting them deletes the actual resources too.&lt;/p&gt;

&lt;p&gt;To avoid this, you need to declare the resource definition somewhere in a separate file (a class or a function) and have each piece of logic load it. With careful design and proper model separation, it's entirely possible to implement this without any awkwardness, but if you don't do it well, you end up with an implementation that has merely separated app and infrastructure, which can look like it goes against the philosophy of AWS Blocks.&lt;/p&gt;

&lt;p&gt;Conversely, if you have concerns about this area, I think you may be better off simply using AWS CDK.&lt;/p&gt;

&lt;h3&gt;
  
  
  When to Use It
&lt;/h3&gt;

&lt;p&gt;The first cases that come to mind are when you want to write code without being conscious of AWS or CDK, or when you don't have members who are very familiar with them. I also think it fits cases where the application you're building isn't very large—such as a prototype or internal tool you just want to launch quickly, or when you want to build something full-stack easily.&lt;/p&gt;

&lt;p&gt;As a CDK user too, I think it's a good fit, since—as mentioned earlier—you can also extend CDK in &lt;code&gt;index.cdk.ts&lt;/code&gt;. After all, CDK's flavor seeps through in how you write the code, with things like &lt;code&gt;scope&lt;/code&gt; and &lt;code&gt;id&lt;/code&gt;, so knowing CDK may let you implement smoothly.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>awscdk</category>
      <category>cdk</category>
      <category>awsblocks</category>
    </item>
    <item>
      <title>AWS Blocks: Full-Stack Building Blocks That Run Locally Without an AWS Account</title>
      <dc:creator>Vivek V.</dc:creator>
      <pubDate>Tue, 16 Jun 2026 21:08:56 +0000</pubDate>
      <link>https://dev.to/aws-heroes/aws-blocks-full-stack-building-blocks-that-run-locally-without-an-aws-account-346l</link>
      <guid>https://dev.to/aws-heroes/aws-blocks-full-stack-building-blocks-that-run-locally-without-an-aws-account-346l</guid>
      <description>&lt;h2&gt;
  
  
  Why I built a Custom Kiro Power to ship faster with AWS Blocks
&lt;/h2&gt;

&lt;p&gt;Every developer building on AWS has hit the same wall. You need a sandbox account to test your idea. In most organizations that means a ticket, an approval workflow, a budget tag, and three days of waiting before you can find out if your DynamoDB schema even makes sense.&lt;/p&gt;

&lt;p&gt;Once you get access, the iteration cycle starts: write code, deploy, wait two minutes, request pull request approvals from Cloud Engineering/Platform leads and discover your IAM policy is wrong, fix it, deploy again, wait again. For a single table and an API endpoint, you might burn half a day just getting to "hello world" on real infrastructure.&lt;/p&gt;

&lt;p&gt;And when the sprint is over, somebody needs to tear it all down so the bill doesn't keep running.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/blocks/latest/devguide/what-is-blocks.html" rel="noopener noreferrer"&gt;AWS Blocks&lt;/a&gt; eliminates this entire loop. You write your application using self-contained building blocks that run locally on your laptop with zero AWS credentials, then flip a single switch to deploy real infrastructure when the feature is validated. No sandbox requests. No deploy-wait-fix cycles during development. No environment-specific configuration. The code you test locally IS the code that runs on AWS.&lt;/p&gt;

&lt;h2&gt;
  
  
  How AWS Blocks works
&lt;/h2&gt;

&lt;p&gt;You import a building block and use it directly in your application logic. There is no separate infrastructure definition layer. AWS calls this "Infrastructure from Code" — your backend entry point (&lt;code&gt;aws-blocks/index.ts&lt;/code&gt;) is both your runtime code and your infrastructure definition simultaneously.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Scope&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ApiNamespace&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;DistributedTable&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@aws-blocks/blocks&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;zod&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;scope&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Scope&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;my-app&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;tasks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;DistributedTable&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;scope&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;tasks&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;schema&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;object&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="na"&gt;taskId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="na"&gt;done&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;boolean&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
  &lt;span class="p"&gt;}),&lt;/span&gt;
  &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;partitionKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;userId&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;sortKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;taskId&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;api&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ApiNamespace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;scope&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;api&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;addTask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="na"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;tasks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;put&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;taskId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;crypto&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;randomUUID&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="nx"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;done&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run &lt;code&gt;npm run dev&lt;/code&gt; and this works immediately. The table persists to &lt;code&gt;.bb-data/&lt;/code&gt; on disk (survives restarts). The API serves on localhost with hot reload. Your frontend imports the backend directly with full type safety — no codegen, no REST clients, no OpenAPI specs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;api&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;aws-blocks&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addTask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;user123&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Write blog post&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="c1"&gt;// TypeScript knows the return type. Change the backend signature,&lt;/span&gt;
&lt;span class="c1"&gt;// and the frontend breaks at compile time. No contract drift.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Add a method to the API, and it shows up in frontend autocomplete instantly. Change a return type in the backend, and TypeScript breaks the frontend at compile time itself. No more developer excuses like "the API changed but nobody updated the client."&lt;/p&gt;

&lt;p&gt;When the feature is ready, run &lt;code&gt;npm run sandbox&lt;/code&gt; and the same code deploys as DynamoDB, API Gateway, and Lambda. You did not configure IAM policies, write CloudFormation, or think about capacity modes. The building block provisions itself.&lt;/p&gt;

&lt;p&gt;And when you outgrow what a Block provides? Every AWS Blocks app is a CDK app. Drop into &lt;code&gt;aws-blocks/index.cdk.ts&lt;/code&gt; and use any CDK construct directly. You're never stuck in an abstraction.&lt;/p&gt;

&lt;h2&gt;
  
  
  What changes for teams
&lt;/h2&gt;

&lt;p&gt;A new developer clones the repository, runs &lt;code&gt;npm run dev&lt;/code&gt;, and has a working application in thirty seconds without ever needing an AWS account. They build and test features for days or weeks against local mocks that behave identically to production services. The CI pipeline runs integration tests against the same typed API imports — no browser required, no sandbox credentials to manage, no access keys to rotate.&lt;/p&gt;

&lt;p&gt;When a feature passes review, one deploy command creates real infrastructure. The team goes from "works on my machine" to "running on AWS" without changing a single line of application code, because there was never a separate infrastructure layer that could drift out of sync.&lt;/p&gt;

&lt;p&gt;For organizations where AWS environment access is controlled, gated, or simply slow to provision, this is the difference between developers waiting and developers shipping.&lt;/p&gt;

&lt;h2&gt;
  
  
  What you can build
&lt;/h2&gt;

&lt;p&gt;Twenty building blocks covering data, identity, communication, compute, storage, AI, and observability. Each runs locally with zero configuration and deploys to production AWS services:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;You write&lt;/th&gt;
&lt;th&gt;Runs locally as&lt;/th&gt;
&lt;th&gt;Deploys to AWS as&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;new DistributedTable(scope, 'orders', { schema, key, indexes })&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;JSON file on disk&lt;/td&gt;
&lt;td&gt;DynamoDB with GSIs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;new Database(scope, 'analytics', { migrationsPath })&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;PGlite (WASM Postgres)&lt;/td&gt;
&lt;td&gt;Aurora Serverless v2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;new DistributedDatabase(scope, 'global', { migrationsPath })&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;PGlite (WASM Postgres)&lt;/td&gt;
&lt;td&gt;Aurora DSQL (zero idle cost)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;new AuthCognito(scope, 'auth', { mfa, groups })&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Local JWT mock&lt;/td&gt;
&lt;td&gt;Cognito User Pool&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;new Realtime(scope, 'collab', { namespaces })&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Local WebSocket server&lt;/td&gt;
&lt;td&gt;API Gateway WebSocket&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;new AsyncJob(scope, 'process', { schema, handler })&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Runs in-process&lt;/td&gt;
&lt;td&gt;SQS + Lambda&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;new CronJob(scope, 'digest', { schedule, handler })&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Manual trigger&lt;/td&gt;
&lt;td&gt;EventBridge + Lambda&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;new FileBucket(scope, 'uploads')&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Local filesystem&lt;/td&gt;
&lt;td&gt;S3 with presigned URLs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;new KnowledgeBase(scope, 'docs', { source })&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Local vector search&lt;/td&gt;
&lt;td&gt;Bedrock Knowledge Bases&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;new Agent(scope, 'assistant', { model, tools })&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Canned responses&lt;/td&gt;
&lt;td&gt;Amazon Bedrock&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;new EmailClient(scope, 'notify', { fromAddress })&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Console log&lt;/td&gt;
&lt;td&gt;SES&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;new Logger(scope, 'log')&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Console output&lt;/td&gt;
&lt;td&gt;CloudWatch Logs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;new Metrics(scope, 'metrics')&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;No-op&lt;/td&gt;
&lt;td&gt;CloudWatch (EMF)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;new Tracer(scope, 'tracer')&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;No-op&lt;/td&gt;
&lt;td&gt;X-Ray&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Plus &lt;code&gt;KVStore&lt;/code&gt; for simple key-value, &lt;code&gt;AuthBasic&lt;/code&gt; for JWT auth, &lt;code&gt;AuthOIDC&lt;/code&gt; for social login, &lt;code&gt;AppSetting&lt;/code&gt; for config/secrets, &lt;code&gt;Dashboard&lt;/code&gt; for auto-generated observability views, and &lt;code&gt;Hosting&lt;/code&gt; for frontend deployment (CloudFront + S3 with SSR support).&lt;/p&gt;

&lt;p&gt;It works on every major web framework — Next.js, Nuxt, Astro, SolidStart, TanStack Start, React, Vue, Svelte, Angular — and generates typed native clients for iOS (Swift), Android (Kotlin), and Flutter (Dart) from the same backend definition.&lt;/p&gt;

&lt;h2&gt;
  
  
  Teaching your AI agent a new framework
&lt;/h2&gt;

&lt;p&gt;AWS Blocks ships steering files inside the npm package that guide AI coding agents to build correct code. These work with any agent that reads &lt;code&gt;node_modules&lt;/code&gt; documentation.&lt;/p&gt;

&lt;p&gt;A Custom Kiro Power takes this further. It packages all twenty building block APIs, their constructor signatures, common pitfalls, deployment workflows, and scaffolding automation into a context bundle that activates dynamically when you mention "blocks" or "building blocks" in conversation. The agent doesn't need to dig through &lt;code&gt;node_modules&lt;/code&gt; — it already knows the patterns.&lt;/p&gt;

&lt;p&gt;Here is what that looks like in practice. You open Kiro, point it at an empty directory, and say:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Create a bookmark manager with AWS Blocks where users can save links, tag them, and search by tag or date added"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The Power activates. The agent scaffolds the project with &lt;code&gt;--yes&lt;/code&gt; (non-interactive), defines a &lt;code&gt;DistributedTable&lt;/code&gt; with a schema for bookmarks, adds a &lt;code&gt;byTag&lt;/code&gt; index and a &lt;code&gt;byDate&lt;/code&gt; index, exports an &lt;code&gt;ApiNamespace&lt;/code&gt; with CRUD methods plus tag-based queries, wires up &lt;code&gt;AuthBasic&lt;/code&gt; for user isolation, and starts the dev server. Working app, running locally, in under a minute.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Use building blocks to create a file sharing app with upload, download links, and automatic cleanup of expired files"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It uses &lt;code&gt;FileBucket&lt;/code&gt; for presigned upload/download URLs, &lt;code&gt;DistributedTable&lt;/code&gt; to track file metadata and expiry, &lt;code&gt;CronJob&lt;/code&gt; to sweep expired files daily, and &lt;code&gt;AuthOIDC&lt;/code&gt; with Google sign-in for user identity.&lt;/p&gt;

&lt;p&gt;Without the Power, the agent would need to find and read each block's README from &lt;code&gt;node_modules&lt;/code&gt;. With the Power, it builds complete applications from a single sentence because all the patterns, gotchas, and deployment knowledge are pre-loaded.&lt;/p&gt;

&lt;h2&gt;
  
  
  From local to production in two commands
&lt;/h2&gt;

&lt;p&gt;AWS Blocks gives you two deployment paths:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sandbox&lt;/strong&gt; — fast, ephemeral, backend-only. Deploys in seconds using Lambda hot-swapping. Each developer gets an isolated environment. Use this during development to test against real AWS services.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm run sandbox          &lt;span class="c"&gt;# deploy backend to AWS (seconds)&lt;/span&gt;
npm run sandbox:destroy  &lt;span class="c"&gt;# tear it down&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Production&lt;/strong&gt; — full CDK deployment via CloudFormation. Deploys your entire application including frontend hosting (CloudFront + S3), SSR if you use Next.js/Nuxt/Astro, custom domains, and WAF. Use this for staging and production environments.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm run deploy   &lt;span class="c"&gt;# full production deployment&lt;/span&gt;
npm run destroy  &lt;span class="c"&gt;# tears down all deployed resources&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The same code that ran on &lt;code&gt;localhost:3000&lt;/code&gt; is now running on AWS with DynamoDB tables, Lambda functions, API Gateway endpoints, and a CloudFront distribution. No infrastructure files were written. No IAM policies were hand-crafted. The building blocks provisioned themselves.&lt;/p&gt;

&lt;h2&gt;
  
  
  Install Custom Kiro Power for AWS Blocks
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Kiro IDE:&lt;/strong&gt; Powers panel &amp;gt; Add Custom Power &amp;gt; Import from GitHub &amp;gt; &lt;code&gt;https://github.com/awsdataarchitect/aws-blocks&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;The Custom Kiro Power is at &lt;a href="https://github.com/awsdataarchitect/aws-blocks" rel="noopener noreferrer"&gt;github.com/awsdataarchitect/aws-blocks&lt;/a&gt; for anyone who wants to go from plain English to a running AWS Blocks app without reading docs first.&lt;/p&gt;

&lt;h2&gt;
  
  
  Or Try it Manually
&lt;/h2&gt;

&lt;p&gt;If you want to scaffold manually without a Power or any AI steering:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm create @aws-blocks/blocks-app@latest my-app
&lt;span class="nb"&gt;cd &lt;/span&gt;my-app
npm run dev
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Working application in seconds, deployable to AWS with &lt;code&gt;npm run sandbox&lt;/code&gt; when you are ready. Full production deployment with hosting via &lt;code&gt;npm run deploy&lt;/code&gt;. Tear it down via &lt;code&gt;npm run destroy&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Links:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/products/developer-tools/blocks/" rel="noopener noreferrer"&gt;AWS Blocks product page&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/blocks/latest/devguide/what-is-blocks.html" rel="noopener noreferrer"&gt;Official Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.npmjs.com/package/@aws-blocks/blocks" rel="noopener noreferrer"&gt;npm package&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/aws-devtools-labs/aws-blocks" rel="noopener noreferrer"&gt;Source (GitHub)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/awsdataarchitect/aws-blocks" rel="noopener noreferrer"&gt;Custom Kiro Power for AWS Blocks&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now, what will you build with AWS Blocks?&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>fullstack</category>
      <category>developer</category>
      <category>aws</category>
    </item>
    <item>
      <title>Adding Memory to the Agent</title>
      <dc:creator>Matt Lewis</dc:creator>
      <pubDate>Tue, 16 Jun 2026 08:54:17 +0000</pubDate>
      <link>https://dev.to/aws-heroes/adding-memory-to-the-agent-181k</link>
      <guid>https://dev.to/aws-heroes/adding-memory-to-the-agent-181k</guid>
      <description>&lt;p&gt;This is the fourth in a series of posts documenting the architecture, implementation, and lessons learned from building the AWS Briefing Agent - a personalised AWS assistant deployed on &lt;code&gt;Amazon Bedrock AgentCore Runtime&lt;/code&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Part 1: &lt;a href="https://dev.to/aws-heroes/building-a-full-stack-ai-agent-on-amazon-bedrock-agentcore-2p"&gt;Building a Full-Stack AI Agent on Bedrock AgentCore&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Part 2: &lt;a href="https://dev.to/aws-heroes/data-ingestion-rss-feeds-knowledge-base-s3-vectors-and-metadata-filtering-4n8m"&gt;Data Ingestion: RSS Feeds, Knowledge Base, S3 Vectors, and Metadata Filtering&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Part 3: &lt;a href="https://dev.to/aws-heroes/strands-agents-agentcore-runtime-a-perfect-match-3a51"&gt;Strands Agents + AgentCore Runtime - a perfect match&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Part 4: Adding Memory to the Agent&lt;/li&gt;
&lt;li&gt;Part 5: Experimenting with API Gateway&lt;/li&gt;
&lt;li&gt;Part 6: Observability and Evaluations&lt;/li&gt;
&lt;li&gt;Part 7: Third Party Integrations - Identity, Gateway and Slack Notifications&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As mentioned in the first blog post, each session on AgentCore Runtime is assigned a dedicated Firecracker microVM with isolated CPU, memory and filesystem resources. When the session finishes, the entire microVM is destroyed. There is no shared state between sessions, which prevents any cross-session data leakage.&lt;/p&gt;

&lt;p&gt;When a user accesses our AWS Briefing Agent service for the first time, they are asked a number of questions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7f1sa6b4wiszghilqp8g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7f1sa6b4wiszghilqp8g.png" alt="Briefing Agent Home Page" width="800" height="415"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This includes asking about the primary AWS services the user is interested in, their experience level in AWS, and if there are specific AWS areas they want to track closely.&lt;/p&gt;

&lt;p&gt;Without any memory capability, the user will have to provide the same information each time they start a new session. This is where AgentCore Memory comes into play. This post walks through setting up AgentCore Memory using Strands Agents.&lt;/p&gt;

&lt;h2&gt;
  
  
  Configuring Memory in AgentCore
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;agentcore.json&lt;/code&gt; file is the primary configuration file used in Amazon Bedrock AgentCore to define and manage AI agents, gateways, memory stores and datasets. It acts as the central orchestrator to package up the agents infrastructure.&lt;/p&gt;

&lt;p&gt;When we run the &lt;code&gt;agentcore deploy&lt;/code&gt; command, the CLI reads this file and uses the AWS CDK to synthesize and deploy CloudFormation resources. We add long term to our agent in the memory section using a resource identifier of "BriefingAgentMemory". This is the identifier that is referenced in our handler. &lt;/p&gt;

&lt;p&gt;AgentCore Memory itself consists of several key components that work together to provide both short-term context and long-term intelligence for agents as shown in the diagram below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyq30ex3awbyk7i2t4s52.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyq30ex3awbyk7i2t4s52.png" alt="AgentCore Memory Overview" width="800" height="287"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The interactions with the user are stored in short term for 90 days, as specified in the event expiry duration attribute. We then specify two distinct memory strategies that transform these short term raw events into long-term memory. Note that all strategies by default ignore personally identifiable information (PII) data from long-term memory records.&lt;/p&gt;

&lt;p&gt;We define the following strategies in the &lt;code&gt;agentcore.json&lt;/code&gt; file:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;SEMANTIC&lt;/strong&gt; - this memory strategy identifies and extracts key pieces of factual information and contextual knowledge from conversational data. For example, a user is running AWS Lambda in production.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;USER_PREFERENCE&lt;/strong&gt; - this memory strategy is designed to automatically identify and extract user preferences, choices and styles from conversations. For example, a user is interested in serverless and containers.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each strategy stores its long-term memory in a hierarchical structure within a namespace. These namespaces act as distinct logical containers. We segregate them using the special &lt;code&gt;{actorId}&lt;/code&gt; placeholder variable, so that we guarantee separation between each user.&lt;/p&gt;

&lt;p&gt;The complete relevant memory section in our &lt;code&gt;agentcore.json&lt;/code&gt; file is shown below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nl"&gt;"memories"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"BriefingAgentMemory"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"eventExpiryDuration"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;90&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"strategies"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"SEMANTIC"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"semantic_facts"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"namespaces"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="s2"&gt;"/users/{actorId}/facts"&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"USER_PREFERENCE"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"user_preferences"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"namespaces"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="s2"&gt;"/users/{actorId}/preferences"&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Integrating Cognito and AgentCore Runtime
&lt;/h2&gt;

&lt;p&gt;At this point, we need to do a segway into how we authenticate requests to our agent. AgentCore Runtime supports two inbound authentication mechanisms:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;AWS IAM SigV4 - where the request to the &lt;code&gt;InvokeAgentRuntime&lt;/code&gt; API is SigV4-signed with valid AWS credentials that have the &lt;code&gt;bedrock-agentcore:InvokeAgentRuntime&lt;/code&gt; IAM permission.&lt;/li&gt;
&lt;li&gt;JWT Bearer Token Auth - which is configured with an Inbound JWT authoriser&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;When our frontend invokes the agent, it is sending a request to the agent's public endpoint URL:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://bedrock-agentcore.eu-west-1.amazonaws.com/runtimes/&amp;lt;arn&amp;gt;/invocations
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This URL is a special public-facing endpoint that AgentCore Runtime exposes. We specify this in the &lt;code&gt;agentcore.json&lt;/code&gt; file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nl"&gt;"runtimes"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"AWSBriefingAgent"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"build"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Container"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"entrypoint"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"handler.py"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"authorizerType"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"CUSTOM_JWT"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"authorizerConfiguration"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"customJwtAuthorizer"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"discoveryUrl"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://cognito-idp.eu-west-1.amazonaws.com/eu-west-1_dshjdhskj/.well-known/openid-configuration"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"allowedClients"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="s2"&gt;"dhjhdjskhdjkshdjkhsd"&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;discoveryUrl&lt;/code&gt; points to Cognito's OpenID Connect discovery document for the AWS Cognito User Pool with the specified ID that is being used to authenticate users to the frontend. When AgentCore Runtime wants to validate the JWT token, it retrieves information from this endpoint such as the issuer and JWKS endpoint (contains the public keys used to verify the JWT signature).&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;allowedClients&lt;/code&gt; shows the Cognito Application Client ID. When a user logs in, Cognito stamps the token with the client_id. AgentCore validates the JWT’s client_id claim, so only tokens issued for one of the permitted application clients can invoke the runtime.&lt;/p&gt;

&lt;p&gt;When the user logs into our frontend application with their email address and password, the frontend calls Cognito directly to verify, and receives back&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Access token&lt;/strong&gt; — proves who you are and what you're allowed to do.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ID token&lt;/strong&gt; — contains profile info (email, name). Used by the frontend to display the username.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Refresh token&lt;/strong&gt; — used to get new access/ID tokens when they expire (usually after 1 hour).
These tokens are stored by the frontend auth library.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When we send a request to the agent, the frontend attaches the access token as a bearer token&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;POST /invocations
Authorization: Bearer eyJraWQi...
Body: &lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"prompt"&lt;/span&gt;: &lt;span class="s2"&gt;"Give me a briefing"&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the JWT token that gets validated by AgentCore Runtime.&lt;/p&gt;

&lt;h2&gt;
  
  
  Returning Memory records in Handler function
&lt;/h2&gt;

&lt;p&gt;The following code snippet shows how we retrieve the memory records to display in the sidebar of the frontend.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@app.entrypoint&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="c1"&gt;# Derive actor_id from the JWT 'sub' claim (source of truth)
&lt;/span&gt;    &lt;span class="n"&gt;actor_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;_extract_sub_from_jwt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;default-user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Sanitize actor_id for AgentCore Memory
&lt;/span&gt;    &lt;span class="n"&gt;actor_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[^a-zA-Z0-9\-_/]&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;_&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Retrieve memory records to include in the stream
&lt;/span&gt;    &lt;span class="n"&gt;memory_used&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_memory_records&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;@app.entrypoint&lt;/code&gt; decorator registers a function as the handler for POST requests to &lt;code&gt;/invocations&lt;/code&gt;. AgentCore Runtime calls this handler function when a client invokes the agent. Our handler function is an async generator, which means that it automatically streams the response as Server-Sent Events (SSE) delivered to the client in real-time (more around this in the next blog post).&lt;/p&gt;

&lt;p&gt;Within the handler, we get the message that has been sent in the payload. We then extract the user's identity from the JWT token that Cognito issued. One of the claims in the JWT token is the &lt;code&gt;sub&lt;/code&gt; or subject, which is the unique user ID assigned by Cognito to a user when they first register. We know that the JWT token has been cryptographically signed by Cognito and validated by AgentCore Runtime before it reaches the handler function. We assign this &lt;code&gt;sub&lt;/code&gt; value to be the &lt;code&gt;actor_id&lt;/code&gt;. We apply some regex to the actual value to ensure it has no characters in it that are not supported.&lt;/p&gt;

&lt;p&gt;We then call our &lt;code&gt;get_memory_records&lt;/code&gt; function. This function calls the AgentCore retrieve memory records API to search the long-term memory for facts and preferences relevant to the promt that has just been passed in. We retrieve the 5 highest scoring results from the vector search and store them in a records array, which is streamed back to the frontend to be displayed in the sidebar.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_memory_records&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;]]:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Retrieve long-term memory records relevant to the user&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s prompt.

    Searches both the facts and preferences namespaces and returns
    the records the agent would have seen for this invocation.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;MEMORY_ID&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bedrock-agentcore&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;region_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;REGION&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;records&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;namespace&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;users/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/facts&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;users/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/preferences&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;]:&lt;/span&gt;
            &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;retrieve_memory_records&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="n"&gt;memoryId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;MEMORY_ID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;namespace&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;namespace&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;searchCriteria&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;searchQuery&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;topK&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="p"&gt;},&lt;/span&gt;
                    &lt;span class="n"&gt;maxResults&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;memoryRecordSummaries&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[]):&lt;/span&gt;
                    &lt;span class="n"&gt;records&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;memoryRecordId&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;memoryRecordId&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{}).&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;score&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;score&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;memoryStrategyId&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;memoryStrategyId&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;namespaces&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;namespaces&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[]),&lt;/span&gt;
                    &lt;span class="p"&gt;})&lt;/span&gt;
            &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;exc&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;warning&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Failed to retrieve from %s: %s&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;namespace&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;exc&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;records&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;exc&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;warning&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Failed to retrieve memory records: %s&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;exc&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can see an example of the sidebar in the frontend below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhvhx2d66umg4wr0rk3yn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhvhx2d66umg4wr0rk3yn.png" alt="Memory Sidebar" width="800" height="368"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &amp;nbsp;Setting up Memory with Strands
&lt;/h2&gt;

&lt;p&gt;Both short-term and long-term memory are handled for us automatically through the AgentCore Memory session manager integration for Strands.&lt;/p&gt;

&lt;p&gt;The memory ID is retrieved in a module-level constant:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;MEMORY_ID&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MEMORY_BRIEFINGAGENTMEMORY_ID&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This reads the memory resource ID that AgentCore Runtime automatically injects as an environment variable into your container at runtime. The naming convention is: MEMORY__ID. Given the memory was given a name of "BriefingAgentMemory" in the &lt;code&gt;agentcore.json&lt;/code&gt; file, AgentCore sets  MEMORY_BRIEFINGAGENTMEMORY_ID to the actual memory resource ID (something like AWSBriefingAgent_BriefingAgentMemory-q2iBfL64BS).&lt;/p&gt;

&lt;p&gt;The following function in our code is called on every request. A new stateless Strands Agent instance is created on each invocation, configured with the relevant session manager that loads conversation history from AgentCore Memory, tools and model settings.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_create_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;gateway_tools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Create a Strands Agent with KB retrieval, AgentCore Memory, and Gateway tools.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;session_manager&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;MEMORY_ID&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;bedrock_agentcore.memory.integrations.strands.config&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;AgentCoreMemoryConfig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;RetrievalConfig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;bedrock_agentcore.memory.integrations.strands.session_manager&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;AgentCoreMemorySessionManager&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AgentCoreMemoryConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;memory_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;MEMORY_ID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;retrieval_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;users/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/facts&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;RetrievalConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                        &lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;relevance_score&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.5&lt;/span&gt;
                    &lt;span class="p"&gt;),&lt;/span&gt;
                    &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;users/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/preferences&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;RetrievalConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                        &lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;relevance_score&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.5&lt;/span&gt;
                    &lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;session_manager&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AgentCoreMemorySessionManager&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;agentcore_memory_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;region_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;REGION&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;exc&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;warning&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Failed to initialise memory session manager: %s&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;exc&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;retrieve&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;format_slack_message&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;gateway_tools&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="p"&gt;[])&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;_load_system_prompt&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;_create_model&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;session_manager&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;session_manager&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;conversation_manager&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;SlidingWindowConversationManager&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;window_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;should_truncate_results&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;per_turn&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;callback_handler&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In our code, if memory has been set, then we import the &lt;code&gt;AgentCoreMemorySessionManager&lt;/code&gt;. This session manager integrates Strands agents with AgentCore Memory, which synchronises the short-term and long-term memory capabilities. Some of its features include loading the conversation history from short-term memory during agent initialisation, and integrating with long-term memory for context injection into agent state.&lt;/p&gt;

&lt;p&gt;Next we create a &lt;code&gt;AgentCoreMemoryConfig&lt;/code&gt; configuration object which will be passed to the session manager telling it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;memory_id - which AgentCore Memory resource to connect to&lt;/li&gt;
&lt;li&gt;session_id - the identifier for the conversation session&lt;/li&gt;
&lt;li&gt;actor_id - the unique identifier for the user&lt;/li&gt;
&lt;li&gt;retrieval_config - a dictionary mapping of namespaces to retrieval configurations. This tells the session manage to search the two namespaces for relevant long-term memories, and to get the 5 most relevant facts and user preferences&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Our use of AgentCore Memory is now handled automatically by Strands Agents session manager. Before each turn, it will load recent events from the same session to populate the agent's conversation context. The short-term memory is the raw event stream. The agent will see the last 20 turns in its context window, as this has been configured with the Sliding Window Conversation Manager. After (and during) invocations of the agent, new conversation messages are automatically persisted to AgentCore Memory.&lt;/p&gt;

&lt;p&gt;With this in place, we have now successfully added long-term memory to our agent, personalising the briefing for each user based on their preferences.&lt;/p&gt;

&lt;h2&gt;
  
  
  Biography
&lt;/h2&gt;

&lt;p&gt;As Chief AWS Architect at IBM in the UK, I am responsible for growing the AWS capability and community within one of the fastest growing AWS consulting partners globally. This gives me the opportunity to try out the latest features in preview before they go into general availability. You'll often find me blogging about my experience, but please reach out if there are services you'd like to know more about.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>aws</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Serverless applications on AWS with Lambda using Java 25, API Gateway and Aurora DSQL - Lambda performance optimization approaches</title>
      <dc:creator>Vadym Kazulkin</dc:creator>
      <pubDate>Mon, 15 Jun 2026 14:51:59 +0000</pubDate>
      <link>https://dev.to/aws-heroes/serverless-applications-on-aws-with-lambda-using-java-25-api-gateway-and-aurora-dsql-lambda-4hbj</link>
      <guid>https://dev.to/aws-heroes/serverless-applications-on-aws-with-lambda-using-java-25-api-gateway-and-aurora-dsql-lambda-4hbj</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;In the previous articles of the series about how to develop, run, and optimize Serverless applications on AWS with Lambda using Java 25, API Gateway, and Aurora DSQL database, we used:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Managed Java 25 runtime&lt;/li&gt;
&lt;li&gt;GraalVM Native Image deployed as Lambda Custom Runtime&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We also did Lambda performance (cold and warm starts) measurements with the following settings:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Lambda functions used 1024 MB of memory&lt;/li&gt;
&lt;li&gt;Java compilation option "-XX:+TieredCompilation -XX:TieredStopAtLevel=1" &lt;/li&gt;
&lt;li&gt;Lambda x86_64 architecture used&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In this article, we'll introduce some additional Lambda performance (cold and warm starts) optimization approaches to apply to our sample application. You'll need to measure the performance by yourself to figure out whether they will provide the desired Lambda performance improvements.&lt;/p&gt;

&lt;p&gt;Please keep in mind that you can also deploy our sample application on AWS Lambda as a (Docker) Container Image. I didn't cover this approach, but you can look into my article series &lt;a href="https://dev.to/vkazulkin/series/34789"&gt;Lambda function using Docker Container Image&lt;/a&gt; for a step-by-step introduction on how to do it. I used DynamoDB as a database in this example. The cold start will be quite big. Lambda SnapStart isn't available for the Lambda deployment as a Container Image. Instead, you can use &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/java-customization.html#aot-cds-caches" rel="noopener noreferrer"&gt;Ahead-of-Time (AOT) and CDS caches&lt;/a&gt; for the Container Image and then measure the Lambda performance.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lambda performance optimization approaches
&lt;/h2&gt;

&lt;p&gt;To find a good balance between the cold and warm start times of the Lambda function, you can try out the optimization techniques introduced below. I have not taken any additional measurements with our sample application with Java and GraalVM 25, but have done so using older Java, GraalVM, and dependency versions. &lt;/p&gt;

&lt;p&gt;We can apply the following approaches to the managed Java runtime and GraalVM Native Image. For the managed Java runtime, it includes SnapStart being enabled and applying the priming techniques on top:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Try out different Lambda memory settings. We performed all measurements with 1024 MB of memory for the Lambda function. With different memory settings, you might become better at the price-performance trade-off. &lt;/li&gt;
&lt;li&gt;Try out setting Lambda arm64 architecture using AWS Graviton2 processor, which supports SnapStart since July 2024. This can provide a better cost-performance trade-off compared to x86 architecture.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We can apply the following approaches primarily only to the managed Java runtime on Lambda. This includes SnapStart being enabled and applying the priming techniques on top:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Try out different Java compilation options for the Lambda function. We performed all measurements until now with the compilation option "-XX:+TieredCompilation -XX:TieredStopAtLevel=1". We can provide other compilation options to the Lambda function using an environment variable called &lt;em&gt;JAVA_TOOL_OPTIONS&lt;/em&gt;. This can have different cold and warm starts trade-offs. For GraalVM Native Image, the choice of Java compilation method doesn't have much impact on the Lambda performance. This is because our application is already compiled natively.&lt;/li&gt;
&lt;li&gt;Further exclude unused dependencies. With that, we can especially reduce the cold start times (also for SnapStart enabled). In the case of GraalVM Native Image, only reachable Java classes, functions, and methods will become a part of the Native Image, so including unused dependencies may not help that much.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We can apply the following approach primarily to the managed Java runtime on Lambda with&amp;nbsp;SnapStart enabled:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Search for further Lambda SnapStart priming potential in addition to those we introduced in this series. For this, you can use AWS Lambda Profiler Extension for Java. I described it in my article &lt;a href="https://dev.to/aws-heroes/aws-lambda-profiler-extension-for-java-part-2-improving-lambda-performance-with-lambda-snapstart-4p06"&gt;Improving Lambda performance with Lambda SnapStart and priming&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We can apply the following approach primarily to the GraalVM Native Image :&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Try out &lt;a href="https://www.graalvm.org/21.3/reference-manual/native-image/PGO/" rel="noopener noreferrer"&gt;Profile-Guided Optimizations&lt;/a&gt; to see whether you can further improve Lambda performance. The difficulty of trying out this technique is that you'll need to do some additional semi-automated steps to run your application either with the Lambda emulator locally or in an extra environment to obtain the profile of your application, which you'll then need to use to generate the optimized Native Image. You can use &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/lambda-extensions.html" rel="noopener noreferrer"&gt;Lambda extension&lt;/a&gt; for it, but it still requires a lot of additional work. This is the work AWS did for us in case Lambda SnapStart is enabled. I really appreciate that I don't need to care about generating, encrypting, storing, and restoring the snapshots/profiles.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In this article, we introduced additional Lambda performance optimization approaches that we can use in our sample application. Try them out on your own to figure out whether they will provide the desired Lambda performance improvements. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Please also watch out for another &lt;a href="https://dev.to/vkazulkin/series/36298"&gt;series&lt;/a&gt; where I use a NoSQL serverless &lt;a href="https://aws.amazon.com/dynamodb/" rel="noopener noreferrer"&gt;Amazon DynamoDB&lt;/a&gt; database instead of Aurora DSQL to do the same Lambda performance measurements.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you like my content, please follow me on &lt;a href="https://github.com/Vadym79" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; and give my repositories a star!&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Please also check out my &lt;a href="https://vkazulkin.com" rel="noopener noreferrer"&gt;website&lt;/a&gt; for more technical content and upcoming public speaking activities.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>java</category>
      <category>serverless</category>
      <category>awslambda</category>
    </item>
    <item>
      <title>Building Real-Time Fraud Detection with Amazon Bedrock and Claude AI - (Let's Build 🏗️ Series)</title>
      <dc:creator>awedis</dc:creator>
      <pubDate>Fri, 12 Jun 2026 19:47:16 +0000</pubDate>
      <link>https://dev.to/aws-heroes/building-real-time-fraud-detection-with-amazon-bedrock-and-claude-ai-lets-build-series-1meg</link>
      <guid>https://dev.to/aws-heroes/building-real-time-fraud-detection-with-amazon-bedrock-and-claude-ai-lets-build-series-1meg</guid>
      <description>&lt;p&gt;Imagine a fintech or e-commerce platform where you want to detect suspicious transactions in real time. Instead of writing complex rules only, you use AI + event-driven architecture.&lt;/p&gt;

&lt;p&gt;In this article, I will design an architecture that detects fraudulent traffic and malicious activity in a system using AI and Amazon Bedrock.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The main parts of this article:&lt;/strong&gt;&lt;br&gt;
1- Architecture&lt;br&gt;
2- Flow Step-by-Step, AWS Bedrock (Claude)&lt;br&gt;
3- Key Takeaways&lt;/p&gt;


&lt;h2&gt;
  
  
  1- Architecture
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm9hlaadi3gpwafsk7j14.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm9hlaadi3gpwafsk7j14.png" alt=" " width="800" height="247"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I used Amazon EventBridge as an example here, but you might be using Amazon API Gateway or even AWS Step Functions in your architecture.&lt;/p&gt;


&lt;h2&gt;
  
  
  2- Flow Step-by-Step
&lt;/h2&gt;
&lt;h4&gt;
  
  
  A. Transaction Happens
&lt;/h4&gt;

&lt;p&gt;A user makes a payment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"detail"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"user_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"123"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"amount"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"country"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"unknown VPN location"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"device"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"new device"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"time"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"03:12 AM"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Event is sent to EventBridge.&lt;/p&gt;

&lt;h4&gt;
  
  
  B. Lambda Trigger
&lt;/h4&gt;

&lt;p&gt;Lambda receives the transaction and prepares a prompt.&lt;/p&gt;

&lt;h4&gt;
  
  
  C. Amazon Bedrock (Claude)
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;

&lt;span class="n"&gt;bedrock&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bedrock-runtime&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;lambda_handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;tx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;detail&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    You are a fraud detection system.

    Analyze the transaction and return:
    - risk_score (0-100)
    - decision (ALLOW / REVIEW / BLOCK)
    - reason

    Transaction:
    &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;tx&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bedrock&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;modelId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;eu.anthropic.claude-haiku-4-5-20251001-v1:0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anthropic_version&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bedrock-2023-05-31&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;body&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  D. AI Output Example
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"# Fraud Detection Analysis&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s2"&gt;**risk_score:** 78&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s2"&gt;**decision:** REVIEW&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s2"&gt;**reason:** Multiple risk factors detected:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;- **Unknown VPN location** - Unable to verify legitimate geographic origin; suggests attempted anonymization&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;- **New device** - First transaction from unrecognized device; increases fraud probability&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;- **Unusual transaction time** (03:12 AM) - Outside typical user activity windows&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;- **Moderate-high amount** ($2,500) - Significant transaction value amplifies risk&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s2"&gt;**Recommendation:** Require additional verification (2FA, identity confirmation, or customer contact) before processing."&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  E. Action Taken
&lt;/h4&gt;

&lt;p&gt;Depending on response:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;BLOCK&lt;/code&gt; → reject transaction&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;REVIEW&lt;/code&gt; → send to manual review queue&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ALLOW&lt;/code&gt; → proceed normally&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  3- Key Takeaways
&lt;/h2&gt;

&lt;p&gt;Together, these capabilities make AI-powered event-driven systems far more powerful than traditional rule-based approaches, as they can understand context instead of relying on static thresholds, make real-time decisions as events occur without batch delays, scale effortlessly from a few transactions to millions thanks to serverless architectures, and continuously adapt to new fraud patterns or behaviors. This combination enables systems that are not only scalable and efficient, but also intelligent, dynamic, and resilient to changing environments.&lt;/p&gt;

&lt;p&gt;Happy coding 👨🏻‍💻&lt;/p&gt;

&lt;p&gt;💡 Enjoyed this? Let’s connect and geek out some more on &lt;a href="https://www.linkedin.com/in/awedis" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>ai</category>
      <category>claude</category>
      <category>serverless</category>
    </item>
    <item>
      <title>Serverless applications on AWS with Lambda using Java 25, API Gateway and DynamoDB - Part 7 Lambda performance optimization approaches</title>
      <dc:creator>Vadym Kazulkin</dc:creator>
      <pubDate>Tue, 09 Jun 2026 15:04:06 +0000</pubDate>
      <link>https://dev.to/aws-heroes/serverless-applications-on-aws-with-lambda-using-java-25-api-gateway-and-dynamodb-part-7-lambda-4po1</link>
      <guid>https://dev.to/aws-heroes/serverless-applications-on-aws-with-lambda-using-java-25-api-gateway-and-dynamodb-part-7-lambda-4po1</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;In the previous articles of the series about how to develop, run, and optimize Serverless applications on AWS with Lambda using Java 25, API Gateway, and DynamoDB, we used:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Managed Java 25 runtime&lt;/li&gt;
&lt;li&gt;GraalVM Native Image deployed as Lambda Custom Runtime&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We also did Lambda performance (cold and warm starts) measurements with the following settings:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Lambda functions used 1024 MB of memory&lt;/li&gt;
&lt;li&gt;Java compilation option "-XX:+TieredCompilation -XX:TieredStopAtLevel=1" &lt;/li&gt;
&lt;li&gt;Lambda x86_64 architecture used&lt;/li&gt;
&lt;li&gt;Default Apache HTTP Client (version 4.5) used to connect to the DynamoDB&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In this article, we'll introduce some additional Lambda performance (cold and warm starts) optimization approaches to apply to our sample application. You'll need to measure the performance by yourself to figure out whether they will provide the desired Lambda performance improvements.&lt;/p&gt;

&lt;p&gt;Please keep in mind that you can also deploy our sample application on AWS Lambda as a (Docker) Container Image. I didn't cover this approach, but you can look into my article series &lt;a href="https://dev.to/vkazulkin/series/34789"&gt;Lambda function using Docker Container Image&lt;/a&gt; for a step-by-step introduction on how to do it. The cold start will be quite big. Lambda SnapStart isn't available for the Lambda deployment as a Container Image. Instead, you can use &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/java-customization.html#aot-cds-caches" rel="noopener noreferrer"&gt;Ahead-of-Time (AOT) and CDS caches&lt;/a&gt; for the Container Image and then measure the Lambda performance.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lambda performance optimization approaches
&lt;/h2&gt;

&lt;p&gt;To find a good balance between the cold and warm start times of the Lambda function, you can try out the optimization techniques introduced below. I have not taken any additional measurements with our sample application with Java and GraalVM 25, but have done so using older Java, GraalVM, and dependency versions. I'll provide references to my relevant articles. Measurements that I did back then might already be outdated, so I strongly recommend you to re-measure.&lt;/p&gt;

&lt;p&gt;We can apply the following approaches to the managed Java runtime and GraalVM Native Image. For the managed Java runtime, it includes SnapStart being enabled and applying the priming techniques on top:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Try out different Lambda memory settings. We performed all measurements with 1024 MB of memory for the Lambda function. With different memory settings, you might become better at the price-performance trade-off. See my article &lt;a href="https://dev.to/aws-builders/aws-snapstart-part-14-measuring-cold-and-warm-starts-with-java-21-using-different-compilation-options-el4"&gt;Measuring cold and warm starts and deployment time with Java 21 using different Lambda memory settings&lt;/a&gt; for further examples, performance measurements, and conclusions.&lt;/li&gt;
&lt;li&gt; Try out setting Lambda arm64 architecture using the AWS Graviton2 processor, which supports SnapStart since July 2024. This can provide a better cost-performance trade-off compared to x86 architecture. See my article &lt;a href="https://dev.to/aws-builders/aws-lambda-performance-with-java-21-x86-vs-arm64-part-1-initial-measurements-506"&gt;AWS Lambda performance with Java 21: x86 vs arm64 - Initial measurements&lt;/a&gt; for some insights.&lt;/li&gt;
&lt;li&gt;Try out different synchronous HTTP clients to establish an HTTP connection to DynamoDB. We performed all measurements until now with the default synchronous Apache HTTP Client version 4.5. There are other options like UrlConnection and AWS CRT HTTP clients, which provide different performance trade-offs for the cold and warm start. See my article &lt;a href="https://dev.to/aws-builders/aws-snapstart-part-15-measuring-cold-and-warm-starts-with-java-21-using-different-synchronous-http-clients-579o"&gt;Measuring cold and warm starts with Java 21 using different synchronous HTTP clients&lt;/a&gt; for further examples, performance measurements, and conclusions. GraalVM Native Image also supports the AWS CRT HTTP Client, and I did some measurements using a pure Java Lambda function in my article &lt;a href="https://dev.to/aws-heroes/lambda-function-with-graalvm-native-image-part-6-measuring-cold-and-warm-starts-with-graalvm-23-4d3a"&gt;Measuring cold and warm starts with GraalVM 23 and AWS CRT HTTP Client&lt;/a&gt;. Recently, also &lt;a href="https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/http-configuration-apache5.html" rel="noopener noreferrer"&gt;Apache 5.x based HTTP client&lt;/a&gt; has been released, so you can try it out.&lt;/li&gt;
&lt;li&gt;Explore whether an asynchronous HTTP client for DynamoDB is an option for your use case. The default asynchronous HTTP Client is NettyNio. There is another option, the AWS CRT async HTTP client, which provides different performance trade-offs for the cold and warm starts. See my article &lt;a href="https://dev.to/aws-builders/aws-snapstart-part-16-measuring-cold-and-warm-starts-with-java-21-using-different-asynchronous-http-clients-4n2"&gt;Measuring cold and warm starts with Java 21 using different asynchronous HTTP clients&lt;/a&gt; for further examples, performance measurements, and conclusions.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We can apply the following approaches primarily only to the managed Java runtime on Lambda. This includes SnapStart being enabled and applying the priming techniques on top:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Try out different Java compilation options for the Lambda function. We performed all measurements until now with the compilation option "-XX:+TieredCompilation -XX:TieredStopAtLevel=1". We can provide other compilation options to the Lambda function using an environment variable called &lt;em&gt;JAVA_TOOL_OPTIONS&lt;/em&gt;. This can have different cold and warm starts trade-offs. See my article &lt;a href="https://dev.to/aws-builders/aws-snapstart-part-14-measuring-cold-and-warm-starts-with-java-21-using-different-compilation-options-el4"&gt;Measuring cold and warm starts with Java 21 using different compilation options&lt;/a&gt; for further examples, performance measurements, and conclusions. For GraalVM Native Image, the choice of Java compilation method doesn't have much impact on the Lambda performance. This is because our application is already compiled natively.&lt;/li&gt;
&lt;li&gt;Further exclude unused dependencies. With that, we can especially reduce the cold start times (also for SnapStart enabled); see my article &lt;a href="https://dev.to/aws-builders/aws-snapstart-part-11-measuring-cold-starts-with-java-21-using-different-deployment-artifact-sizes-4g29"&gt;Measuring cold starts with Java 21 using different deployment artifact sizes&lt;/a&gt;. In the case of GraalVM Native Image, only reachable Java classes, functions, and methods will become a part of the Native Image, so including unused dependencies may not help that much.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We can apply the following approach primarily to the managed Java runtime on Lambda with&amp;nbsp;SnapStart enabled:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Search for further Lambda SnapStart priming potential in addition to those we introduced in this series. For this, you can use AWS Lambda Profiler Extension for Java. I described it in my article &lt;a href="https://dev.to/aws-heroes/aws-lambda-profiler-extension-for-java-part-2-improving-lambda-performance-with-lambda-snapstart-4p06"&gt;Improving Lambda performance with Lambda SnapStart and priming&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We can apply the following approach primarily to the GraalVM Native Image :&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Try out &lt;a href="https://www.graalvm.org/21.3/reference-manual/native-image/PGO/" rel="noopener noreferrer"&gt;Profile-Guided Optimizations&lt;/a&gt; to see whether you can further improve Lambda performance. The difficulty of trying out this technique is that you'll need to do some additional semi-automated steps to run your application either with the Lambda emulator locally or in an extra environment to obtain the profile of your application, which you'll then need to use to generate the optimized Native Image. You can use &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/lambda-extensions.html" rel="noopener noreferrer"&gt;Lambda extension&lt;/a&gt; for it, but it still requires a lot of additional work. This is the work AWS did for us in case Lambda SnapStart is enabled. I really appreciate that I don't need to care about generating, encrypting, storing, and restoring the snapshots/profiles.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In this article, we introduced additional Lambda performance optimization approaches that we can use in our sample application. Try them out on your own to figure out whether they will provide the desired Lambda performance improvements. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Please also watch out for another &lt;a href="https://dev.to/vkazulkin/series/36919"&gt;series&lt;/a&gt; where I use a relational serverless &lt;a href="https://aws.amazon.com/rds/aurora/dsql/" rel="noopener noreferrer"&gt;Amazon Aurora DSQL&lt;/a&gt; database and additionally the &lt;a href="https://hibernate.org/" rel="noopener noreferrer"&gt;Hibernate ORM framework&lt;/a&gt; instead of DynamoDB to do the same Lambda performance measurements.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you like my content, please follow me on &lt;a href="https://github.com/Vadym79" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; and give my repositories a star!&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Please also check out my &lt;a href="https://vkazulkin.com" rel="noopener noreferrer"&gt;website&lt;/a&gt; for more technical content and upcoming public speaking activities.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>java</category>
      <category>serverless</category>
      <category>awslambda</category>
    </item>
    <item>
      <title>Security for MCP Servers: Governed Access Beats Uploading Spreadsheets to ChatGPT</title>
      <dc:creator>Guy</dc:creator>
      <pubDate>Sun, 07 Jun 2026 14:54:28 +0000</pubDate>
      <link>https://dev.to/aws-heroes/security-for-mcp-servers-governed-access-beats-uploading-spreadsheets-to-chatgpt-3ag7</link>
      <guid>https://dev.to/aws-heroes/security-for-mcp-servers-governed-access-beats-uploading-spreadsheets-to-chatgpt-3ag7</guid>
      <description>&lt;p&gt;An analyst exports a spreadsheet with customer data, uploads it into a generic AI chat, asks for a sales summary, and gets an answer in seconds. It feels productive. It is also one of the least governed ways to bring AI into an enterprise. The model sees the raw file. The organization has little control over which fields were exposed, whether PII was included, whether the same data should have been aggregated first, or whether the user should have been allowed to access every row in the first place.&lt;/p&gt;

&lt;p&gt;There is another failure mode that matters just as much in enterprise settings: inconsistent business aggregation. I saw this directly with one company where two managers prepared reports for the same management meeting from the same underlying report, but got dramatically different answers from ChatGPT. One manager instructed it to use the company's fiscal year, which starts in July; the other did not. One mentioned Canada as part of North America; the other forgot to ask for it. Both reports looked polished; both were based on the same raw data, but because the business definitions were stored in ad hoc prompts and sent to a probabilistic LLM rather than a governed interface, the meeting started with conflicting numbers.&lt;/p&gt;

&lt;p&gt;That is the wrong operating model for enterprise AI.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1iv67uh9s13zooq8qijs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1iv67uh9s13zooq8qijs.png" alt="Security of AI platforms" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The safer pattern is to connect the AI client to an MCP server instead. The MCP server becomes the controlled interface between the model and the underlying data systems. It determines which tools exist, what inputs they accept, what data shape they return, and which authenticated user they act on behalf of. The model does not get the whole spreadsheet. It gets the smallest, governed slice of data needed to answer the question.&lt;/p&gt;

&lt;p&gt;A well-designed MCP server prevents this kind of semantic drift. The business analyst can define that fiscal years start in July, that North America includes Canada, that revenue is recognized by a specific accounting rule, and that management reports return a standard aggregation. Every user and every model then works from the same, governed business definition, rather than relitigating those choices in every chat.&lt;/p&gt;

&lt;p&gt;This is where the recurring motif from the previous articles comes into play again. Good MCP design is always a balance of strengths across multiple parties. Security is not the job of one component. The &lt;strong&gt;business analyst&lt;/strong&gt; defines the safe business surface. The &lt;strong&gt;business user&lt;/strong&gt; brings the runtime question. The &lt;strong&gt;LLM&lt;/strong&gt; interprets intent. The &lt;strong&gt;MCP server&lt;/strong&gt; validates and executes deterministically. And when the surface becomes more powerful, as we saw in the &lt;a href="https://dev.to/aws-heroes/code-mode-for-mcp-the-long-tail-escape-hatch-not-the-front-door-40ga"&gt;code mode article&lt;/a&gt;, the &lt;strong&gt;IT administrator&lt;/strong&gt; governs policy, approval, and audit. Each party covers the other party's weaknesses.&lt;/p&gt;

&lt;p&gt;This article shows how that balance becomes a security architecture for production MCP servers, with layers of security that collectively enhance the overall trust in AI interactions.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is MCP? (The 30-Second Version)
&lt;/h2&gt;

&lt;p&gt;The Model Context Protocol (MCP, &lt;a href="https://modelcontextprotocol.io/specification/2025-11-25" rel="noopener noreferrer"&gt;spec 2025-11-25&lt;/a&gt;) is the interface layer between AI clients and external systems through tools, prompts, and resources. In enterprise deployments, the MCP server should be a thin, remote, mostly stateless interface layer over internal data systems. If HTTP-based applications are the human-facing interface to enterprise systems, MCP servers are the AI-facing interface.&lt;/p&gt;

&lt;p&gt;That framing matters for security. Once you see the MCP server as an interface tier, the right controls become obvious: authenticated requests, typed inputs, least-privilege downstream access, output filtering, audit trails, rate limits, and active security testing. The server is not a shortcut around enterprise controls. It is where those controls should be made explicit for AI.&lt;/p&gt;

&lt;p&gt;There is also a larger industry lesson here. We have already spent decades learning, often painfully, how vulnerable data-facing interfaces become when teams trust inputs too much, over-privilege backends, skip per-request authorization, or expose broad attack surfaces and hope the client behaves well. SQL injection, broken access control, credential leakage, over-broad service identities, tenant isolation failures, and data exfiltration were not abstract risks. They were the history of web security. MCP is the next layer of interfaces between users, models, and data systems. It should start with those lessons, not relearn them from scratch.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Real Alternative Is Not "MCP vs. No MCP"
&lt;/h2&gt;

&lt;p&gt;A lot of enterprise security discussions start from the wrong comparison. The practical comparison is not:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;MCP server&lt;/li&gt;
&lt;li&gt;nothing at all&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The real comparison is usually:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a governed MCP server&lt;/li&gt;
&lt;li&gt;business users pasting or uploading sensitive material into a general AI chat&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When a user uploads a spreadsheet directly, the organization loses most of its control points:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;no tracing of how fresh the data is&lt;/li&gt;
&lt;li&gt;no typed or validated input contract&lt;/li&gt;
&lt;li&gt;no server-side output shaping&lt;/li&gt;
&lt;li&gt;no per-tool authorization boundary&lt;/li&gt;
&lt;li&gt;no guaranteed row-level or field-level policy enforcement&lt;/li&gt;
&lt;li&gt;no reliable audit trail of which business operation was governed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;An MCP server restores those control points. Instead of exposing "the file," it can expose a tool such as &lt;code&gt;quarterly_sales_summary&lt;/code&gt;, &lt;code&gt;customer_health_report&lt;/code&gt;, or &lt;code&gt;query_sales_cube&lt;/code&gt;, each with deliberately constrained inputs and outputs. That is the security benefit of MCP in enterprise settings: not secrecy by obscurity, but governance by interface design.&lt;/p&gt;

&lt;h2&gt;
  
  
  Security Starts With Tool Design
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://dev.to/aws-heroes/mcp-tool-design-why-your-ai-agent-is-failing-and-how-to-fix-it-40fc"&gt;first article&lt;/a&gt; argued that tool design is MCP's UX discipline. It is also one of MCP's first security layers.&lt;/p&gt;

&lt;p&gt;The business analyst has a direct security role at design time. They decide:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;which business actions should exist as tools at all&lt;/li&gt;
&lt;li&gt;which inputs are valid for those actions&lt;/li&gt;
&lt;li&gt;which outputs the model actually needs&lt;/li&gt;
&lt;li&gt;which fields should never be returned&lt;/li&gt;
&lt;li&gt;which workflows should be packaged as prompts rather than left to ad hoc exploration&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This matters because most enterprise data exposure happens long before a cryptographic control fails. It happens when the wrong interface is published.&lt;/p&gt;

&lt;p&gt;If the analyst defines a tool as "run any SQL against finance," the server surface is already too broad. If they define it as "summarize revenue by region for a selected quarter" with an enum for region scope, a validated quarter field, and a fixed output shape, most of the risk is removed before runtime.&lt;/p&gt;

&lt;p&gt;That is the same balance we have used throughout this series. The analyst contributes domain judgment. The server contributes deterministic enforcement. The model contributes to language understanding. The user contributes the concrete business question. Security improves because the parties are not all doing the same job badly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Typed Inputs Are A Security Boundary
&lt;/h2&gt;

&lt;p&gt;One useful pattern, and one that Rust SDKs such as PMCP support well, is to make tool contracts both model-readable and runtime-enforced from the same Rust types.&lt;/p&gt;

&lt;p&gt;In earlier articles, we used &lt;code&gt;schemars&lt;/code&gt; constraints and &lt;code&gt;deny_unknown_fields&lt;/code&gt; to improve tool usability. Those same patterns are security controls:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;type safety rejects malformed inputs before business logic runs&lt;/li&gt;
&lt;li&gt;range and length constraints reject obviously abusive input early&lt;/li&gt;
&lt;li&gt;enums narrow the space of valid values&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;deny_unknown_fields&lt;/code&gt; prevents undeclared parameters from slipping through.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is not the whole security story, but it is an important first layer. A business analyst can define the valid business shape, and the server implementation can enforce it consistently.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="nd"&gt;#[derive(Debug,&lt;/span&gt; &lt;span class="nd"&gt;Deserialize,&lt;/span&gt; &lt;span class="nd"&gt;JsonSchema)]&lt;/span&gt;
&lt;span class="nd"&gt;#[schemars(deny_unknown_fields)]&lt;/span&gt;
&lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="n"&gt;SalesSummaryInput&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="cd"&gt;/// Fiscal quarter to summarize, for example, 2026-Q1&lt;/span&gt;
    &lt;span class="nd"&gt;#[schemars(length(min&lt;/span&gt; &lt;span class="nd"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="nd"&gt;,&lt;/span&gt; &lt;span class="nd"&gt;max&lt;/span&gt; &lt;span class="nd"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="nd"&gt;))]&lt;/span&gt;
    &lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="n"&gt;quarter&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

    &lt;span class="cd"&gt;/// Business unit to analyze&lt;/span&gt;
    &lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="n"&gt;business_unit&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;BusinessUnit&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

    &lt;span class="cd"&gt;/// Aggregation granularity&lt;/span&gt;
    &lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="n"&gt;group_by&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;GroupBy&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

    &lt;span class="cd"&gt;/// Maximum number of rows to return&lt;/span&gt;
    &lt;span class="nd"&gt;#[schemars(range(min&lt;/span&gt; &lt;span class="nd"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="nd"&gt;,&lt;/span&gt; &lt;span class="nd"&gt;max&lt;/span&gt; &lt;span class="nd"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="nd"&gt;))]&lt;/span&gt;
    &lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Option&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nb"&gt;u32&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is more than schema decoration. It is the business contract. The LLM sees the same structure during tool discovery that the server enforces at runtime.&lt;/p&gt;

&lt;p&gt;This is also where the business analyst's role becomes concrete. They are not writing Rust, but they are deciding that quarters should follow a fiscal format, that &lt;code&gt;group_by&lt;/code&gt; should be an enum rather than free text, and that a sales summary tool should never return an unbounded result set. The engineer implements that contract in code. A good MCP server then turns it into a discoverable and enforceable interface.&lt;/p&gt;

&lt;h2&gt;
  
  
  Output Boundaries Matter More Than Most Teams Think
&lt;/h2&gt;

&lt;p&gt;Enterprises often focus on who may call a tool and underinvest in what the tool can return. For AI workloads, output boundaries are just as important.&lt;/p&gt;

&lt;p&gt;This is one reason MCP is safer than directly uploading documents. The server can expose a shaped result rather than raw records. It can return aggregates instead of rows. It can omit PII fields entirely. It can redact or mask values that are useful for joins or filtering, but should never be emitted back into the model context.&lt;/p&gt;

&lt;p&gt;That distinction matters in the long-tail cases covered by the &lt;a href="https://dev.to/aws-heroes/code-mode-for-mcp-the-long-tail-escape-hatch-not-the-front-door-40ga"&gt;code mode article&lt;/a&gt;. A database query may legitimately need sensitive fields for internal computation or joins, while still forbidding those fields from appearing in the result.&lt;/p&gt;

&lt;p&gt;For example, a code-mode policy can allow a query to join the customer and support tables on the server side, while blocking sensitive output fields such as &lt;code&gt;ssn&lt;/code&gt;, &lt;code&gt;salary&lt;/code&gt;, or even raw email addresses from appearing in the returned payload. The sensitive field can participate in the computation. It does not have to be exposed to the model.&lt;/p&gt;

&lt;p&gt;That is a much better enterprise pattern than "upload the spreadsheet and ask the model to be careful."&lt;/p&gt;

&lt;p&gt;The same principle applies outside code mode:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use fixed output schemas for curated tools. You want to control the shape of the data returned by the data system to optimize AI flows for security, privacy, and cost (fewer tokens).&lt;/li&gt;
&lt;li&gt;Prefer aggregates over row dumps. Don't trust the LLM to accurately calculate &lt;code&gt;sum&lt;/code&gt; or other statistical measures. The MCP tools are much better for such symbolic computation. &lt;/li&gt;
&lt;li&gt;Keep tool responses aligned to the actual business question for consistency, accuracy, and security. &lt;/li&gt;
&lt;li&gt;block fields that are unnecessary for the answer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Security is not only about stopping bad requests. It is also about making oversharing hard by design. As a side effect, this usually saves tokens and processing time too.&lt;/p&gt;

&lt;h2&gt;
  
  
  OAuth Matters Because The Server Must Act On Behalf Of The User
&lt;/h2&gt;

&lt;p&gt;This is the most important point of authentication when building AI agentics workflows. &lt;/p&gt;

&lt;p&gt;An enterprise MCP server should not sit in front of a database or application using a single shared application API key, a single database username/password, or a single generic service identity that grants every end user the same access. That recreates the oldest enterprise security mistake: every user gets the power of the integration account.&lt;/p&gt;

&lt;p&gt;The correct pattern is that the MCP client and MCP server work on behalf of the authenticated user.&lt;/p&gt;

&lt;p&gt;That is why OAuth 2.0 and OIDC matter so much in a serious MCP security model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The MCP client handles the OAuth flow and token refresh, allowing the users to log in once and then manage the secure handling of the access and refresh tokens.&lt;/li&gt;
&lt;li&gt;The MCP server validates the access token on every request and extracts user identity, tenant context, groups, and scopes.&lt;/li&gt;
&lt;li&gt;downstream systems enforce the user's own permissions whenever possible, with the pass-through of the user's access tokens.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In the strongest design, that delegated identity continues past the MCP boundary. If the downstream API or data platform supports OAuth, the MCP server should forward the user's token rather than substitute a broad application credential. If the backend uses a different enforcement model, the server should still propagate token-derived user and tenant context into that system so row-level, field-level, or tenant-level policies execute for the real user. The point is to preserve user identity end-to-end, not collapse it into a shared super-account in the middle.&lt;/p&gt;

&lt;p&gt;The practical benefit is huge. When a company already uses Entra ID, Okta, Cognito, Auth0, or another identity provider, the MCP server can integrate with existing SSO, group membership, access reviews, and offboarding. When IT disables an employee account, MCP access is revoked.&lt;/p&gt;

&lt;p&gt;That is categorically better than a static API key model.&lt;/p&gt;

&lt;p&gt;In a well-factored implementation, the server code can stay provider-agnostic and work with an &lt;code&gt;AuthContext&lt;/code&gt; rather than baking identity-provider details into every tool:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;handle_request&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;auth&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;AuthContext&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;Result&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;Error&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;auth&lt;/span&gt;&lt;span class="nf"&gt;.require_auth&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;auth&lt;/span&gt;&lt;span class="nf"&gt;.require_scope&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"read:sales"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;auth&lt;/span&gt;&lt;span class="nf"&gt;.user_id&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;tenant_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;auth&lt;/span&gt;&lt;span class="nf"&gt;.tenant_id&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="c1"&gt;// Pass identity downstream so backend policies act on behalf of the user&lt;/span&gt;
    &lt;span class="nn"&gt;tracing&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nd"&gt;info!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"authorized sales request"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nf"&gt;Ok&lt;/span&gt;&lt;span class="p"&gt;(())&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The design principle underneath this is simple: &lt;strong&gt;the server should validate identity, not replace identity&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Token Validation Is Not Optional Plumbing
&lt;/h2&gt;

&lt;p&gt;Because MCP servers follow the web server model, token validation must be treated as a first-class responsibility of the server.&lt;/p&gt;

&lt;p&gt;For JWT access tokens, that means validating at least:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;signature&lt;/li&gt;
&lt;li&gt;algorithm&lt;/li&gt;
&lt;li&gt;expiration&lt;/li&gt;
&lt;li&gt;not-before time&lt;/li&gt;
&lt;li&gt;issuer&lt;/li&gt;
&lt;li&gt;audience&lt;/li&gt;
&lt;li&gt;required scopes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The principle is straightforward. The MCP server should not invent a custom auth flow. It should validate the tokens it receives and return a stable authenticated context for the rest of the codebase.&lt;/p&gt;

&lt;p&gt;This is another place where mature SDKs help enterprise teams. A provider-agnostic authentication model lets you switch between Cognito, Entra, Google, Okta, and Auth0 by configuring the tool rather than rewriting its logic. That keeps authentication a deployment concern rather than scattering auth conditionals throughout the business code.&lt;/p&gt;

&lt;h2&gt;
  
  
  Security Happens In Layers
&lt;/h2&gt;

&lt;p&gt;Many teams miss an important point: you do not need to implement every security rule inside the MCP server itself. You need to place each rule in the correct layer.&lt;/p&gt;

&lt;p&gt;For enterprise MCP, a clean mental model is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Layer 1: Server access&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Validate the token. Reject invalid, expired, or misissued requests.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Layer 2: Tool authorization&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Check whether this user may call this tool or workflow at all.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Layer 3: Data-level security in the backend&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Let the database, API gateway, GraphQL layer, or data platform enforce row-level security, field-level authorization, column masking, or tenant isolation.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That is the design you want because the MCP server is the interface tier, not the entire security platform. If your warehouse already supports column masking, or your database already supports row-level security, the MCP server should pass through the user identity and let the backend do what it is already good at.&lt;/p&gt;

&lt;p&gt;This is also the answer to a common enterprise objection: "Do we need to reimplement all our data security logic in the AI layer?" No. The AI-facing layer should validate access, constrain the interface, and carry the user identity through. The data systems should continue enforcing the data rules closest to the data.&lt;/p&gt;

&lt;h2&gt;
  
  
  Security Testing Is Another Layer, Not A Final Checkbox
&lt;/h2&gt;

&lt;p&gt;Security controls designed into the server still need to be tested as part of the release process.&lt;/p&gt;

&lt;p&gt;That is where the &lt;a href="//../03-testing/article.md"&gt;testing article&lt;/a&gt; fits directly into the security story. We argued there that MCP production testing has five gates: smoke, conformance, scenarios, load, and pentest. Security is not separate from that stack. It is one of the gates, and it also cuts through the others.&lt;/p&gt;

&lt;p&gt;For MCP workloads, pentesting matters because the client is programmable and partially adversarial by default. You are not only defending against a careless user. You are also defending against:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;prompt-injection-shaped inputs&lt;/li&gt;
&lt;li&gt;malformed parameters&lt;/li&gt;
&lt;li&gt;tool misuse across role boundaries&lt;/li&gt;
&lt;li&gt;schema edge cases&lt;/li&gt;
&lt;li&gt;attempts to exfiltrate blocked fields&lt;/li&gt;
&lt;li&gt;tenant boundary violations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So the enterprise posture should be layered:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;design-time narrowing of tools and workflows&lt;/li&gt;
&lt;li&gt;typed validation at the interface&lt;/li&gt;
&lt;li&gt;OAuth-based user identity&lt;/li&gt;
&lt;li&gt;backend-enforced data permissions&lt;/li&gt;
&lt;li&gt;policy and approval for powerful surfaces&lt;/li&gt;
&lt;li&gt;penetration testing before production rollout and after meaningful changes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is how you turn "secure by design" from a slogan into a release discipline.&lt;/p&gt;

&lt;h2&gt;
  
  
  Do Not Throw Away Web Security Lessons
&lt;/h2&gt;

&lt;p&gt;One of the easiest mistakes in AI infrastructure is to treat it as so new that older security practices no longer apply. That is usually how teams recreate old failures with new tooling.&lt;/p&gt;

&lt;p&gt;MCP servers should start from the hard-learned lessons of web and API security:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;never trust client input, even when the client is an LLM&lt;/li&gt;
&lt;li&gt;authenticate every request&lt;/li&gt;
&lt;li&gt;authorize every operation&lt;/li&gt;
&lt;li&gt;prefer least-privilege credentials and delegated user identity&lt;/li&gt;
&lt;li&gt;narrow the interface surface instead of wrapping the whole backends&lt;/li&gt;
&lt;li&gt;validate and shape outputs, not only inputs&lt;/li&gt;
&lt;li&gt;log and audit meaningful actions&lt;/li&gt;
&lt;li&gt;assume attackers will probe every exposed edge&lt;/li&gt;
&lt;li&gt;test for injection, exfiltration, broken access control, and tenant leaks before production&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;MCP changes the interface, but it does not repeal these rules. If anything, it makes some of them more important, because the caller is now a probabilistic system that can be induced to misuse the interface in unexpected ways.&lt;/p&gt;

&lt;h2&gt;
  
  
  Implementation Note: Rust Helps
&lt;/h2&gt;

&lt;p&gt;This article is about MCP best practices, not a specific single SDK. Still, language and tooling choices do matter. Rust is a strong fit for this layer because memory safety, strong typing, and explicit contracts are useful properties for security-sensitive interface services. SDKs such as PMCP are valuable when they make those practices easier to apply consistently, but the architectural lesson comes first: narrow interfaces, user-scoped identity, layered authorization, and governed outputs.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Core Security Argument
&lt;/h2&gt;

&lt;p&gt;The core argument of this article is not that MCP automatically makes AI safe. Poorly designed MCP servers can absolutely be insecure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A well-designed MCP server is a more secure enterprise AI pattern than allowing users to upload raw business documents to general AI chat interfaces.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Why?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It can limit the interface to approved business operations.&lt;/li&gt;
&lt;li&gt;It can execute symbolic computation on the server side without exposing internal data to the outside world.&lt;/li&gt;
&lt;li&gt;It can validate inputs before execution.&lt;/li&gt;
&lt;li&gt;It can shape outputs before data reaches the model.&lt;/li&gt;
&lt;li&gt;It can block sensitive fields from being returned.&lt;/li&gt;
&lt;li&gt;It can act on behalf of the authenticated user instead of a shared super-account.&lt;/li&gt;
&lt;li&gt;It can preserve existing backend security controls.&lt;/li&gt;
&lt;li&gt;It can be tested, audited, and governed like any other production interface.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Security Stack In One View
&lt;/h2&gt;

&lt;p&gt;To wrap the article up, here is the layered model worth carrying forward:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Business design layer&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
The business analyst defines the approved operations, shared business definitions, and safe aggregation rules so users do not reinvent them in ad hoc prompts.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Interface contract layer&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Tools expose typed inputs, bounded schemas, constrained outputs, and narrow outcome-oriented surfaces.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Authentication layer&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
The MCP client and server work on behalf of the authenticated user via OAuth and validated tokens, rather than shared integration credentials.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Authorization layer&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Each tool, workflow, or code-mode action is checked against the user's scopes, roles, groups, and policy rules.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Data protection layer&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Backend systems enforce row-level security, field-level permissions, masking, tenant isolation, and other controls closest to the data.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Governance layer&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Powerful surfaces, such as code mode, add approval, policy administration, blocked fields, execution limits, and audit trails.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Validation and testing layer&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Smoke tests, conformance tests, scenario tests, load tests, and penetration tests verify that the security design actually holds in production.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If enterprise teams start MCP with those seven layers, they will begin with the lessons the industry has already paid to learn, rather than paying for them again through avoidable AI-era security failures.&lt;/p&gt;

</description>
      <category>mcp</category>
      <category>ai</category>
      <category>security</category>
    </item>
    <item>
      <title>Redshift Check-In: Spring 2026</title>
      <dc:creator>elliott cordo</dc:creator>
      <pubDate>Thu, 04 Jun 2026 21:26:55 +0000</pubDate>
      <link>https://dev.to/aws-heroes/redshift-check-in-spring-2026-14cg</link>
      <guid>https://dev.to/aws-heroes/redshift-check-in-spring-2026-14cg</guid>
      <description>&lt;p&gt;I have a soft spot for Amazon Redshift.  &lt;/p&gt;

&lt;p&gt;One of my early clients was a pre-GA adopter, potentially a  "Client #1" on the platform.   We were mid-implementation Paraccel, the engine behind Redshift's early years, and decided to quickly pivot to this new shiny warehouse in the cloud.   &lt;/p&gt;

&lt;p&gt;At this time, cloud-based data warehouse engines were unheard of.   Luckily my client was a scrappy media startup, with a tremendous amount of data, and a large aging Hadoop cluster, so they were willing to take the risk.  Long story short, we had some early adopter hiccups but overall it went amazingly well, and the platform is still alive and well today!&lt;/p&gt;

&lt;p&gt;Nearly 14 years later, I thought it would be interesting to step back and ask a simple question:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;How is Redshift doing?&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The answer is surprisingly well.&lt;/p&gt;

&lt;p&gt;Although Redshift is certainly not the only cloud data warehouse, and attention has shifted toward lakehouses, open table formats, and AI platforms, Redshift has quietly continued to evolve. In many ways, the service has transformed from a cloud data warehouse into a broader analytics and AI platform.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;What Has Changed Since The Early Days?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The original Redshift value proposition was straightforward:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fast SQL analytics
&lt;/li&gt;
&lt;li&gt;Managed infrastructure
&lt;/li&gt;
&lt;li&gt;Commodity cloud economics&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That was enough to disrupt traditional data warehousing.&lt;/p&gt;

&lt;p&gt;Today's Redshift is solving a very different set of problems. Modern data teams are less concerned with standing up clusters and more concerned with operational complexity, data movement, governance, and AI enablement.&lt;/p&gt;

&lt;p&gt;AWS has spent the last several years addressing those concerns directly.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Serverless Has Become The Default&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;One of the biggest shifts has been the maturation of Redshift Serverless.&lt;/p&gt;

&lt;p&gt;In the early years, Redshift administration was a significant skillset. Teams spent time sizing clusters, planning node counts, managing concurrency, and tuning workloads.&lt;/p&gt;

&lt;p&gt;Serverless dramatically changes that operating model by removing most infrastructure decisions from the equation. More recently, AWS introduced AI-driven scaling and optimization capabilities that automatically adjust resources based on workload patterns and query characteristics.&lt;/p&gt;

&lt;p&gt;For many organizations, Redshift is no longer something that needs to be actively managed.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;The Rise Of Zero-ETL&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Perhaps the most strategically important development has been AWS's investment in Zero-ETL integrations.&lt;/p&gt;

&lt;p&gt;Historically, a significant portion of data engineering effort was spent moving data from operational systems into analytical systems. Today, Redshift can consume near real-time data from Aurora, RDS, DynamoDB, and other AWS services through managed integrations.&lt;/p&gt;

&lt;p&gt;As someone who has spent decades building data pipelines, I find this trend particularly interesting.   And although I have concerns about potential anti-patterns such as tight-coupling, I’m largely happy to avoid the undifferentiated work of moving data from point A to point B. &lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Redshift Is Becoming More AI Native&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Another notable shift is the growing integration between Redshift and generative AI services.&lt;/p&gt;

&lt;p&gt;Redshift now supports Amazon Q assisted SQL generation, helping users author and understand queries through natural language. AWS has also integrated Amazon Bedrock capabilities directly into the platform, allowing organizations to leverage foundation models closer to their data.&lt;/p&gt;

&lt;p&gt;This reflects a broader trend across the industry:&lt;/p&gt;

&lt;p&gt;The analytical warehouse is no longer just where data lives. It is increasingly where AI is applied.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;The Lakehouse Conversation&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Several years ago, many observers predicted that open lakehouse architectures would make traditional data warehouses obsolete.&lt;/p&gt;

&lt;p&gt;Instead, something more interesting happened.&lt;/p&gt;

&lt;p&gt;Redshift evolved to participate in that ecosystem rather than compete against it. AWS has continued investing in Iceberg compatibility, open data architectures, and tighter integration across analytical services. The distinction between warehouse and lake continues to blur.&lt;/p&gt;

&lt;p&gt;The result is that organizations increasingly have flexibility in how they store data while still leveraging Redshift's query engine, governance capabilities, and performance optimizations.&lt;/p&gt;

&lt;p&gt;My general thesis is that eventually &lt;em&gt;everything will be Iceberg&lt;/em&gt;, and it feels like the Redshift team is in on this bet.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;The Operational Maturity Story&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;One thing that often gets overlooked is how much operational maturity Redshift has accumulated.&lt;/p&gt;

&lt;p&gt;Features like automatic table optimization, materialized view improvements, federated permissions, enhanced security defaults, and continual query engine optimizations may not generate headlines, but they matter tremendously in production environments.&lt;/p&gt;

&lt;p&gt;Many of these capabilities address lessons learned from operating thousands of customer deployments over more than a decade.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;A Few Things To Watch&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Not every change is additive.&lt;/p&gt;

&lt;p&gt;Organizations running older Redshift environments should be aware of the ongoing retirement of Python UDFs, with AWS encouraging migration toward Lambda UDFs and external service integrations.&lt;/p&gt;

&lt;p&gt;Similarly, teams adopting Serverless should continue to monitor workload economics carefully. Simpler operations do not eliminate the need for cost governance.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Final Thoughts&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;If you had asked me in 2012 what Redshift would look like in 2026, I would have guessed larger clusters, faster hardware, and lower costs.&lt;/p&gt;

&lt;p&gt;What actually happened is more interesting.&lt;/p&gt;

&lt;p&gt;Redshift evolved from a cloud data warehouse into a broader analytical platform that increasingly sits at the intersection of data, governance, and AI.&lt;/p&gt;

&lt;p&gt;The core promise remains largely unchanged: help organizations derive value from their data.&lt;/p&gt;

&lt;p&gt;The difference is that today's Redshift spends far less time asking engineers to manage infrastructure and far more time helping them solve business problems.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>cloud</category>
      <category>database</category>
      <category>dataengineering</category>
    </item>
    <item>
      <title>5 Multi-Agent Patterns in Strands Agents: Which One and When</title>
      <dc:creator>ricardoceci</dc:creator>
      <pubDate>Wed, 03 Jun 2026 11:47:48 +0000</pubDate>
      <link>https://dev.to/aws-heroes/5-multi-agent-patterns-in-strands-agents-which-one-and-when-48gh</link>
      <guid>https://dev.to/aws-heroes/5-multi-agent-patterns-in-strands-agents-which-one-and-when-48gh</guid>
      <description>&lt;p&gt;You have an agent that searches flights. Another one checks the weather. Another one enforces corporate policies. How do you make them work together?&lt;/p&gt;

&lt;p&gt;Strands Agents offers 5 patterns for coordinating multiple agents. Each one solves a different problem. The key difference: &lt;strong&gt;who decides the execution order&lt;/strong&gt;. The model, the agents themselves, or you in code.&lt;/p&gt;

&lt;p&gt;In this post I walk through each pattern with code, show the differences in a comparison table, and give you a decision framework to pick the right one.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;All examples use Strands Agents (June 2026). The complete code is at &lt;a href="https://github.com/ricardoceci/curso-strands-agentcore-2026/tree/main/clase-3" rel="noopener noreferrer"&gt;github.com/ricardoceci/curso-strands-agentcore-2026/tree/main/clase-3&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The Running Example: Corporate Travel Agent
&lt;/h2&gt;

&lt;p&gt;Every example uses the same case: a corporate travel agent that coordinates flight search, weather lookup, and recommendation generation. Three capabilities, three specialized agents.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt;

&lt;span class="nd"&gt;@tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;search_flights&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;origin&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;destination&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;date&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Search one-way flights via Duffel sandbox.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="c1"&gt;# ... Duffel API call
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;route&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;origin&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; -&amp;gt; &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;destination&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;offers&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[...]}&lt;/span&gt;

&lt;span class="nd"&gt;@tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_weather&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;target_date&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Get weather forecast for a city on a specific date.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="c1"&gt;# ... Open-Meteo API call
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;city&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_temp_c&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;28&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;precipitation_mm&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Pattern 1: Agents as Tools
&lt;/h2&gt;

&lt;p&gt;An orchestrator agent uses other agents as if they were tools. The orchestrator decides when to delegate and to whom.&lt;/p&gt;

&lt;h3&gt;
  
  
  How It Works
&lt;/h3&gt;

&lt;p&gt;You pass the sub-agent directly in the main agent's &lt;code&gt;tools&lt;/code&gt; array. The SDK converts it to a tool automatically. When the orchestrator's model decides it needs that capability, it invokes the sub-agent. Everything runs in the same Python process.&lt;/p&gt;

&lt;p&gt;For more control, use &lt;code&gt;.as_tool()&lt;/code&gt; to customize the tool name and description, or the &lt;code&gt;@tool&lt;/code&gt; decorator to wrap the call entirely.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Specialized sub-agent for weather
&lt;/span&gt;&lt;span class="n"&gt;weather_agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;get_weather&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a weather expert. Respond concisely.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# The main agent receives the sub-agent directly in tools[]
&lt;/span&gt;&lt;span class="n"&gt;travel_agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;search_flights&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;weather_agent&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;  &lt;span class="c1"&gt;# &amp;lt;-- the SDK converts it to a tool
&lt;/span&gt;    &lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a corporate travel assistant.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;travel_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Flights from EZE to MIA on Aug 20. What&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s the weather like?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  When to Use It
&lt;/h3&gt;

&lt;p&gt;When you have a few sub-agents with clearly distinct capabilities and you want the main model to decide when to call each one. This is the most direct pattern. It requires the least coordination.&lt;/p&gt;

&lt;h3&gt;
  
  
  When NOT to Use It
&lt;/h3&gt;

&lt;p&gt;If you need agents to communicate with each other (not only with the orchestrator) or if execution order matters. In those cases, Graph or Swarm work better.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pattern 2: Swarm
&lt;/h2&gt;

&lt;p&gt;A group of agents that coordinate autonomously through handoffs (control transfers). Each agent decides when to pass work to another.&lt;/p&gt;

&lt;h3&gt;
  
  
  How It Works
&lt;/h3&gt;

&lt;p&gt;Strands equips each agent in the Swarm with coordination tools: a handoff tool to transfer control, and a shared context that all agents can read. The agents decide the execution order themselves.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands.multiagent&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Swarm&lt;/span&gt;

&lt;span class="n"&gt;flight_agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;flight_agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;search_flights&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You search flights. When done, hand off to weather_agent.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;weather_agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;weather_agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;get_weather&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You check weather at the destination.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;summary_agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;summary_agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You combine flights and weather into a clear recommendation.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;swarm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Swarm&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;flight_agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;weather_agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;summary_agent&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;swarm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;I need to travel from Buenos Aires to Miami on August 20&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  When to Use It
&lt;/h3&gt;

&lt;p&gt;When the problem breaks down into parts that different specialists handle better, and the order can vary by task. Swarm works best for problems where you don't know in advance which agent should act first.&lt;/p&gt;

&lt;h3&gt;
  
  
  When NOT to Use It
&lt;/h3&gt;

&lt;p&gt;If you need a predictable, repeatable execution flow. Swarm decides the order at runtime. Two runs of the same prompt can follow different paths.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pattern 3: Graph
&lt;/h2&gt;

&lt;p&gt;A directed graph where each node is an agent and edges define the execution flow. You define the structure, the framework executes in order.&lt;/p&gt;

&lt;h3&gt;
  
  
  How It Works
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;GraphBuilder&lt;/code&gt; gives you an API to define nodes (agents) and edges (connections). The framework passes output from one node as input to the next. It supports acyclic graphs (pipelines) and cyclic graphs (refinement loops), giving you flexibility to implement review iterations between agents.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands.multiagent&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;GraphBuilder&lt;/span&gt;

&lt;span class="n"&gt;graph&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nc"&gt;GraphBuilder&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;flight_agent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;weather&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;weather_agent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;summary&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;summary_agent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;weather&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;weather&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;summary&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;build&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Travel from Buenos Aires to Miami, August 20&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  When to Use It
&lt;/h3&gt;

&lt;p&gt;When the workflow has a strict order that should not change. For example: always search flights first, then check weather, then generate the recommendation. If the pipeline is the same every time, Graph is the right pattern.&lt;/p&gt;

&lt;h3&gt;
  
  
  When NOT to Use It
&lt;/h3&gt;

&lt;p&gt;If you need flexibility in execution order. A Graph with many depth levels also adds latency, because each node waits for the previous one.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pattern 4: Workflow
&lt;/h2&gt;

&lt;p&gt;A task graph with explicit dependencies and automatic parallel execution. Each task is assigned to an agent with a specific system_prompt.&lt;/p&gt;

&lt;h3&gt;
  
  
  How It Works
&lt;/h3&gt;

&lt;p&gt;Unlike Graph (which operates with agents as nodes), Workflow operates with &lt;strong&gt;tasks&lt;/strong&gt;. Each task has an ID, description, dependencies, and priority. The framework resolves execution order, parallelizes what it can, and passes results between tasks.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands_tools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;workflow&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;action&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;create&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;workflow_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;travel_planning&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tasks&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;task_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search_flights&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Find the 5 cheapest flights from EZE to MIA on Aug 20&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system_prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a flight search expert.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;priority&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;task_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;check_weather&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Check weather in Miami for August 20&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system_prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a weather expert.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;priority&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;task_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;generate_report&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Combine flights and weather into an executive report&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dependencies&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search_flights&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;check_weather&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system_prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a corporate travel analyst.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;priority&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;action&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;execute&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;workflow_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;travel_planning&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this example, &lt;code&gt;search_flights&lt;/code&gt; and &lt;code&gt;check_weather&lt;/code&gt; run in parallel (no dependencies between them). &lt;code&gt;generate_report&lt;/code&gt; waits for both to finish.&lt;/p&gt;

&lt;h3&gt;
  
  
  When to Use It
&lt;/h3&gt;

&lt;p&gt;When you have a repeatable process with steps that can run in parallel. Workflow is the closest pattern to a traditional data pipeline: explicit dependencies, deterministic execution, results flowing between tasks.&lt;/p&gt;

&lt;h3&gt;
  
  
  When NOT to Use It
&lt;/h3&gt;

&lt;p&gt;If you need agents to make decisions about what to do next. Workflow executes exactly what you defined, with no autonomy.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pattern 5: A2A (Agent-to-Agent)
&lt;/h2&gt;

&lt;p&gt;An open protocol (created by Google, now at the Linux Foundation) for agents to communicate over HTTP. Agents can be written in different frameworks, on different servers.&lt;/p&gt;

&lt;h3&gt;
  
  
  How It Works
&lt;/h3&gt;

&lt;p&gt;An agent is exposed as an A2A server with an Agent Card (JSON metadata at &lt;code&gt;/.well-known/agent-card.json&lt;/code&gt;). Another agent consumes it as a client. Communication happens over HTTP/JSON.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Server:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands.multiagent.a2a&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;A2AServer&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;uvicorn&lt;/span&gt;

&lt;span class="n"&gt;weather_agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;weather_agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Checks weather forecasts by city and date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;get_weather&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;a2a_server&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;A2AServer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;weather_agent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;a2a_server&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_fastapi_app&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;uvicorn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;0.0.0.0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;9000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Client:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands.agent.a2a_agent&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;A2AAgent&lt;/span&gt;

&lt;span class="n"&gt;remote_weather&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;A2AAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;agent_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:9000&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Use it as a node in a Graph
&lt;/span&gt;&lt;span class="n"&gt;graph&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nc"&gt;GraphBuilder&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;flight_agent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;weather&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;remote_weather&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;weather&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;build&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  When to Use It
&lt;/h3&gt;

&lt;p&gt;When agents live on different servers, are written in different frameworks (Strands, LangGraph, CrewAI), or belong to different teams or organizations. A2A is the only pattern that crosses the local process boundary.&lt;/p&gt;

&lt;h3&gt;
  
  
  When NOT to Use It
&lt;/h3&gt;

&lt;p&gt;If all agents run in the same process. A2A adds network latency (50-1000ms per call, per &lt;a href="https://medium.com/@rasid2006/understanding-communication-patterns-in-strands-agents-framework-3e53d7918182" rel="noopener noreferrer"&gt;community benchmarks&lt;/a&gt;). For local communication, the other patterns are orders of magnitude faster.&lt;/p&gt;

&lt;h2&gt;
  
  
  Comparing the 5 Multi-Agent Patterns
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;Agents as Tools&lt;/th&gt;
&lt;th&gt;Swarm&lt;/th&gt;
&lt;th&gt;Graph&lt;/th&gt;
&lt;th&gt;Workflow&lt;/th&gt;
&lt;th&gt;A2A&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Who decides order&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Orchestrator model&lt;/td&gt;
&lt;td&gt;Agents (handoffs)&lt;/td&gt;
&lt;td&gt;You (edges)&lt;/td&gt;
&lt;td&gt;You (dependencies)&lt;/td&gt;
&lt;td&gt;N/A (protocol)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Communication&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;In-process&lt;/td&gt;
&lt;td&gt;In-process&lt;/td&gt;
&lt;td&gt;In-process&lt;/td&gt;
&lt;td&gt;In-process&lt;/td&gt;
&lt;td&gt;HTTP/JSON&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Latency&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Microseconds&lt;/td&gt;
&lt;td&gt;Microseconds&lt;/td&gt;
&lt;td&gt;Microseconds&lt;/td&gt;
&lt;td&gt;Microseconds&lt;/td&gt;
&lt;td&gt;50-1000ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Parallel execution&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Not native&lt;/td&gt;
&lt;td&gt;Agent-decided&lt;/td&gt;
&lt;td&gt;Supported&lt;/td&gt;
&lt;td&gt;Automatic&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cyclic graphs&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cross-framework&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Composable&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Yes (as tool)&lt;/td&gt;
&lt;td&gt;Yes (Graph node)&lt;/td&gt;
&lt;td&gt;Yes (nested Graph)&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes (Graph node)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Decision Framework
&lt;/h2&gt;

&lt;p&gt;When choosing a pattern, follow these questions:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Are the agents in the same process?&lt;/strong&gt;&lt;br&gt;
If not: &lt;strong&gt;A2A&lt;/strong&gt; (the only one that crosses the network).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Does execution order matter?&lt;/strong&gt;&lt;br&gt;
If yes, is it always the same?: &lt;strong&gt;Graph&lt;/strong&gt; (fixed structure).&lt;br&gt;
If yes, with parallel tasks?: &lt;strong&gt;Workflow&lt;/strong&gt; (dependencies + parallelism).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Do agents need to decide on their own who acts?&lt;/strong&gt;&lt;br&gt;
If yes: &lt;strong&gt;Swarm&lt;/strong&gt; (autonomous handoffs).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Is it direct delegation from an orchestrator to specialists?&lt;/strong&gt;&lt;br&gt;
If yes: &lt;strong&gt;Agents as Tools&lt;/strong&gt; (the most direct).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;None fits perfectly?&lt;/strong&gt;&lt;br&gt;
Patterns compose with each other. A Graph can have a Swarm as a node. An agent with tools can include a remote A2AAgent. Strands lets you mix patterns without constraints.&lt;/p&gt;

&lt;h2&gt;
  
  
  Frequently Asked Questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Does Swarm use A2A internally?
&lt;/h3&gt;

&lt;p&gt;No. Swarm uses Python function calls within the same process. The confusion comes from both involving "communication between agents," but Swarm is local (microseconds) and A2A is remote (HTTP).&lt;/p&gt;

&lt;h3&gt;
  
  
  Does Graph support conditional execution?
&lt;/h3&gt;

&lt;p&gt;Yes. Edges can have conditions evaluated at runtime. This gives you graphs that behave like decision trees: based on a node's output, the flow takes one path or another.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can I combine patterns?
&lt;/h3&gt;

&lt;p&gt;Yes, and this is one of Strands' strengths. A Swarm can live as a node inside a Graph. A Graph can use a remote A2AAgent as a node. An agent with tools can include another agent as a tool. Composition is unrestricted.&lt;/p&gt;

&lt;h3&gt;
  
  
  Which one is most token-efficient?
&lt;/h3&gt;

&lt;p&gt;Agents as Tools and Graph consume fewer tokens because coordination is deterministic. Swarm can consume more because agents reason about who to hand off to. A2A adds HTTP protocol overhead.&lt;/p&gt;

&lt;h3&gt;
  
  
  Does Workflow replace Graph?
&lt;/h3&gt;

&lt;p&gt;No. Workflow is a Strands tool (from &lt;code&gt;strands-agents-tools&lt;/code&gt;), while Graph is a native SDK orchestrator. Workflow operates with tasks, Graph operates with agents. If you need each node to be a full agent with its own system prompt and tools, use Graph. If you need a task pipeline with dependencies, use Workflow.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The 5 patterns don't compete with each other. Each one solves a different coordination need. The key is to start with the most direct one that works for your case (probably Agents as Tools) and scale toward more complex patterns when you need them.&lt;/p&gt;

&lt;p&gt;If you want to see these patterns implemented live with the Corporate Travel Agent case, the full code is in the course repository:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/ricardoceci/curso-strands-agentcore-2026" rel="noopener noreferrer"&gt;github.com/ricardoceci/curso-strands-agentcore-2026&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://strandsagents.com/docs/user-guide/concepts/multi-agent/multi-agent-patterns/" rel="noopener noreferrer"&gt;Strands Multi-Agent Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/machine-learning/multi-agent-collaboration-patterns-with-strands-agents-and-amazon-nova/" rel="noopener noreferrer"&gt;Multi-Agent Collaboration Patterns with Strands and Amazon Nova (AWS Blog)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.strandsagents.com/" rel="noopener noreferrer"&gt;Strands Agents Website&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/google/A2A" rel="noopener noreferrer"&gt;A2A Protocol (Linux Foundation)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/machine-learning/introducing-agent-to-agent-protocol-support-in-amazon-bedrock-agentcore-runtime/" rel="noopener noreferrer"&gt;A2A Protocol Support in AgentCore Runtime (AWS Blog)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>aws</category>
      <category>python</category>
      <category>ai</category>
    </item>
    <item>
      <title>Meet Deliberation: 400+ models is easy, knowing which ones earn a place is hard.</title>
      <dc:creator>Anton Babenko</dc:creator>
      <pubDate>Wed, 03 Jun 2026 08:54:04 +0000</pubDate>
      <link>https://dev.to/aws-heroes/meet-deliberation-400-models-is-easy-knowing-which-ones-earn-a-place-is-hard-4o5g</link>
      <guid>https://dev.to/aws-heroes/meet-deliberation-400-models-is-easy-knowing-which-ones-earn-a-place-is-hard-4o5g</guid>
      <description>&lt;p&gt;&lt;em&gt;A follow-up: how a three-model consensus tool grew into a configurable, measurable panel - and why I now make every model prove it pays for its slot.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;It is open source and ready to use today: &lt;strong&gt;&lt;a href="https://github.com/antonbabenko/deliberation" rel="noopener noreferrer"&gt;github.com/antonbabenko/deliberation&lt;/a&gt;&lt;/strong&gt;. If you only do one thing, star the repo and install it in your own agent - it takes about two minutes, and the install section below has the exact command for Claude Code, Codex, Cursor, Kiro, and OpenCode.&lt;/p&gt;

&lt;p&gt;Here is the part that surprises people: you are probably already paying to access a few models. A ChatGPT subscription includes Codex which allows you to use models like &lt;code&gt;gpt-5.5&lt;/code&gt; and &lt;code&gt;gpt-5.3-codex&lt;/code&gt; from Codex CLI. A Claude subscription includes Claude Code. So you can wire GPT and Claude to review each other right now, at no extra cost, and add Gemini, Grok, or any OpenRouter model when you want a third opinion.&lt;/p&gt;

&lt;p&gt;A few weeks ago I wrote that one model is a guess and three that agree is a plan. I still believe it. But it skipped the next problem, the one I hit the moment the trick worked: once you can ask three models, you can ask thirty seven. And more voices is not the same as more signal. A slow model that always agrees with a faster one is not a second opinion. It is a bill.&lt;/p&gt;

&lt;p&gt;So the project grew up, and it got a name: &lt;strong&gt;Deliberation&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The word fits. Deliberation is the slow, careful weighing of a decision - the thing a jury or a council does before it returns a verdict. That is exactly the job here: a panel of models that weighs one artifact and argues its way to a call, instead of one model blurting the first thing that sounds right. It answers a different question than the first post did. The first post asked &lt;em&gt;should you make models argue?&lt;/em&gt; This one is about &lt;em&gt;which models, under what rules, and how do you know any of it was worth the wait?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;It still runs the everyday work of &lt;a href="https://compliance.tf" rel="noopener noreferrer"&gt;compliance.tf&lt;/a&gt;, same as before. Most of what follows came from that: not from theory, from watching real runs and asking why one took four minutes to tell me what another said in twenty seconds.&lt;/p&gt;

&lt;h2&gt;
  
  
  A few words first
&lt;/h2&gt;

&lt;p&gt;If you are new to this, here are the only terms you need. Plain version, with examples.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Agent host.&lt;/strong&gt; The tool you already code in: Claude Code, the Codex CLI, Cursor, Kiro, OpenCode. Deliberation plugs into all of them and behaves the same way in each.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MCP.&lt;/strong&gt; A standard plug that lets your agent host talk to outside tools. Deliberation is one MCP server, so any host that speaks MCP can use it. You install it once; you do not write glue code.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Provider / model.&lt;/strong&gt; A provider is a company (OpenAI, Google, xAI, OpenRouter). A model is one brain from that provider (GPT, Gemini, Grok, or any of OpenRouter's 400-plus, like Qwen, DeepSeek, or Kimi).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Panel.&lt;/strong&gt; The set of models you ask at once. You choose who is on it. Example: a panel of GPT + Gemini + Grok.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Expert (or persona).&lt;/strong&gt; A hat a model wears for one job: Architect, Security Analyst, Code Reviewer, Plan Reviewer, and three more. The same model reviews differently depending on the hat.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Arbiter.&lt;/strong&gt; The one who reads everybody's answers and makes the call. It can be a model, or it can be your own agent (you).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the whole vocabulary. The rest is just rules for how the panel talks.&lt;/p&gt;

&lt;h2&gt;
  
  
  Install it now
&lt;/h2&gt;

&lt;p&gt;Deliberation is one MCP server with native packaging for each host. Set only the providers you use - a missing key just turns that one provider off. Full per-host guides: &lt;strong&gt;&lt;a href="https://github.com/antonbabenko/deliberation/tree/master/public-docs/hosts" rel="noopener noreferrer"&gt;public-docs/hosts&lt;/a&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude Code:&lt;/strong&gt; add the marketplace, install the plugin, run setup.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;  /plugin marketplace add antonbabenko/agent-plugins
  /plugin &lt;span class="nb"&gt;install &lt;/span&gt;deliberation
  /deliberation:setup
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Codex CLI:&lt;/strong&gt; &lt;code&gt;codex plugin marketplace add antonbabenko/deliberation&lt;/code&gt;, then install &lt;strong&gt;deliberation&lt;/strong&gt; from &lt;code&gt;/plugins&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cursor:&lt;/strong&gt; drop in the rule file and use the one-click MCP deeplink (&lt;a href="https://github.com/antonbabenko/deliberation/blob/master/public-docs/hosts/cursor.md" rel="noopener noreferrer"&gt;guide&lt;/a&gt;).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Kiro:&lt;/strong&gt; "Import power from GitHub" (&lt;a href="https://github.com/antonbabenko/deliberation/blob/master/public-docs/hosts/kiro.md" rel="noopener noreferrer"&gt;guide&lt;/a&gt;).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenCode:&lt;/strong&gt; add the &lt;code&gt;opencode.json&lt;/code&gt; MCP snippet (&lt;a href="https://github.com/antonbabenko/deliberation/blob/master/public-docs/hosts/opencode.md" rel="noopener noreferrer"&gt;guide&lt;/a&gt;).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Credentials come from your environment: GPT uses your Codex/OpenAI login (run &lt;code&gt;codex login&lt;/code&gt;), Gemini signs in once through its new Antigravity CLI (run &lt;code&gt;agy&lt;/code&gt;), Grok reads &lt;code&gt;XAI_API_KEY&lt;/code&gt;, OpenRouter reads &lt;code&gt;OPENROUTER_API_KEY&lt;/code&gt;. Start with GPT and Claude, since you likely already pay for both.&lt;/p&gt;

&lt;h2&gt;
  
  
  What actually changed
&lt;/h2&gt;

&lt;p&gt;Three things.&lt;/p&gt;

&lt;p&gt;It is one MCP server now, not a Claude-only plugin. It works in Claude Code, Cursor, Codex, Kiro, OpenCode - anything that speaks MCP. Your primary agent stays in charge and calls the panel when a decision is worth a second look.&lt;/p&gt;

&lt;p&gt;The panel opened up. It used to be three fixed externals: GPT, Gemini, Grok. Those are still the built-in voices, but you can now add any of OpenRouter's 400-plus models (including Qwen, DeepSeek, Kimi, and others) as named records in a config file. You pick them. You say which ones join a quick fan-out, which ones sit on the consensus panel, and which expert hat each one is allowed to wear - not all are equally good for all tasks.&lt;/p&gt;

&lt;p&gt;And the whole thing learned to &lt;strong&gt;measure itself&lt;/strong&gt;. That last part is the real subject of this post.&lt;/p&gt;

&lt;p&gt;I will walk through it as five categories. Each one has a switch you flip in config, and each one votes - or refuses to - in a specific way.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick one-offs: just ask
&lt;/h2&gt;

&lt;p&gt;Before the heavy machinery, the everyday move. Most of the time you do not want a five-round debate - you want a fast second opinion from someone who is not the model you have been talking to all session.&lt;/p&gt;

&lt;p&gt;So you just ask, in plain words: "Ask Grok and Gemini whether this retry loop can deadlock." Your agent routes that to the right voices and brings back the answers. Or use the explicit commands when you know who you want:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/ask-gpt does this IAM policy grant more than it should?
/ask-grok poke holes in this rollback plan
/ask-gemini summarize the risk in this migration in 5 bullets
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These are single-shot and advisory: one question, one answer, no loop, no context contamination from your session. The exact same commands work in Claude Code, Codex, Cursor, Kiro, and OpenCode - the server is the same everywhere, so a prompt you learn in one host transfers to the rest unchanged.&lt;/p&gt;

&lt;p&gt;And none of this is a separate ritual you have to schedule. Call &lt;code&gt;/ask-all&lt;/code&gt; or &lt;code&gt;/consensus&lt;/code&gt; at any point in the work - while you are still scoping a feature, halfway through writing a plan, or in the middle of a &lt;a href="https://github.com/mattpocock/skills/blob/main/skills/productivity/grill-me/SKILL.md" rel="noopener noreferrer"&gt;&lt;code&gt;/grill-me&lt;/code&gt;&lt;/a&gt; session when you want a real outside voice in the room instead of arguing with only yourself. The panel is available the whole time, not just at the review at the end. The earlier you pull in a dissenting model, the cheaper the disagreement is to act on.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Fan-out with no cross-talk
&lt;/h2&gt;

&lt;p&gt;The next cheapest move: ask several models the same question at once and read the answers side by side. Nobody sees anyone else's reply, so you get real independence instead of an echo.&lt;/p&gt;

&lt;p&gt;Two examples where this works the best:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/ask-all can this Terraform state migration be rolled back safely, and what breaks if it cannot?
/ask-all what are the failure modes of switching this table's primary key on a live database?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;/ask-all&lt;/code&gt; runs the parallel version. &lt;code&gt;panel&lt;/code&gt; tells you exactly who &lt;em&gt;would&lt;/em&gt; answer for your current config before you spend a token. &lt;code&gt;ask-one&lt;/code&gt; lets you fire each member yourself, so each answer lands on screen as it finishes instead of waiting on the slowest one.&lt;/p&gt;

&lt;p&gt;Switches: &lt;code&gt;routing.maxFanout&lt;/code&gt; caps how many OpenRouter models join (default 3), and each model record has an &lt;code&gt;askAll&lt;/code&gt; flag to opt in or out.&lt;/p&gt;

&lt;p&gt;There is no voting here. It is parallel sampling. You and your agent (Claude Code, Codex, or anything else) get all the answers and decide.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. The consensus loop, where the voting lives
&lt;/h2&gt;

&lt;p&gt;This is the heart of it, and it is a state machine, not a vibe. One round goes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Blind pre-commit.&lt;/strong&gt; The arbiter writes down its own verdict - approve, request changes, or reject - before any other model sees the work. In writing, first. So its judgment cannot quietly drift to match the crowd later.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Parallel peer review.&lt;/strong&gt; The panel reviews the same artifact, each in a fresh thread, none seeing another's review.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Blind cross-review.&lt;/strong&gt; Each model then rates the others' answers with names stripped off (to avoid bias and deference). A "not viable" vote becomes a candidate problem the arbiter has to deal with. This catches the case where everyone looked like they agreed but were each walking past the same hole. (Pattern borrowed from &lt;a href="https://github.com/karpathy/llm-council" rel="noopener noreferrer"&gt;karpathy/llm-council&lt;/a&gt;.)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Adjudication.&lt;/strong&gt; The arbiter goes through every objection and accepts it, dismisses it with a written reason, or defers it. Then it revises the artifact and the round runs again.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It converges only when every responding model approves and the arbiter's pre-committed verdict agrees with them. If they cannot get there inside the round cap, it returns UNRESOLVED and says so. It does not fake a number to look finished.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/antonbabenko/deliberation/blob/master/assets/consensus-flow.png" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fanenkxya8hrvlpk3alhw.png" alt="The three-stage consensus loop" width="800" height="336"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Click for the detailed diagram with the bias guards and per-model flow.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The answers are not free text the arbiter has to guess at. Every voice returns a structured opinion, and the engine parses it into fixed fields:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a &lt;strong&gt;recommendation&lt;/strong&gt; (the actual call),&lt;/li&gt;
&lt;li&gt;a &lt;strong&gt;confidence&lt;/strong&gt; label, so a weak "maybe" does not count the same as a firm "yes",&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;dissent points&lt;/strong&gt;, &lt;strong&gt;assumptions&lt;/strong&gt;, and &lt;strong&gt;tradeoffs&lt;/strong&gt; as separate lists, so a disagreement is attached to a reason instead of buried in prose.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Reviews add a &lt;strong&gt;verdict&lt;/strong&gt; (approve / request changes / reject) and a list of &lt;strong&gt;critical issues&lt;/strong&gt;, each sorted into a closed six-category taxonomy - so "three reviewers, nine objections" collapses into a clean, deduplicated set the arbiter can act on. Models are asked to emit this as JSON; the parser is best-effort and never throws, so if a model returns slightly malformed JSON or plain text, the content is salvaged and tagged with how it was read (clean parse vs recovered). Structured where it can be, never brittle.&lt;/p&gt;

&lt;p&gt;Switches: &lt;code&gt;consensus.arbiter&lt;/code&gt; (auto, the host model, a named provider, or a dedicated model record), &lt;code&gt;consensus.blindVote&lt;/code&gt; (add the blind pre-vote on the synthesis path), &lt;code&gt;consensus.maxRounds&lt;/code&gt; (default 5, capped at 50), and a per-model &lt;code&gt;consensus&lt;/code&gt; flag for who sits on the panel.&lt;/p&gt;

&lt;p&gt;So there are four distinct votes in play: the arbiter's blind pre-commit, each peer's independent verdict, the anonymized cross-review rating, and the final convergence check. They are doing different jobs on purpose.&lt;/p&gt;

&lt;p&gt;One honest note, since a reviewer pushed me on it: that convergence rule is a heuristic, not a proof. Unanimous approval means nobody in the room objected. It does not mean the answer is right. More on that below.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Synthesis, when there is no verdict to give
&lt;/h2&gt;

&lt;p&gt;Not everything is approve-or-reject. "Which of these two designs should we pick?" has no yes or no. Flip &lt;code&gt;synthesizeAlways: true&lt;/code&gt; and instead of the loop you get a single arbiter pass that reads every opinion and writes one combined answer - free text, no verdict, no rounds. Use the loop for go/no-go calls. Use synthesis for open questions. Same panel, two shapes.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Two drivers, one rulebook
&lt;/h2&gt;

&lt;p&gt;The loop logic lives in one place - a single state machine - and two things drive it.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;consensus&lt;/code&gt; runs the whole loop server-side with a model as the arbiter and hands you back the result in one call. &lt;code&gt;consensus-step&lt;/code&gt; lets your own agent be the arbiter and drive the loop one step at a time, so every move shows up in your transcript where you can see it.&lt;/p&gt;

&lt;p&gt;Two entry points is more surface area than one, and they do not behave identically - one is visible step by step, one is a single call. The win is that the &lt;em&gt;rules&lt;/em&gt; (how rounds count, when it converges) are written once and shared, so the two paths cannot drift apart on the part that matters. That was worth the extra work.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Make the panel stay accountable
&lt;/h2&gt;

&lt;p&gt;In a parallel fan-out, your wall-clock time is the slowest model, not the average. One slow member that rarely says anything new sets the clock for everyone. You will not notice by feel. You need numbers.&lt;/p&gt;

&lt;p&gt;So there is an opt-in debug log. Turn on &lt;code&gt;debug.enabled&lt;/code&gt; and it writes one line per model call and per round: latency (p50, p95, max), token counts, error rate, the reasoning effort used. It never records your prompts, the responses, or the issue text - only the timing and the outcome of each vote.&lt;/p&gt;

&lt;p&gt;Turn on &lt;code&gt;sessions.persist&lt;/code&gt; and runs are saved too, so you can ask how often a given model's verdict matched the final call - and, just as useful, how often it was the lone voice that caught something the others missed, or the lone voice that was simply wrong. A model that always agrees adds cost; a model that disagrees and is usually right is the one you keep.&lt;/p&gt;

&lt;p&gt;Then an &lt;code&gt;analyze&lt;/code&gt; tool reads both back and tells you, in plain terms, who is slow, who is redundant, and which config line to change. From my own panel last week: one model sat at a 200-second p95 while another finished in 15 (Grok 4.3 is often the fastest). &lt;code&gt;analyze&lt;/code&gt; flagged it and suggested the one-line switch to drop it from the default fan-out. I had been waiting on that model for almost a week without realizing it.&lt;/p&gt;

&lt;p&gt;There is also a small dedup cache, so an identical advisory question inside one session returns instantly instead of paying twice. Well, in tool-heavy work the prompts vary, so it hits less than you would hope.&lt;/p&gt;

&lt;p&gt;This is the whole shift from the first post. Consensus was the idea. Making the panel prove it earns its seat is the engineering!&lt;/p&gt;

&lt;h2&gt;
  
  
  The parts I will not oversell
&lt;/h2&gt;

&lt;p&gt;Multi-model review has real failure modes, and a post that hides them deserves the distrust it gets. So:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Agreement is not truth.&lt;/strong&gt; Three models can be confidently, unanimously wrong - especially when they were trained on overlapping data and share the same blind spot. Consensus lowers the odds of a &lt;em&gt;lone&lt;/em&gt; model's odd mistake. It does not discover facts. Treat it as risk reduction, not a truth oracle.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pick models that actually differ.&lt;/strong&gt; A panel of three close cousins is one model agreeing with itself in three accents. The value comes from genuinely different models arguing, which is exactly why the config lets you choose them by hand.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;UNRESOLVED is a feature.&lt;/strong&gt; When the room cannot agree, the honest output is "we could not." That is a signal to slow down and look, not a bug to smooth out. If you wire this into CI, decide on purpose what a deadlock means there - it should probably stop the line, not wave it through.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;It is slow, and that is fine for the right job.&lt;/strong&gt; A full consensus loop can take minutes. Run it on plans, designs, and reviews in the background. For a fast second opinion, use single-shot fan-out or synthesis. Do not put a five-round loop in an interactive path.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mind where the text goes.&lt;/strong&gt; Fanning a compliance artifact out to 400 third-party models is a data-boundary decision, not a free upgrade. The config defaults the long tail to off for exactly this reason. Turn models on deliberately.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of this is specific to Terraform, AWS, or even to code. The loop runs on anything you can put in text - a plan, a runbook, a decision memo. That generality is the point. It just happens to have been built running a compliance product, which is a good place to learn that "looks done" and "is done" are different claims.&lt;/p&gt;

&lt;h2&gt;
  
  
  The takeaway
&lt;/h2&gt;

&lt;p&gt;The first lesson was: do not trust one confident model, make a few argue. The second one is harder and more useful: adding voices is easy, and most of them will not earn their seat. Measure the panel like you would measure any other service or people, and cut what does not pay.&lt;/p&gt;

&lt;p&gt;Deliberation is open source, it works in the agent you already use, and most of the models are ones you already pay for. It ships in the &lt;a href="https://github.com/antonbabenko/agent-plugins" rel="noopener noreferrer"&gt;agent-plugins&lt;/a&gt; marketplace and as a standalone MCP server on npm (&lt;code&gt;@antonbabenko/deliberation-mcp&lt;/code&gt;). Source, and a star button I would genuinely appreciate, are at &lt;a href="https://github.com/antonbabenko/deliberation" rel="noopener noreferrer"&gt;github.com/antonbabenko/deliberation&lt;/a&gt;. Install it, point GPT and Claude at each other, and let me know where it breaks. The cheapest review still happens before you execute - just make sure every reviewer in the room is worth the wait and the cost.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>council</category>
      <category>deliberation</category>
    </item>
    <item>
      <title>AWS Lambda Managed Instances with Java 25 and AWS SAM – Part 7 Implement scheduled scaling</title>
      <dc:creator>Vadym Kazulkin</dc:creator>
      <pubDate>Mon, 01 Jun 2026 15:36:09 +0000</pubDate>
      <link>https://dev.to/aws-heroes/aws-lambda-managed-instances-with-java-25-and-aws-sam-part-7-implement-scheduled-scaling-4df9</link>
      <guid>https://dev.to/aws-heroes/aws-lambda-managed-instances-with-java-25-and-aws-sam-part-7-implement-scheduled-scaling-4df9</guid>
      <description>&lt;h2&gt;
  
  
  Implement scheduled scaling
&lt;/h2&gt;

&lt;p&gt;Recently, &lt;a href="https://aws.amazon.com/about-aws/whats-new/2026/05/amazon-eventbridge-sdk-integrations/" rel="noopener noreferrer"&gt;Amazon EventBridge Scheduler added 619 new SDK API actions&lt;/a&gt;. &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/lambda-managed-instances-scaling.html#lambda-managed-instances-scheduled-scaling" rel="noopener noreferrer"&gt;One of these actions&lt;/a&gt; adjusts the Lambda function's minimum and maximum execution environments for Lambda Managed Instances (LMI) on a recurring or one-time schedule. This is useful for predictable traffic patterns, such as scaling up before peak hours and scaling down during off-peak hours. We can use CloudFormation or the latest versions of AWS CDK or AWS CLI to perform this action. In our example, we'll use AWS CLI to adjust the Lambda Function Scaling Configuration, for which we'll use the &lt;em&gt;PutFunctionScalingConfig API&lt;/em&gt; as a universal target. We'll use the Lambda function with the name &lt;em&gt;GetProductByIdJava25WithLMI&lt;/em&gt;, which we introduced in &lt;a href="https://dev.to/aws-heroes/aws-lambda-managed-instances-with-java-25-and-aws-sam-part-1-introduction-and-sample-application-1eb7"&gt;part 1&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;First of all, let's create an SQS dead-letter queue, which we'll use for our action:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws sqs create-queue &lt;span class="nt"&gt;--queue-name&lt;/span&gt; scheduler-dlq
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, let's create an EventBridge Scheduler IAM execution role with the name &lt;em&gt;scale-lambda-managed-instances-eventbridge-scheduler-role&lt;/em&gt;. This role grants permission to call the &lt;em&gt;lambda:PutFunctionScalingConfig&lt;/em&gt; and send the message to the dead-letter queue on our target function.&lt;/p&gt;

&lt;p&gt;This is how the trusted policy looks for the IAM role:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"Version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2012-10-17"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"Statement"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"Effect"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Allow"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"Principal"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="nl"&gt;"Service"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"scheduler.amazonaws.com"&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"Action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sts:AssumeRole"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And this is how the IAM policy looks (use the ARNs of the Lambda function and SQS dead-letter queue here):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"Version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2012-10-17"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"Statement"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"Action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="s2"&gt;"lambda:PutFunctionScalingConfig"&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"Effect"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Allow"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"Resource"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"arn:aws:lambda:{aws_region}:{aws_account_id}:function:GetProductByIdJava25WithLMI:$LATEST.PUBLISHED"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"Effect"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Allow"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"Action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="s2"&gt;"sqs:SendMessage"&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"Resource"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="s2"&gt;"arn:aws:sqs:{aws_region}:{aws_account_id}:scheduler-dlq"&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Please replace the values of &lt;em&gt;{aws_region}&lt;/em&gt; and &lt;em&gt;{aws_account_id}&lt;/em&gt; with your own values, and adjust the Lambda function and SQS Queue names if needed. &lt;/p&gt;

&lt;p&gt;Now, let's create the scheduler:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws scheduler create-schedule &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; &lt;span class="s2"&gt;"ScaleLambdaManagedInstances"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--schedule-expression&lt;/span&gt; &lt;span class="s2"&gt;"at(2026-05-18T08:10:00)"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--flexible-time-window&lt;/span&gt; &lt;span class="s1"&gt;'{"Mode": "OFF"}'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--target&lt;/span&gt; &lt;span class="s1"&gt;'{
     {"DeadLetterConfig": {"Arn": "arn:aws:sqs:{aws_region}:{aws_account_id}:scheduler-dlq"},
    "Arn": "arn:aws:scheduler:::aws-sdk:lambda:PutFunctionScalingConfig",
    "RoleArn": "arn:aws:iam::{aws_account_id}:role/scale-lambda-managed-instances-eventbridge-scheduler-role",
    "Input": "{\"FunctionName\": \"GetProductByIdJava25WithLMI\", \"Qualifier\": \"$LATEST.PUBLISHED\", \"FunctionScalingConfig\": {\"MinExecutionEnvironments\": 5, \"MaxExecutionEnvironments\": 10}}"
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's explain what happens here:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;First, we create the schedule with the name &lt;em&gt;ScaleLambdaManagedInstances&lt;/em&gt; using a one-time schedule (executes at 08:10 on May 18, 2026). You can use other &lt;a href="https://docs.aws.amazon.com/scheduler/latest/UserGuide/schedule-types.html" rel="noopener noreferrer"&gt;Schedule types in EventBridge Scheduler&lt;/a&gt;, like cron-based expressions.&lt;/li&gt;
&lt;li&gt;Then, we target the PutFunctionScalingConfig Scheduler API as a universal target. &lt;/li&gt;
&lt;li&gt;Next, we specify the SQS dead-letter queue ARN created above&lt;/li&gt;
&lt;li&gt;Then, we specify the IAM execution Role ARN created above&lt;/li&gt;
&lt;li&gt;Next, we specify the new MinExecutionEnvironments and MaxExecutionEnvironments values in the Input payload&lt;/li&gt;
&lt;li&gt;Finally, as the scheduler input, we specify the name of the Lambda function and its qualifier (usually &lt;em&gt;$LATEST.PUBLISHED&lt;/em&gt;), for which we'd like to change the Function Scaling Configuration. In our case, we set &lt;em&gt;MinExecutionEnvironments&lt;/em&gt; to 5 and &lt;em&gt;MaxExecutionEnvironments&lt;/em&gt; to 10.&lt;/li&gt;
&lt;li&gt;We can also optionally set the retry policy and encryption.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After creating the schedule, we'll see something similar in the &lt;a href="https://us-east-1.console.aws.amazon.com/scheduler/home?region=us-east-1#schedules" rel="noopener noreferrer"&gt;Amazon EventBridge Scheduler Service&lt;/a&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy6f4txd2cnukcowm5uy1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy6f4txd2cnukcowm5uy1.png" alt=" " width="800" height="330"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F43ujj0wnaw07l7b9m0p8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F43ujj0wnaw07l7b9m0p8.png" alt=" " width="772" height="473"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7m3ca0w0gx6g5838yx2m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7m3ca0w0gx6g5838yx2m.png" alt=" " width="800" height="164"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After the scheduler has run, we can verify that the Lambda function Scaling Configuration has changed:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1akscybp4lcegyf73a46.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1akscybp4lcegyf73a46.png" alt=" " width="800" height="482"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It's also very important to configure the dead-letter queue. First, I didn't do it, and configured the Lambda function Resource ARN in the IAM policy like &lt;em&gt;arn:aws:lambda:{aws_region}:{aws_account_id}:function:GetProductByIdJava25WithLMI&lt;/em&gt;. I observed that the scheduler hasn't been invoked and saw the errors in Amazon CloudWatch.&lt;br&gt;
By configuring the dead-letter queue, I saw the exact error message:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ERROR_CODE: AccessDeniedException
ERROR_MESSAGE :

User: arn:aws:sts::{aws_account_id}:assumed-role/scale-lambda-managed-instances-eventbridge-scheduler-role/f375e2c757da339a8d593587ce800265 
is not authorized to perform: lambda:PutFunctionScalingConfig on resource: 
arn:aws:lambda:{aws_region}:{aws_account_id}:function:GetProductByIdJava25WithLMI:$LATEST.PUBLISHED 
because no identity-based policy allows the lambda:PutFunctionScalingConfig action
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With that, it was clear to me that I needed to append &lt;em&gt;$LATEST.PUBLISHED&lt;/em&gt; to the ARN of the Lambda function. Please also use &lt;a href="https://docs.aws.amazon.com/scheduler/latest/UserGuide/troubleshooting.html" rel="noopener noreferrer"&gt;Troubleshooting Amazon EventBridge Scheduler&lt;/a&gt;, in case you experience some issues.&lt;/p&gt;

&lt;p&gt;Here, we scaled up the capacity at the given time, the same way we can scale it down. Things to pay attention to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For workloads with predictable peaks, create multiple schedules to match your traffic pattern: one to scale up your function before peak hours, and another to scale down after peak hours. Each schedule follows the same pattern with updated MinExecutionEnvironments and MaxExecutionEnvironments values.&lt;/li&gt;
&lt;li&gt;Scheduled scaling adjusts the provisioned floor and ceiling of execution environments, but actual scaling between min and max still responds to CPU utilization and concurrency saturation.&lt;/li&gt;
&lt;li&gt;If your traffic more than doubles within 5 minutes of a scheduled scale-up, you might still experience throttling as capacity is provisioned.&lt;/li&gt;
&lt;li&gt;When scaling to zero to deactivate a function, remember that reactivation requires an explicit PutFunctionScalingConfig call with non-zero values.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>aws</category>
      <category>serverless</category>
      <category>lambda</category>
      <category>scheduledscaling</category>
    </item>
    <item>
      <title>Lambda Managed Instances with Terraform: Multi-Concurrency, High Memory, and Compute Options</title>
      <dc:creator>Darryl Ruggles</dc:creator>
      <pubDate>Fri, 29 May 2026 23:45:10 +0000</pubDate>
      <link>https://dev.to/aws-heroes/lambda-managed-instances-with-terraform-multi-concurrency-high-memory-and-compute-options-3a5g</link>
      <guid>https://dev.to/aws-heroes/lambda-managed-instances-with-terraform-multi-concurrency-high-memory-and-compute-options-3a5g</guid>
      <description>&lt;p&gt;Lambda has always been one request at a time per execution environment. Your function starts, processes a single invocation, and sits idle until the next one arrives. If you need to handle a thousand concurrent requests, Lambda spins up a thousand execution environments - each with its own memory, its own cold start, and its own per-GB-second bill.&lt;/p&gt;

&lt;p&gt;Lambda Managed Instances changes that model. Announced at re:Invent 2025 and expanded with &lt;a href="https://aws.amazon.com/about-aws/whats-new/2026/03/lambda-32-gb-memory-16-vcpus/" rel="noopener noreferrer"&gt;32 GB memory / 16 vCPU support&lt;/a&gt; in March 2026, LMI runs your functions on EC2 instances in your VPC with AWS handling provisioning, patching, scaling, and load balancing. Each execution environment handles multiple concurrent requests. You keep the Lambda programming model and gain EC2 hardware selection and pricing.&lt;/p&gt;

&lt;p&gt;I built a product similarity engine to explore how this works in practice. The handler loads a product catalog with Nova embeddings via Bedrock into memory, uses Amazon Nova Multimodal Embeddings to embed incoming search queries, and computes cosine similarity across categories in parallel using ThreadPoolExecutor. It's the kind of workload that doesn't fit well on standard Lambda - sustained throughput, memory-intensive, with a mix of I/O (Bedrock API calls) and CPU (vector math) that benefits from multi-concurrency and configurable memory-to-vCPU ratios. The project uses Terraform for infrastructure, Python 3.14 with Powertools for observability, and the embedding model is configurable (Nova by default, Titan Text Embeddings V2 as an alternative).&lt;/p&gt;

&lt;p&gt;The source code is on GitHub: &lt;a href="https://github.com/RDarrylR/lambda-managed-instances-similarity-engine" rel="noopener noreferrer"&gt;lambda-managed-instances-similarity-engine&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The AWS Compute Continuum
&lt;/h2&gt;

&lt;p&gt;Before diving into the implementation, it helps to understand where Lambda Managed Instances fits in the AWS compute landscape. The options form a continuum from fully managed to fully self-managed:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdp6gzc3m6orf7pdjyni3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdp6gzc3m6orf7pdjyni3.png" alt="AWS Compute Continuum" width="799" height="201"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Standard Lambda&lt;/th&gt;
&lt;th&gt;Lambda Managed Instances&lt;/th&gt;
&lt;th&gt;ECS Express Mode&lt;/th&gt;
&lt;th&gt;ECS Fargate&lt;/th&gt;
&lt;th&gt;EKS&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Scaling&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Per-invocation, instant&lt;/td&gt;
&lt;td&gt;Async, CPU-based and concurrency saturation&lt;/td&gt;
&lt;td&gt;Traffic-based, auto&lt;/td&gt;
&lt;td&gt;Task-based, minutes&lt;/td&gt;
&lt;td&gt;Pod-based, minutes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Concurrency&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1 per environment&lt;/td&gt;
&lt;td&gt;Multiple per environment&lt;/td&gt;
&lt;td&gt;Configurable&lt;/td&gt;
&lt;td&gt;Configurable&lt;/td&gt;
&lt;td&gt;Configurable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Pricing&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Per-request + GB-second&lt;/td&gt;
&lt;td&gt;Per-request + EC2 + 15% mgmt fee&lt;/td&gt;
&lt;td&gt;Fargate + ALB&lt;/td&gt;
&lt;td&gt;Per-vCPU-hour&lt;/td&gt;
&lt;td&gt;EC2/Fargate + control plane&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Commitment discounts&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;Savings Plans, Reserved Instances&lt;/td&gt;
&lt;td&gt;Fargate Savings Plans&lt;/td&gt;
&lt;td&gt;Fargate Savings Plans&lt;/td&gt;
&lt;td&gt;EC2 Savings, RIs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cold start&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Milliseconds-seconds&lt;/td&gt;
&lt;td&gt;Tens of seconds (new instances)&lt;/td&gt;
&lt;td&gt;Minutes&lt;/td&gt;
&lt;td&gt;Minutes&lt;/td&gt;
&lt;td&gt;Minutes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Max invocation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;15 minutes&lt;/td&gt;
&lt;td&gt;15 minutes (environments long-lived, instances rotated by Lambda)&lt;/td&gt;
&lt;td&gt;No limit&lt;/td&gt;
&lt;td&gt;No limit&lt;/td&gt;
&lt;td&gt;No limit&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;VPC&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Optional&lt;/td&gt;
&lt;td&gt;Required&lt;/td&gt;
&lt;td&gt;Required&lt;/td&gt;
&lt;td&gt;Required&lt;/td&gt;
&lt;td&gt;Required&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Memory&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Up to 10 GB&lt;/td&gt;
&lt;td&gt;Up to 32 GB (configurable vCPU ratio)&lt;/td&gt;
&lt;td&gt;Configurable&lt;/td&gt;
&lt;td&gt;Configurable&lt;/td&gt;
&lt;td&gt;Configurable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Ops burden&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Zero&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;When to choose Lambda Managed Instances:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sustained, predictable throughput (hundreds or thousands of requests per second)&lt;/li&gt;
&lt;li&gt;Workloads that benefit from specific EC2 instance types (Graviton4, high-bandwidth networking)&lt;/li&gt;
&lt;li&gt;Memory-intensive functions that exceed standard Lambda's 10 GB limit or need configurable memory-to-vCPU ratios&lt;/li&gt;
&lt;li&gt;Cost optimization at scale (10M+ invocations/month where EC2 pricing with Savings Plans beats per-GB-second)&lt;/li&gt;
&lt;li&gt;Functions that load large datasets into memory and reuse them across requests (embeddings, models, reference data)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;When standard Lambda is still better:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Bursty, unpredictable traffic patterns&lt;/li&gt;
&lt;li&gt;Low to moderate throughput (standard Lambda's per-invocation pricing wins)&lt;/li&gt;
&lt;li&gt;Functions that need instant scaling (LMI scales asynchronously based on CPU utilization and execution-environment saturation; if traffic more than doubles within 5 minutes you may see throttles while capacity catches up)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I've written about several of these compute options in previous projects. My &lt;a href="https://darryl-ruggles.cloud/elastic-container-service-ecs-my-default-choice-for-containers-on-aws/" rel="noopener noreferrer"&gt;ECS deep dive&lt;/a&gt; covers Fargate and ECS Express Mode. The &lt;a href="https://darryl-ruggles.cloud/serverless-data-processor-using-aws-lambda-step-functions-and-fargate-on-ecs-with-rust/" rel="noopener noreferrer"&gt;Serverless Data Processor&lt;/a&gt; demonstrates Step Functions with both Lambda and Fargate. My &lt;a href="https://darryl-ruggles.cloud/powertools-for-aws-lambda-best-practices-by-default/" rel="noopener noreferrer"&gt;Powertools best practices&lt;/a&gt; article covers the observability patterns used in this project.&lt;/p&gt;




&lt;h2&gt;
  
  
  Architecture
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F234nfj3l6b9wlyoo76oa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F234nfj3l6b9wlyoo76oa.png" alt="Lambda Managed Instances Architecture" width="800" height="875"&gt;&lt;/a&gt;&lt;br&gt;
The architecture has three layers:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Capacity Provider&lt;/strong&gt; - The foundation. Defines the VPC configuration, instance requirements (architecture, instance types), and scaling policies. &lt;strong&gt;Capacity providers define both the security boundary and the failure blast radius of your workload.&lt;/strong&gt; All functions assigned to the same capacity provider share EC2 instances and must be mutually trusted. This uses container-based isolation, not Firecracker. A compromised function on a shared capacity provider can affect every other function on the same instances. Separate untrusted workloads, regulated workloads, and production from non-production into distinct capacity providers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Managed Instances&lt;/strong&gt; - EC2 instances launched and managed by Lambda in your VPC. They're visible in the EC2 console (tagged as managed by Lambda) but you don't SSH into them, patch them, or configure autoscaling groups - Lambda handles all of that. The lifecycle includes a 14-day rotation for security compliance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Execution Environments&lt;/strong&gt; - Containers running your function code on the managed instances. Each environment handles multiple concurrent requests. For Python, each concurrency slot is a separate process with its own memory space.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Networking&lt;/strong&gt; - VPC connectivity is mandatory. Without proper outbound connectivity, functions execute but logs and traces are silently lost. This project uses private subnets with a NAT Gateway for telemetry transmission and Bedrock API access. For production, consider VPC endpoints to keep traffic on the AWS network.&lt;/p&gt;


&lt;h2&gt;
  
  
  Two-Level Concurrency
&lt;/h2&gt;

&lt;p&gt;This is what makes Lambda Managed Instances architecturally different from standard Lambda. There are two levels of parallelism:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Level 1 - LMI manages for you:&lt;/strong&gt; Multiple processes handle concurrent requests. Python's LMI runtime spawns a separate process for each concurrency slot (default: 16 per vCPU). Each process has its own memory space, its own global variables, and its own boto3 clients. No shared mutable state between processes. Scaling decisions are based on both execution environment saturation and CPU utilization, not request count alone.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Level 2 - You manage yourself:&lt;/strong&gt; Within each request, you can use &lt;code&gt;ThreadPoolExecutor&lt;/code&gt; to parallelize I/O operations. If your handler needs to search 5 product categories, you can search them in parallel rather than sequentially.&lt;/p&gt;

&lt;p&gt;Combined, this means a single execution environment with 1 vCPU and 10 concurrent processes, each running 4 search threads, can have 40 category searches in flight concurrently. On standard Lambda, you'd need 10 separate execution environments to handle those 10 concurrent requests, each paying per-GB-second for its own copy of the catalog in memory.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuegxeocl5i8a6h5lk453.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuegxeocl5i8a6h5lk453.png" alt="Two-Level Concurrency" width="800" height="774"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Each process receives a request, calls Bedrock to embed the query text, then fans out across categories using ThreadPoolExecutor. The catalog data (loaded from DynamoDB at process init) stays in memory across all requests handled by that process.&lt;/p&gt;
&lt;h3&gt;
  
  
  Why LMI Instead of Standard Lambda
&lt;/h3&gt;

&lt;p&gt;This workload is a poor fit for standard Lambda and a strong fit for LMI. Here's why:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;In-memory catalog at scale.&lt;/strong&gt; Each process loads the product catalog with embedding vectors into memory at initialization. A 100K product catalog with 384-dimensional vectors is roughly 150 MB per process. With 10 concurrent processes, that's 1.5 GB for catalog data alone. Standard Lambda's maximum is 10 GB total, and you pay per-GB-second for every millisecond of that memory. LMI gives you up to 32 GB with configurable memory-to-vCPU ratios, and you pay EC2 instance pricing regardless of how much memory your function uses.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Multi-concurrency amortizes catalog loading.&lt;/strong&gt; On standard Lambda, 10 concurrent requests means 10 independent execution environments, each cold-starting and loading the catalog into its own memory, each paying per-GB-second. On LMI, those 10 requests run as 10 processes on one EC2 instance. The catalog loads once per process at init time and stays warm for all subsequent requests routed to that process. At sustained throughput, this eliminates the repeated cold-start penalty.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sustained throughput economics.&lt;/strong&gt; A product recommendation API serving a storefront has predictable, sustained traffic - hundreds of requests per second during business hours. Each request involves a Bedrock API call for query embedding (I/O), cosine similarity across categories (CPU), and structured logging (I/O). At 10M+ invocations per month, EC2 pricing with Savings Plans is 60-72% cheaper than standard Lambda's per-GB-second model.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Configurable memory-to-vCPU ratio.&lt;/strong&gt; This workload is memory-heavy (large catalog) with moderate CPU needs (vector math on 384 dimensions). The 4:1 memory-to-vCPU ratio gives 4 GB of memory per vCPU - enough for the catalog plus Bedrock client overhead. Standard Lambda locks you into a fixed ratio where more memory always means proportionally more CPU and higher cost.&lt;/p&gt;


&lt;h3&gt;
  
  
  Why Not Fargate?
&lt;/h3&gt;

&lt;p&gt;This project could run on ECS Fargate. The handler logic would move into a FastAPI app, the catalog would load at container startup, and an ALB would handle routing. It would work fine. But the infrastructure footprint would be significantly larger:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Lambda Managed Instances&lt;/th&gt;
&lt;th&gt;ECS Fargate&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Application code&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Single handler function&lt;/td&gt;
&lt;td&gt;Web framework + Dockerfile + health checks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Infrastructure&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Capacity provider + function&lt;/td&gt;
&lt;td&gt;Cluster + task def + service + ALB + target group + listener rules&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Auto-scaling&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Built into capacity provider&lt;/td&gt;
&lt;td&gt;Application Auto Scaling policies (target tracking, step scaling)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Event triggers&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Native (SQS, EventBridge, API Gateway, S3)&lt;/td&gt;
&lt;td&gt;Requires separate wiring per trigger&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Terraform lines&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~200 across 4 modules&lt;/td&gt;
&lt;td&gt;~400-500 with ALB, ECR, auto-scaling&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Container image&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Not needed (zip deployment)&lt;/td&gt;
&lt;td&gt;Required (Dockerfile, ECR push, image lifecycle)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For teams already comfortable with Lambda, LMI is the path of least resistance to get EC2 pricing and multi-concurrency without learning container orchestration. You keep the programming model you know and gain the hardware flexibility you need. The reverse is also true: &lt;strong&gt;for teams already invested in ECS, Fargate may remain the more operationally familiar choice&lt;/strong&gt; - the muscle memory, dashboards, deployment pipelines, and on-call runbooks are already in place.&lt;/p&gt;

&lt;p&gt;Where Fargate or EKS would be the better choice: custom native dependencies that exceed Lambda layer limits (PyTorch, large ML models), persistent connections (WebSocket, gRPC), specialized instance types not supported by LMI, or workloads that need the Kubernetes ecosystem. I covered Fargate patterns in my &lt;a href="https://darryl-ruggles.cloud/elastic-container-service-ecs-my-default-choice-for-containers-on-aws/" rel="noopener noreferrer"&gt;ECS deep dive&lt;/a&gt; and &lt;a href="https://darryl-ruggles.cloud/dsql-kabob-store/" rel="noopener noreferrer"&gt;Kabob Store&lt;/a&gt; projects. My &lt;a href="https://darryl-ruggles.cloud/a-complete-terraform-setup-for-eks-auto-mode-is-it-right-for-you/" rel="noopener noreferrer"&gt;EKS Auto Mode&lt;/a&gt; article covers Karpenter-based scaling.&lt;/p&gt;

&lt;p&gt;One specific area where EKS with Karpenter is significantly more sophisticated: &lt;strong&gt;scaling down and cost optimization at idle.&lt;/strong&gt; LMI's scale-down is conservative - in my testing, 2 EC2 instances remained running overnight with zero traffic (1 per AZ). There's no minimum instance setting, no consolidation, and no way to force scale-to-zero short of deleting the function version or capacity provider. Karpenter, by contrast, actively consolidates workloads onto fewer nodes, replaces larger instances with smaller ones when demand drops, and can use Spot instances for fault-tolerant workloads. If your traffic has significant idle periods (nights, weekends), this difference matters for cost. LMI's simplicity comes at the price of less intelligent scaling.&lt;/p&gt;


&lt;h2&gt;
  
  
  Setting It Up with Terraform
&lt;/h2&gt;

&lt;p&gt;The complete infrastructure is organized into four Terraform modules: networking, IAM, capacity provider, and Lambda. Every IAM policy follows least privilege, and the configuration follows the &lt;a href="https://docs.aws.amazon.com/wellarchitected/latest/framework/welcome.html" rel="noopener noreferrer"&gt;AWS Well-Architected Framework&lt;/a&gt; Security and Cost Optimization pillars. All resources use official HashiCorp providers (&lt;code&gt;hashicorp/aws&lt;/code&gt; and &lt;code&gt;hashicorp/archive&lt;/code&gt; where applicable) - no community modules or third-party providers.&lt;/p&gt;

&lt;p&gt;For a fully production-hardened deployment, you'd also want to address the Reliability, Performance Efficiency, and Operational Excellence pillars more explicitly. The &lt;a href="https://docs.aws.amazon.com/wellarchitected/latest/serverless-applications-lens/welcome.html" rel="noopener noreferrer"&gt;AWS Serverless Applications Lens&lt;/a&gt; emphasizes thinking in concurrent requests, sharing nothing, designing for failures and duplicates, and using versions and aliases for safe reversible deployments. Concretely:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Multi-AZ deployment&lt;/strong&gt; - subnets in at least two AZs (this demo does)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Encryption at rest with customer-managed KMS keys&lt;/strong&gt; - on the capacity provider (&lt;code&gt;kms_key_arn&lt;/code&gt; on &lt;code&gt;aws_lambda_capacity_provider&lt;/code&gt;), DynamoDB (&lt;code&gt;server_side_encryption&lt;/code&gt; with &lt;code&gt;kms_key_arn&lt;/code&gt;), and CloudWatch Logs (&lt;code&gt;kms_key_id&lt;/code&gt; on the log group)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;VPC endpoints instead of NAT Gateway&lt;/strong&gt; (covered later in this article)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Invoke through an alias, not the published version directly&lt;/strong&gt; - The demo invokes the qualified ARN of the published function version (&lt;code&gt;function:name:1&lt;/code&gt;). For production, create an alias (&lt;code&gt;prod&lt;/code&gt;, &lt;code&gt;live&lt;/code&gt;, &lt;code&gt;stable&lt;/code&gt;) pointing to a specific version and have callers invoke the alias ARN. Aliases enable instant rollback by updating one pointer, support traffic-shifting deployments (10% to a new version, then 50%, then 100%), and decouple caller code from version numbers.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Idempotency for downstream side effects&lt;/strong&gt; - &lt;strong&gt;Because Lambda may retry or duplicate events, handlers must remain idempotent - even when using long-lived in-memory state.&lt;/strong&gt; The Powertools idempotency utility uses DynamoDB to deduplicate requests by a configurable key. For this similarity engine the Bedrock embedding call is a read operation and the only state change is logging, so idempotency is less critical. For handlers that write to DynamoDB, send notifications, or charge a payment, idempotency is essential because LMI's at-least-once delivery semantics mean retries can produce duplicate side effects. The in-memory catalog is read-only and shared across requests, but any per-request state that produces side effects needs deduplication.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CloudWatch alarms on LMI-specific metrics&lt;/strong&gt; (covered in the Observability section)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The demo includes the basics. The production hardening above is straightforward incremental work.&lt;/p&gt;
&lt;h3&gt;
  
  
  Provider Configuration
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;terraform&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;required_version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"&amp;gt;= 1.11.0"&lt;/span&gt;

  &lt;span class="nx"&gt;required_providers&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;aws&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="c1"&gt;# Validated with AWS provider v6.x (tested with 6.31+)&lt;/span&gt;
      &lt;span class="nx"&gt;source&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"hashicorp/aws"&lt;/span&gt;
      &lt;span class="nx"&gt;version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"~&amp;gt; 6.31"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="nx"&gt;archive&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;source&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"hashicorp/archive"&lt;/span&gt;
      &lt;span class="nx"&gt;version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"~&amp;gt; 2.7"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;provider&lt;/span&gt; &lt;span class="s2"&gt;"aws"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;region&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;aws_region&lt;/span&gt;
  &lt;span class="nx"&gt;profile&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;aws_profile&lt;/span&gt;  &lt;span class="c1"&gt;# Set via AWS_PROFILE env var or -var flag&lt;/span&gt;

  &lt;span class="nx"&gt;default_tags&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;tags&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;Project&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"lambda-managed-instances"&lt;/span&gt;
      &lt;span class="nx"&gt;Environment&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;environment&lt;/span&gt;
      &lt;span class="nx"&gt;ManagedBy&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"terraform"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The &lt;code&gt;~&amp;gt; 6.31&lt;/code&gt; constraint pins to the current stable major (6.31.0 at the time of writing) without locking too tightly. &lt;code&gt;memory_size&lt;/code&gt; values above 10240 MB require hashicorp/aws 6.29.0 or later - earlier releases had a schema validator that capped &lt;code&gt;memory_size&lt;/code&gt; at 10 GB even for LMI functions (fixed in #46065). Without a recent provider, attempting to set 16 GB or 32 GB on an LMI function fails at &lt;code&gt;terraform plan&lt;/code&gt; with a confusing validation error.&lt;/p&gt;
&lt;h3&gt;
  
  
  IAM: The Two-Role Model
&lt;/h3&gt;

&lt;p&gt;Lambda Managed Instances requires two separate IAM roles - a deliberate separation of concerns:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Operator Role&lt;/strong&gt; - Allows Lambda to manage EC2 instances on your behalf. Your function code never gets these permissions.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_role"&lt;/span&gt; &lt;span class="s2"&gt;"operator"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"${var.project_name}-${var.environment}-operator"&lt;/span&gt;

  &lt;span class="nx"&gt;assume_role_policy&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;jsonencode&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="nx"&gt;Version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"2012-10-17"&lt;/span&gt;
    &lt;span class="nx"&gt;Statement&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
      &lt;span class="nx"&gt;Effect&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Allow"&lt;/span&gt;
      &lt;span class="nx"&gt;Principal&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;Service&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"lambda.amazonaws.com"&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="nx"&gt;Action&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"sts:AssumeRole"&lt;/span&gt;
      &lt;span class="nx"&gt;Condition&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;StringEquals&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="s2"&gt;"aws:SourceAccount"&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;account_id&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}]&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_role_policy_attachment"&lt;/span&gt; &lt;span class="s2"&gt;"operator"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;role&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_iam_role&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;operator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;
  &lt;span class="nx"&gt;policy_arn&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"arn:aws:iam::aws:policy/service-role/AWSLambdaManagedEC2ResourceOperator"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Execution Role&lt;/strong&gt; - Scoped to only what the function needs. No EC2 permissions, no wildcard resources. Bedrock access is limited to specific embedding model families.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="c1"&gt;# DynamoDB - least privilege: handler only does Query on the category-index GSI.&lt;/span&gt;
&lt;span class="c1"&gt;# The seed script runs locally with the operator's credentials and uses its own&lt;/span&gt;
&lt;span class="c1"&gt;# IAM identity for PutItem - not this role.&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_role_policy"&lt;/span&gt; &lt;span class="s2"&gt;"execution_dynamodb"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"dynamodb-access"&lt;/span&gt;
  &lt;span class="nx"&gt;role&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_iam_role&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;execution&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;

  &lt;span class="nx"&gt;policy&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;jsonencode&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="nx"&gt;Version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"2012-10-17"&lt;/span&gt;
    &lt;span class="nx"&gt;Statement&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
      &lt;span class="nx"&gt;Effect&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Allow"&lt;/span&gt;
      &lt;span class="nx"&gt;Action&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"dynamodb:Query"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
      &lt;span class="nx"&gt;Resource&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;dynamodb_table&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s2"&gt;"${var.dynamodb_table}/index/*"&lt;/span&gt;
      &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;}]&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Bedrock - scoped to the specific configured embedding model&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_role_policy"&lt;/span&gt; &lt;span class="s2"&gt;"execution_bedrock"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"bedrock-embeddings"&lt;/span&gt;
  &lt;span class="nx"&gt;role&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_iam_role&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;execution&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;

  &lt;span class="nx"&gt;policy&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;jsonencode&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="nx"&gt;Version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"2012-10-17"&lt;/span&gt;
    &lt;span class="nx"&gt;Statement&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
      &lt;span class="nx"&gt;Effect&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Allow"&lt;/span&gt;
      &lt;span class="nx"&gt;Action&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"bedrock:InvokeModel"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
      &lt;span class="nx"&gt;Resource&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="s2"&gt;"arn:aws:bedrock:${var.aws_region}::foundation-model/${var.embedding_model_id}"&lt;/span&gt;
      &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;}]&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The runtime function only needs &lt;code&gt;dynamodb:Query&lt;/code&gt; because &lt;code&gt;_load_catalog()&lt;/code&gt; queries the &lt;code&gt;category-index&lt;/code&gt; GSI rather than scanning the table. No PutItem, no GetItem, no Scan. The seed script (&lt;code&gt;scripts/seed_catalog.py&lt;/code&gt;) runs locally on the developer's machine with their own IAM identity - it never assumes the function execution role, so the runtime role doesn't need write permissions. The Bedrock policy is scoped to the exact model ARN configured via &lt;code&gt;var.embedding_model_id&lt;/code&gt;, not a wildcard. This is what "least privilege" looks like when you actually walk through the code.&lt;/p&gt;

&lt;h3&gt;
  
  
  Capacity Provider
&lt;/h3&gt;

&lt;p&gt;The capacity provider defines the EC2 infrastructure where your functions run. The &lt;code&gt;scaling_mode = "Manual"&lt;/code&gt; with a target CPU utilization policy gives you control over scaling behavior while still letting Lambda handle the mechanics.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_lambda_capacity_provider"&lt;/span&gt; &lt;span class="s2"&gt;"main"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"${var.project_name}-${var.environment}"&lt;/span&gt;

  &lt;span class="nx"&gt;vpc_config&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;subnet_ids&lt;/span&gt;         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;subnet_ids&lt;/span&gt;
    &lt;span class="nx"&gt;security_group_ids&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;security_group_id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;permissions_config&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;capacity_provider_operator_role_arn&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;operator_role_arn&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;instance_requirements&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;architectures&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;instance_architecture&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# "arm64" for Graviton&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;capacity_provider_scaling_config&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;max_vcpu_count&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;max_vcpu_count&lt;/span&gt;
    &lt;span class="nx"&gt;scaling_mode&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Manual"&lt;/span&gt;

    &lt;span class="nx"&gt;scaling_policies&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
      &lt;span class="nx"&gt;predefined_metric_type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"LambdaCapacityProviderAverageCPUUtilization"&lt;/span&gt;
      &lt;span class="nx"&gt;target_value&lt;/span&gt;           &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;target_cpu_utilization&lt;/span&gt;
    &lt;span class="p"&gt;}]&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The capacity provider supports two scaling modes: &lt;code&gt;Auto&lt;/code&gt; and &lt;code&gt;Manual&lt;/code&gt;. Auto mode is hands-off - Lambda picks an internal target CPU utilization and scales based on AWS-chosen defaults, with no explicit &lt;code&gt;scaling_policies&lt;/code&gt; block needed. I chose Manual mode for this project because it lets me set an explicit target (50% in the demo config) so the scaling behavior is predictable and tunable. With a lower target, the capacity provider scales out faster and maintains more headroom for traffic bursts. For a production workload where you trust AWS to pick reasonable defaults, Auto mode is simpler and a valid choice.&lt;/p&gt;

&lt;h3&gt;
  
  
  Lambda Function with Capacity Provider
&lt;/h3&gt;

&lt;p&gt;Four key differences from a standard Lambda function:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;capacity_provider_config&lt;/code&gt; attaches the function to LMI&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;publish = true&lt;/code&gt; is required - LMI runs on published versions&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;memory_size&lt;/code&gt; minimum is 2048 MB (2 GB / 1 vCPU)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;execution_environment_memory_gib_per_vcpu&lt;/code&gt; controls the memory-to-vCPU ratio (new in March 2026)
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;locals&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;powertools_layer_arn&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;instance_architecture&lt;/span&gt; &lt;span class="p"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;"arm64"&lt;/span&gt;
    &lt;span class="err"&gt;?&lt;/span&gt; &lt;span class="s2"&gt;"arn:aws:lambda:${var.aws_region}:017000801446:layer:AWSLambdaPowertoolsPythonV3-python314-arm64:${var.powertools_layer_version}"&lt;/span&gt;
    &lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"arn:aws:lambda:${var.aws_region}:017000801446:layer:AWSLambdaPowertoolsPythonV3-python314-x86_64:${var.powertools_layer_version}"&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="nx"&gt;powertools_env_vars&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;POWERTOOLS_SERVICE_NAME&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"${var.project_name}-handler"&lt;/span&gt;
    &lt;span class="nx"&gt;POWERTOOLS_METRICS_NAMESPACE&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;metrics_namespace&lt;/span&gt;
    &lt;span class="nx"&gt;POWERTOOLS_LOG_LEVEL&lt;/span&gt;         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;log_level&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_lambda_function"&lt;/span&gt; &lt;span class="s2"&gt;"handler"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;function_name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"${var.project_name}-${var.environment}-handler"&lt;/span&gt;
  &lt;span class="nx"&gt;role&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;execution_role_arn&lt;/span&gt;
  &lt;span class="nx"&gt;handler&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"handler.lambda_handler"&lt;/span&gt;
  &lt;span class="nx"&gt;runtime&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"python3.14"&lt;/span&gt;
  &lt;span class="nx"&gt;architectures&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;instance_architecture&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="nx"&gt;memory_size&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;lambda_memory_size&lt;/span&gt;
  &lt;span class="nx"&gt;timeout&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;
  &lt;span class="nx"&gt;publish&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;

  &lt;span class="nx"&gt;filename&lt;/span&gt;         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;archive_file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;function&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;output_path&lt;/span&gt;
  &lt;span class="nx"&gt;source_code_hash&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;archive_file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;function&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;output_base64sha256&lt;/span&gt;

  &lt;span class="nx"&gt;layers&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;local&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;powertools_layer_arn&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

  &lt;span class="nx"&gt;capacity_provider_config&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;lambda_managed_instances_capacity_provider_config&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;capacity_provider_arn&lt;/span&gt;                    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;capacity_provider_arn&lt;/span&gt;
      &lt;span class="nx"&gt;execution_environment_memory_gib_per_vcpu&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;memory_gib_per_vcpu&lt;/span&gt;
      &lt;span class="nx"&gt;per_execution_environment_max_concurrency&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;max_concurrency_per_environment&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;logging_config&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;log_format&lt;/span&gt;            &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"JSON"&lt;/span&gt;
    &lt;span class="nx"&gt;application_log_level&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;log_level&lt;/span&gt;
    &lt;span class="nx"&gt;system_log_level&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"WARN"&lt;/span&gt;
    &lt;span class="nx"&gt;log_group&lt;/span&gt;             &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_cloudwatch_log_group&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;function&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;tracing_config&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;mode&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Active"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;environment&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;variables&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;merge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;local&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;powertools_env_vars&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;DYNAMODB_TABLE&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_dynamodb_table&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;products&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;
      &lt;span class="nx"&gt;ENVIRONMENT&lt;/span&gt;         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;environment&lt;/span&gt;
      &lt;span class="nx"&gt;EMBEDDING_MODEL_ID&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;embedding_model_id&lt;/span&gt;
      &lt;span class="nx"&gt;EMBEDDING_DIMENSION&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;tostring&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;embedding_dimension&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;memory_gib_per_vcpu&lt;/code&gt; setting is powerful. LMI enforces a minimum of 1 vCPU per execution environment, so the ratio determines how much memory you get for that minimum. Examples at the 8 GB level:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;2:1 ratio&lt;/strong&gt; = 8 GB / 4 vCPUs (compute-heavy: batch processing, data crunching)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;4:1 ratio&lt;/strong&gt; = 8 GB / 2 vCPUs (balanced: API handlers, typical workloads)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;8:1 ratio&lt;/strong&gt; = 8 GB / 1 vCPU (memory-heavy: large in-memory datasets, caching)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The product similarity engine uses 4 GB at 4:1 - 1 vCPU per environment, which is the smallest balanced configuration that fits the catalog plus 10 worker processes.&lt;/p&gt;

&lt;h3&gt;
  
  
  A Note on Packaging Dependencies
&lt;/h3&gt;

&lt;p&gt;The Powertools layer is pinned to a specific version (minimum 3.23.0 - the first release that officially supports LMI). For everything else, follow AWS's guidance for Python Lambda functions: package all dependencies, including &lt;code&gt;boto3&lt;/code&gt; and &lt;code&gt;botocore&lt;/code&gt;, with the function rather than relying on the runtime's bundled copies. &lt;strong&gt;Even though boto3 is available in the runtime, package it with your function to avoid version drift.&lt;/strong&gt; The runtime's boto3 is updated on AWS's schedule, not yours, and version drift between local development and the runtime can produce subtle bugs that are hard to reproduce. For production zip deployments, &lt;code&gt;pip install --target build/ boto3 botocore&lt;/code&gt; and ship them in the zip. The demo uses the runtime's boto3 for simplicity, but production code should not.&lt;/p&gt;




&lt;h2&gt;
  
  
  Multi-Concurrency by Language
&lt;/h2&gt;

&lt;p&gt;LMI supports five runtimes today: Python 3.13+, Node.js 22+, Java 21+, .NET 8+, and Rust on the OS-only runtime. All modern runtimes (Python 3.12+) are based on Amazon Linux 2023, replacing AL2 ahead of its June 2026 end-of-life. &lt;strong&gt;Every language handles multi-concurrency differently&lt;/strong&gt;, and the differences matter - they change how you write the handler, what concurrency bugs you have to worry about, and how memory scales.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Runtime&lt;/th&gt;
&lt;th&gt;Concurrency Model&lt;/th&gt;
&lt;th&gt;What This Means for Your Handler&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Python&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Multiple processes per environment&lt;/td&gt;
&lt;td&gt;Full isolation - each process has its own memory, globals, and boto3 clients. No thread-safety concerns. Memory multiplies linearly with concurrency.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Node.js&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Worker threads with async dispatch&lt;/td&gt;
&lt;td&gt;Each worker thread can also handle async requests concurrently. Requires safe handling of shared state.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Java&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Single process with OS threads&lt;/td&gt;
&lt;td&gt;Multiple threads execute the handler simultaneously in shared memory. Requires explicit thread-safe code: synchronized collections, no shared mutable state, atomic operations. The hardest model to get right.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;.NET&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;.NET Tasks with async processing&lt;/td&gt;
&lt;td&gt;Same patterns as ASP.NET Core - thread-safe data structures, no static mutable state.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Rust&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Single process, Tokio async tasks&lt;/td&gt;
&lt;td&gt;Compile-time enforcement: handlers must implement &lt;code&gt;Clone + Send&lt;/code&gt;. The compiler catches concurrency bugs that other languages catch at runtime (or in production).&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Python is the &lt;strong&gt;simplest&lt;/strong&gt; model because there's no shared memory between concurrent requests. The trade-off is per-process memory multiplication. Java is the &lt;strong&gt;hardest&lt;/strong&gt; because thread safety becomes a concern on every line that touches shared state. Rust is the &lt;strong&gt;safest&lt;/strong&gt; because the compiler refuses to let you write non-thread-safe code in the first place.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;This blog focuses on the Python implementation.&lt;/strong&gt; The patterns shown here (process isolation, ThreadPoolExecutor for parallel I/O within a request, memory tuning around &lt;code&gt;per_execution_environment_max_concurrency&lt;/code&gt;) are specific to how Python's LMI runtime works. The architecture concepts (capacity providers, scaling, networking, IAM) apply identically across all five languages, but the handler code patterns would differ if you were writing in Java or Rust.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Python Handler
&lt;/h2&gt;

&lt;p&gt;Python's LMI runtime uses multiple &lt;strong&gt;processes&lt;/strong&gt; (not threads) for multi-concurrency. Each concurrent request runs in a separate process with its own memory space. Global variables, module-level caches, and boto3 clients are completely isolated between processes. This is simpler than the thread-based and async models above because there are no shared-memory concurrency concerns.&lt;/p&gt;

&lt;p&gt;This blog uses Python 3.14, the newest supported version. Note that Lambda's Python 3.14 ships with the JIT and free-threaded mode disabled, so the GIL is still in effect.&lt;/p&gt;

&lt;p&gt;The one shared resource: &lt;code&gt;/tmp&lt;/code&gt;. All processes in an execution environment share the same &lt;code&gt;/tmp&lt;/code&gt; directory. Use request-scoped filenames to prevent collisions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Handler Structure with Powertools
&lt;/h3&gt;

&lt;p&gt;Following the &lt;a href="https://darryl-ruggles.cloud/powertools-for-aws-lambda-best-practices-by-default/" rel="noopener noreferrer"&gt;Powertools best practices&lt;/a&gt; pattern - Logger, Tracer, and Metrics decorators in the correct order:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;math&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;concurrent.futures&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ThreadPoolExecutor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;as_completed&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;aws_lambda_powertools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Logger&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Metrics&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Tracer&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;aws_lambda_powertools.metrics&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MetricUnit&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;aws_lambda_powertools.utilities.typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LambdaContext&lt;/span&gt;

&lt;span class="n"&gt;logger&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Logger&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;tracer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Tracer&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;metrics&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Metrics&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Module-level init runs ONCE PER PROCESS.
# With 10 concurrent processes, this runs 10 times.
# Each process loads its own catalog copy and boto3 clients.
&lt;/span&gt;&lt;span class="n"&gt;PROCESS_ID&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getpid&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;AWS_REGION&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AWS_REGION&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;us-east-1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;EMBEDDING_MODEL_ID&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;EMBEDDING_MODEL_ID&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;amazon.nova-2-multimodal-embeddings-v1:0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;EMBEDDING_DIMENSION&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;EMBEDDING_DIMENSION&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;384&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="n"&gt;dynamodb&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;resource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dynamodb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;region_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;AWS_REGION&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;table&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dynamodb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Table&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DYNAMODB_TABLE&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;bedrock_runtime&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bedrock-runtime&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;region_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;AWS_REGION&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;_catalog&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;


&lt;span class="nd"&gt;@tracer.capture_method&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_load_catalog&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Load product catalog once per process. Uses Query (least privilege).&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;_catalog&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="c1"&gt;# already loaded in this process
&lt;/span&gt;        &lt;span class="k"&gt;return&lt;/span&gt;
    &lt;span class="c1"&gt;# ... query DynamoDB by category and populate _catalog ...
&lt;/span&gt;

&lt;span class="nd"&gt;@logger.inject_lambda_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;log_event&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nd"&gt;@tracer.capture_lambda_handler&lt;/span&gt;
&lt;span class="nd"&gt;@metrics.log_metrics&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;capture_cold_start_metric&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;lambda_handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;LambdaContext&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append_keys&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;process_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;PROCESS_ID&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;_load_catalog&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# no-op after first call in this process
&lt;/span&gt;
    &lt;span class="c1"&gt;# Extract params from event body or direct invocation
&lt;/span&gt;    &lt;span class="n"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;body&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nf"&gt;else &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;body&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;top_k&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;top_k&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 1: Embed the query text via Bedrock (I/O-bound)
&lt;/span&gt;    &lt;span class="n"&gt;query_embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;_embed_query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 2: Search categories in parallel (CPU-bound)
&lt;/span&gt;    &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
    &lt;span class="n"&gt;categories&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;categories&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_catalog&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;keys&lt;/span&gt;&lt;span class="p"&gt;()))&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nc"&gt;ThreadPoolExecutor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_workers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;executor&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;futures&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;executor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;submit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_search_category&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cat&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query_embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="n"&gt;cat&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;cat&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;categories&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;future&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;as_completed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;futures&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;futures&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;future&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;future&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;result&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;metrics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_metric&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SearchRequests&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;unit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;MetricUnit&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;statusCode&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;body&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;results&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;})}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Bedrock Embedding - Configurable Model
&lt;/h3&gt;

&lt;p&gt;The query text is embedded via Amazon Bedrock before similarity search. The model is configurable via the &lt;code&gt;EMBEDDING_MODEL_ID&lt;/code&gt; environment variable - Nova Multimodal Embeddings by default, with Titan Text Embeddings V2 as an alternative:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@tracer.capture_method&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_embed_query_nova&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Nova Multimodal Embeddings request format.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;request_body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;taskType&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SINGLE_EMBEDDING&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;singleEmbeddingParams&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;embeddingPurpose&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TEXT_RETRIEVAL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;embeddingDimension&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;EMBEDDING_DIMENSION&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;truncationMode&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;END&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;value&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bedrock_runtime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request_body&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;modelId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;EMBEDDING_MODEL_ID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;accept&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;application/json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;contentType&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;application/json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;response_body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;body&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response_body&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;embeddings&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;embedding&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Product embeddings are generated at seed time using &lt;code&gt;GENERIC_INDEX&lt;/code&gt; purpose and stored in DynamoDB alongside the product data. Query embeddings use &lt;code&gt;TEXT_RETRIEVAL&lt;/code&gt; purpose at runtime. Nova supports 4 dimension sizes (256, 384, 1024, 3072) - trading off accuracy against memory and compute cost. The demo uses 384 dimensions as a practical balance.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cosine Similarity - The CPU Bottleneck
&lt;/h3&gt;

&lt;p&gt;The vector similarity computation is the compute-intensive core after the Bedrock call returns. For production, use NumPy - it's 10-50x faster than a pure Python loop and releases the GIL during C-level operations, which makes the ThreadPoolExecutor pattern actually parallel:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_cosine_similarity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ndarray&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;catalog&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ndarray&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ndarray&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Production version: batch operation across all products in a category.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="c1"&gt;# query shape: (D,), catalog shape: (N, D)
&lt;/span&gt;    &lt;span class="n"&gt;norms&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;linalg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;norm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;catalog&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;linalg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;norm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;catalog&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;norms&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;norms&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The pure Python version is included in the demo as an educational fallback (no NumPy dependency, easier to read):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_cosine_similarity_pure&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vec_a&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;vec_b&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Educational version: shows the math, no dependencies.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;dot&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;zip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vec_a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;vec_b&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;norm_a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sqrt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;vec_a&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;norm_b&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sqrt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;vec_b&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;norm_a&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;norm_b&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;dot&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;norm_a&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;norm_b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The handler code, the capacity provider, the Terraform - none of it would need to change to run on an instance type with hardware-accelerated vector operations. The capacity provider's instance type selection is the only variable.&lt;/p&gt;

&lt;h3&gt;
  
  
  Process Memory Multiplication
&lt;/h3&gt;

&lt;p&gt;This is the most important thing to understand about Python LMI. Each process loads its own copy of the catalog:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;10 concurrent processes x 200 MB catalog = 2 GB just for catalog data
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;MemoryUtilization&lt;/code&gt; CloudWatch metric tracks total memory consumption across all processes. If you're loading large datasets and running high concurrency, you'll hit memory limits. Tune with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reduce &lt;code&gt;PerExecutionEnvironmentMaxConcurrency&lt;/code&gt; (fewer processes, less memory)&lt;/li&gt;
&lt;li&gt;Increase &lt;code&gt;memory_size&lt;/code&gt; (more memory per environment)&lt;/li&gt;
&lt;li&gt;Use 8:1 &lt;code&gt;memory_gib_per_vcpu&lt;/code&gt; ratio (more memory, fewer vCPUs)&lt;/li&gt;
&lt;li&gt;Use shared &lt;code&gt;/tmp&lt;/code&gt; as a cross-process cache (load once, read from all processes)&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Observability
&lt;/h2&gt;

&lt;p&gt;LMI publishes its own CloudWatch metrics in the &lt;code&gt;AWS/Lambda&lt;/code&gt; namespace at 5-minute granularity. The capacity-provider-level metrics describe overall instance utilization; the execution-environment-level metrics describe per-function resource consumption.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Capacity provider metrics&lt;/strong&gt; (dimensions: &lt;code&gt;CapacityProviderName&lt;/code&gt;, &lt;code&gt;InstanceType&lt;/code&gt;):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;CPUUtilization&lt;/code&gt; - CPU usage across all instances in the capacity provider&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;MemoryUtilization&lt;/code&gt; - Memory usage across all instances&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;vCPUAllocated&lt;/code&gt; / &lt;code&gt;vCPUAvailable&lt;/code&gt; - Used vs available vCPU count&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;MemoryAllocated&lt;/code&gt; / &lt;code&gt;MemoryAvailable&lt;/code&gt; - Used vs available memory&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Execution environment metrics&lt;/strong&gt; (dimensions: &lt;code&gt;FunctionName&lt;/code&gt;, &lt;code&gt;CapacityProviderName&lt;/code&gt;, &lt;code&gt;Resource&lt;/code&gt;):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;ExecutionEnvironmentConcurrency&lt;/code&gt; - Active concurrent requests per environment&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ExecutionEnvironmentConcurrencyLimit&lt;/code&gt; - Configured maximum concurrency per environment&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ExecutionEnvironmentCPUUtilization&lt;/code&gt; - CPU usage of this function's environments&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ExecutionEnvironmentMemoryUtilization&lt;/code&gt; - Memory usage of this function's environments&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Alarms to Set First
&lt;/h3&gt;

&lt;p&gt;If you only set three alarms when adopting LMI, set these:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Capacity provider CPU utilization&lt;/strong&gt; - Alarm when sustained CPU exceeds your scaling target (e.g., &amp;gt; 80% for 10 minutes if your target is 50%). This indicates the capacity provider is failing to scale out fast enough or has hit &lt;code&gt;max_vcpu_count&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Execution environment concurrency vs limit&lt;/strong&gt; - Alarm when &lt;code&gt;ExecutionEnvironmentConcurrency&lt;/code&gt; reaches &lt;code&gt;ExecutionEnvironmentConcurrencyLimit&lt;/code&gt; for sustained periods. This means processes are saturated and incoming requests are being throttled or queued.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Execution environment memory utilization&lt;/strong&gt; - Alarm when memory exceeds 80%. With Python's per-process memory multiplication, hitting memory limits causes new process spawns to fail (&lt;code&gt;InitResourceExhausted&lt;/code&gt;) rather than gradual degradation. Catch this before it happens.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;These three cover the LMI-specific failure modes that standard Lambda alarms (Errors, Throttles, Duration) won't catch.&lt;/p&gt;




&lt;h2&gt;
  
  
  Deployment
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;AWS CLI configured with a profile (&lt;code&gt;export AWS_PROFILE=your-profile&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Terraform &amp;gt;= 1.11&lt;/li&gt;
&lt;li&gt;Python 3.14+ with boto3 (for the seed script)&lt;/li&gt;
&lt;li&gt;Amazon Nova Multimodal Embeddings model enabled in your AWS account (Bedrock console, Model Access)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Deploy
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Clone the repo&lt;/span&gt;
git clone https://github.com/RDarrylR/lambda-managed-instances-similarity-engine.git
&lt;span class="nb"&gt;cd &lt;/span&gt;lambda-managed-instances-similarity-engine

&lt;span class="c"&gt;# Configure&lt;/span&gt;
&lt;span class="nb"&gt;cp &lt;/span&gt;infrastructure/terraform.tfvars.example infrastructure/terraform.tfvars
&lt;span class="c"&gt;# Edit terraform.tfvars with your values&lt;/span&gt;

&lt;span class="c"&gt;# Deploy infrastructure&lt;/span&gt;
make init
make apply

&lt;span class="c"&gt;# Seed the product catalog&lt;/span&gt;
make seed

&lt;span class="c"&gt;# Invoke&lt;/span&gt;
make invoke
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Cost Analysis
&lt;/h3&gt;

&lt;p&gt;Lambda Managed Instances pricing is fundamentally different from standard Lambda. Understanding when each model wins is the key decision.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Standard Lambda pricing (arm64/Graviton):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;$0.20 per million requests&lt;/li&gt;
&lt;li&gt;$0.0000133334 per GB-second (arm64)&lt;/li&gt;
&lt;li&gt;No minimum charge, no idle cost&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Lambda Managed Instances pricing:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;$0.20 per million requests (same)&lt;/li&gt;
&lt;li&gt;EC2 on-demand instance pricing (varies by type)&lt;/li&gt;
&lt;li&gt;15% management fee on the EC2 on-demand price&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;No per-invocation duration charge&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The critical difference: standard Lambda charges per GB-second of execution. LMI charges for EC2 time regardless of how many requests you serve. At low volume, you're paying for idle EC2 capacity. At high volume, that fixed EC2 cost is amortized across millions of requests.&lt;/p&gt;

&lt;h4&gt;
  
  
  Break-Even: Standard Lambda vs LMI
&lt;/h4&gt;

&lt;p&gt;Consider this workload: 4 GB memory, 200ms average duration, sustained traffic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Standard Lambda cost per request (arm64):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Compute: 4 GB x 0.2s = 0.8 GB-seconds x $0.0000133334 = $0.00001067&lt;/li&gt;
&lt;li&gt;Request: $0.0000002&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Total: ~$0.0000109 per request&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;LMI on a c7g.medium (1 vCPU, 2 GB, ~$0.034/hr on-demand):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;EC2 + 15% fee: $0.034 x 1.15 = $0.0391/hr&lt;/li&gt;
&lt;li&gt;With 10 concurrent processes and 200ms per request, each process handles ~5 req/sec&lt;/li&gt;
&lt;li&gt;Instance throughput: ~50 req/sec = ~180,000 req/hr&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cost per request: $0.0391 / 180,000 = ~$0.000000217&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At this throughput, &lt;strong&gt;LMI is roughly 50x cheaper per request&lt;/strong&gt; than standard Lambda. But the EC2 cost runs 24/7 whether you have traffic or not.&lt;/p&gt;

&lt;h4&gt;
  
  
  Monthly Cost Comparison
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Monthly Requests&lt;/th&gt;
&lt;th&gt;Instances Needed&lt;/th&gt;
&lt;th&gt;Standard Lambda (arm64)&lt;/th&gt;
&lt;th&gt;LMI On-Demand (c7g.medium)&lt;/th&gt;
&lt;th&gt;LMI + 1yr Savings Plan&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1M&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;$11&lt;/td&gt;
&lt;td&gt;$28 + $0.20 = &lt;strong&gt;$28&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;~$18&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10M&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;$109&lt;/td&gt;
&lt;td&gt;$28 + $2.00 = &lt;strong&gt;$30&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;~$20&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;50M&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;$546&lt;/td&gt;
&lt;td&gt;$28 + $10.00 = &lt;strong&gt;$38&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;~$28&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;100M&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;$1,091&lt;/td&gt;
&lt;td&gt;$28 + $20.00 = &lt;strong&gt;$48&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;~$38&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;500M&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;$5,456&lt;/td&gt;
&lt;td&gt;$112 + $100.00 = &lt;strong&gt;$212&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;~$172&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A single c7g.medium tops out around ~130M requests/month at 50 req/sec sustained. Beyond that, instance count scales roughly linearly with load - 500M req/month requires approximately 4 instances. The LMI columns reflect the actual instance count needed at each volume.&lt;/p&gt;

&lt;p&gt;The break-even is around &lt;strong&gt;2.5M requests/month&lt;/strong&gt; at this memory and duration profile. Below that, standard Lambda wins because you pay nothing when idle. Above that, LMI wins and the advantage grows with volume.&lt;/p&gt;

&lt;h4&gt;
  
  
  Commitment Discounts Change the Math
&lt;/h4&gt;

&lt;p&gt;LMI supports EC2 Savings Plans and Reserved Instances. Standard Lambda supports Compute Savings Plans (up to 17% discount on duration). The discount gap is significant:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Commitment&lt;/th&gt;
&lt;th&gt;Standard Lambda Discount&lt;/th&gt;
&lt;th&gt;LMI Discount (EC2)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;None (on-demand)&lt;/td&gt;
&lt;td&gt;0%&lt;/td&gt;
&lt;td&gt;0%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1-year Compute Savings Plan&lt;/td&gt;
&lt;td&gt;Up to 17%&lt;/td&gt;
&lt;td&gt;Up to 36%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3-year Compute Savings Plan&lt;/td&gt;
&lt;td&gt;Up to 17%&lt;/td&gt;
&lt;td&gt;Up to 56%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1-year EC2 Reserved Instance&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;Up to 40%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3-year EC2 Reserved Instance&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;Up to 60%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For predictable production workloads with steady traffic, a 3-year commitment on LMI can reduce costs by 60% on the EC2 portion. Standard Lambda's maximum discount is 17%. This difference widens the gap at scale.&lt;/p&gt;

&lt;h4&gt;
  
  
  Hidden Costs
&lt;/h4&gt;

&lt;p&gt;Don't forget the supporting infrastructure that LMI requires and standard Lambda doesn't:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;NAT Gateway&lt;/strong&gt;: ~$32/month + $0.045/GB data transfer (required for VPC telemetry)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;VPC endpoints&lt;/strong&gt; (if used instead of NAT): ~$7.20/month per endpoint per AZ&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DynamoDB&lt;/strong&gt;: On-demand reads for catalog loading (minimal for small catalogs, significant at scale)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bedrock&lt;/strong&gt;: Nova Multimodal Embeddings per-token pricing for each query embedding&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CloudWatch&lt;/strong&gt;: Log storage and metric costs increase with concurrency&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For low-volume workloads, these fixed costs can exceed the compute savings. Factor them into your total cost of ownership.&lt;/p&gt;

&lt;h4&gt;
  
  
  When Each Pricing Model Wins
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Standard Lambda wins when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Traffic is bursty or unpredictable (you pay nothing at zero traffic)&lt;/li&gt;
&lt;li&gt;Monthly volume is below the break-even threshold (~2-3M requests for this workload profile)&lt;/li&gt;
&lt;li&gt;You can't commit to 1-year or 3-year terms&lt;/li&gt;
&lt;li&gt;You don't need VPC connectivity (avoids NAT Gateway cost)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;LMI wins when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Traffic is sustained and predictable (the EC2 cost is fully amortized)&lt;/li&gt;
&lt;li&gt;Monthly volume exceeds 5-10M requests&lt;/li&gt;
&lt;li&gt;You can commit to Savings Plans or Reserved Instances&lt;/li&gt;
&lt;li&gt;You need more than 10 GB memory or specific instance types&lt;/li&gt;
&lt;li&gt;You're already paying for VPC infrastructure&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For this demo, expect to pay for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;NAT Gateway (~$0.045/hour + data transfer)&lt;/li&gt;
&lt;li&gt;EC2 instances (varies by type, auto-selected by Lambda)&lt;/li&gt;
&lt;li&gt;DynamoDB on-demand reads (minimal for this catalog size)&lt;/li&gt;
&lt;li&gt;Bedrock embedding calls (per-token pricing for each query)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  CLEANUP (IMPORTANT!!)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;This infrastructure costs real money while running - approximately $2-4/day even with zero traffic (NAT Gateway + EC2 managed instances). Don't forget about it.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Make sure to destroy all resources when you're done:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;make destroy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the capacity provider fails to delete (it can take a few minutes to drain instances), wait and retry. Verify in the AWS console that no EC2 instances tagged with your project name are still running.&lt;/p&gt;




&lt;h2&gt;
  
  
  Networking: Three Supported Patterns
&lt;/h2&gt;

&lt;p&gt;LMI requires VPC connectivity - the function execution environments need outbound network access for telemetry transmission and any AWS service calls. AWS documents three supported connectivity patterns:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Public subnets with an internet gateway&lt;/strong&gt; - simplest, suitable for dev/test only&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Private subnets with NAT Gateway&lt;/strong&gt; - the pattern this demo uses&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Private subnets with VPC endpoints&lt;/strong&gt; - the most AWS-aligned production pattern&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  NAT Gateway (used in this demo)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Simple to set up - one resource, all outbound traffic routes through it&lt;/li&gt;
&lt;li&gt;~$32/month base + $0.045/GB data transfer&lt;/li&gt;
&lt;li&gt;Traffic leaves your VPC, crosses the public internet (encrypted), then re-enters AWS&lt;/li&gt;
&lt;li&gt;Single point of failure unless you deploy one per AZ (~$64/month for 2-AZ HA)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  VPC Endpoints (recommended for production)
&lt;/h3&gt;

&lt;p&gt;For production, the most AWS-aligned pattern is one VPC endpoint per service per AZ. Traffic stays entirely on the AWS network and never touches the public internet. The endpoint set must cover &lt;strong&gt;every&lt;/strong&gt; service the function calls - if you forget one, the function fails silently or hangs. For this workload, that means:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Endpoint&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Required For&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;com.amazonaws.{region}.logs&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Interface&lt;/td&gt;
&lt;td&gt;CloudWatch Logs (Powertools logger output)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;com.amazonaws.{region}.monitoring&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Interface&lt;/td&gt;
&lt;td&gt;CloudWatch Metrics (Powertools metrics)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;com.amazonaws.{region}.xray&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Interface&lt;/td&gt;
&lt;td&gt;X-Ray tracing (Powertools tracer)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;com.amazonaws.{region}.bedrock-runtime&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Interface&lt;/td&gt;
&lt;td&gt;Bedrock embedding API calls&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;com.amazonaws.{region}.dynamodb&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Gateway&lt;/td&gt;
&lt;td&gt;DynamoDB catalog queries (free, no per-AZ charge)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Critical security group detail:&lt;/strong&gt; Interface endpoints have their own security groups. They must allow inbound HTTPS (port 443) from the function's security group. The function security group must allow outbound HTTPS to the endpoint security groups. If you skip this, DNS resolves but the connection is silently blocked.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Endpoints should be deployed in each AZ used by the capacity provider to avoid cross-AZ latency and data transfer costs.&lt;/strong&gt; If your capacity provider has subnets in &lt;code&gt;us-east-1a&lt;/code&gt; and &lt;code&gt;us-east-1b&lt;/code&gt;, every interface endpoint also needs ENIs in both AZs. This is the same Cross-AZ Tax pattern from my &lt;a href="https://darryl-ruggles.cloud/eks-and-the-cross-az-tax-how-to-stop-paying-aws-002gb-for-traffic-that-should-never-leave-your-availability-zone/" rel="noopener noreferrer"&gt;previous blog&lt;/a&gt; - cross-AZ data transfer charges apply when traffic from a function in &lt;code&gt;us-east-1a&lt;/code&gt; hits an endpoint ENI in &lt;code&gt;us-east-1b&lt;/code&gt;. Provision endpoints per AZ to keep traffic local.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost math:&lt;/strong&gt; ~$7.20/month per interface endpoint per AZ. With 4 interface endpoints across 2 AZs, that's ~$58/month - roughly double the single NAT Gateway, but cheaper than 2-AZ NAT Gateway HA. The DynamoDB gateway endpoint is free. At high data transfer volumes (more than ~900 GB/month through the NAT Gateway), endpoints become cheaper because there's no per-GB data transfer surcharge for in-region traffic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When endpoints win on security:&lt;/strong&gt; Always. Traffic never leaves the AWS network. You can attach endpoint policies to restrict which resources each endpoint can access (e.g., limit the Bedrock endpoint to specific model ARNs). This aligns with the &lt;a href="https://docs.aws.amazon.com/wellarchitected/latest/framework/security.html" rel="noopener noreferrer"&gt;AWS Well-Architected Security Pillar&lt;/a&gt; - minimize the attack surface.&lt;/p&gt;

&lt;p&gt;The Terraform for VPC endpoints is straightforward but verbose. I left it out of this demo to keep the focus on LMI itself. A follow-up project could add a &lt;code&gt;networking_mode&lt;/code&gt; variable that switches between NAT Gateway and VPC endpoints.&lt;/p&gt;




&lt;h2&gt;
  
  
  A few things to watch for:
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;VPC connectivity isn't optional.&lt;/strong&gt; Lambda Managed Instances requires a VPC. Without outbound connectivity (NAT Gateway or VPC endpoints), your function executes but logs and traces are silently lost. You'll debug a working function with no visible output. This is documented but easy to miss.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scaling is asynchronous.&lt;/strong&gt; LMI scales based on CPU utilization and execution-environment saturation, not per-invocation demand. &lt;strong&gt;Unlike standard Lambda, scaling isn't triggered by incoming requests - it's driven by resource consumption inside existing execution environments.&lt;/strong&gt; &lt;strong&gt;Because scaling reacts to resource pressure instead of incoming traffic, inefficient code or high memory usage can delay scaling and increase throttling risk.&lt;/strong&gt; The Scaler component decides when to add or remove instances, and instance launches aren't instant. &lt;strong&gt;Lambda maintains headroom so traffic can roughly double within minutes without immediate throttling&lt;/strong&gt;, but if your traffic more than doubles within 5 minutes, you may see 429 throttles while capacity catches up. This is fundamentally different from standard Lambda's near-instant scaling. Plan for it with the target CPU utilization setting - lower values maintain more headroom.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Process memory multiplies.&lt;/strong&gt; With Python, each concurrency slot is a separate process. &lt;strong&gt;Because Python uses process-based concurrency, memory usage scales linearly with concurrency - each worker process consumes its own memory. With Python, concurrency isn't "free" - each additional request increases memory consumption linearly.&lt;/strong&gt; If your function uses 500 MB of memory and you set concurrency to 16, that's 8 GB of memory consumed per execution environment. Monitor the &lt;code&gt;MemoryUtilization&lt;/code&gt; metric and tune accordingly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;publish = true&lt;/code&gt; is required.&lt;/strong&gt; LMI runs on published function versions, not &lt;code&gt;$LATEST&lt;/code&gt;. If you forget this, Terraform applies successfully but the function doesn't run on managed instances. Every code change needs a new published version.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Capacity providers are security boundaries, not isolation boundaries.&lt;/strong&gt; Functions sharing a capacity provider run in containers on the same EC2 instances. This isn't Firecracker isolation. Separate untrusted workloads into separate capacity providers.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Powertools minimum version matters.&lt;/strong&gt; Lambda Managed Instances requires Powertools for AWS Lambda (Python) version 3.23.0 or later. Pin the layer version in Terraform rather than using latest.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LMI doesn't scale to zero.&lt;/strong&gt; Unlike standard Lambda where you pay nothing at zero traffic, LMI keeps a baseline of warm EC2 instances running for high availability. &lt;strong&gt;AWS launches a baseline of three managed instances for availability across AZs&lt;/strong&gt; when you publish a function version with a capacity provider. In my testing with 2 AZs configured, 2 instances remained active overnight with zero traffic, but the documented baseline is three. There's no minimum instance setting, no Karpenter-style consolidation, and no way to force scale-to-zero short of deleting the function version or capacity provider. This is a meaningful cost difference for dev/test environments where you might leave infrastructure running between sessions. Run &lt;code&gt;make destroy&lt;/code&gt; when you're not actively using the infrastructure, or design your dev environments to use standard Lambda where idle cost is zero.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Quotas to plan around.&lt;/strong&gt; LMI has its own &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/gettingstarted-limits.html" rel="noopener noreferrer"&gt;service quotas&lt;/a&gt;: 1 request per second on capacity provider write APIs (Create/Update/Delete - rate-limited to prevent infrastructure churn), 100 function versions per capacity provider, and 1,000 capacity providers per account per region. These are soft limits but worth knowing when you start automating capacity provider management or running multiple environments.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  SAM Support
&lt;/h2&gt;

&lt;p&gt;If you came in from the AWS Serverless plugin angle and are wondering whether SAM supports LMI - yes, it does. AWS::Serverless::CapacityProvider is the SAM resource equivalent to &lt;code&gt;aws_lambda_capacity_provider&lt;/code&gt;. The SAM template syntax is more concise but follows the same model: capacity provider definition, function with &lt;code&gt;CapacityProviderConfig&lt;/code&gt; property, and IAM roles. I chose Terraform for this project because the LMI Terraform path is less documented in the wild and I wanted to fill that gap, but SAM is a perfectly valid choice if your team already uses it.&lt;/p&gt;




&lt;h2&gt;
  
  
  Instance Type Selection
&lt;/h2&gt;

&lt;p&gt;The capacity provider's &lt;code&gt;instance_requirements&lt;/code&gt; block controls which EC2 instance types Lambda selects. By default, Lambda chooses the best fit automatically. You can constrain this with &lt;code&gt;allowed_instance_types&lt;/code&gt; or &lt;code&gt;excluded_instance_types&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Today, the interesting choice is between &lt;code&gt;arm64&lt;/code&gt; (Graviton4 - better price/performance for most workloads) and &lt;code&gt;x86_64&lt;/code&gt;. But the architecture of Lambda Managed Instances - your function code running in containers on EC2 instances you specify - means the compute capabilities available to your functions expand with every new EC2 instance type AWS makes available for LMI.&lt;/p&gt;

&lt;p&gt;The product similarity engine in this project calls Bedrock for query embeddings (I/O-bound) and then computes cosine similarity on CPU (compute-bound). The handler code isn't coupled to a specific compute architecture. The embedding call is behind a clean interface (&lt;code&gt;_embed_query&lt;/code&gt;). The similarity computation is pure math. The instance type is a configuration parameter, not an application concern.&lt;/p&gt;

&lt;p&gt;This is the practical difference between Lambda Managed Instances and standard Lambda. Standard Lambda abstracts the hardware entirely - you get what AWS gives you. Lambda Managed Instances lets you choose, and that choice extends to whatever EC2 instance types AWS makes available.&lt;/p&gt;




&lt;h2&gt;
  
  
  Wrapping Up
&lt;/h2&gt;

&lt;p&gt;Lambda Managed Instances fills the gap between standard Lambda and ECS Fargate. The handler function and event-driven invocation pattern stay the same, but you gain EC2 hardware selection, multi-concurrency, configurable memory-to-vCPU ratios, and commitment-based pricing.&lt;/p&gt;

&lt;p&gt;The key decisions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Use it for sustained, predictable throughput&lt;/strong&gt; where EC2 pricing beats per-GB-second&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Choose your memory-to-vCPU ratio&lt;/strong&gt; based on whether your workload is compute-bound or memory-bound&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Understand the process model&lt;/strong&gt; for your language - Python uses processes (simple, no shared-memory concerns), Java uses OS threads (requires thread-safe code), Node.js uses worker threads with async dispatch, .NET uses Tasks, and Rust uses Tokio async tasks (handlers must be &lt;code&gt;Clone + Send&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitor &lt;code&gt;MemoryUtilization&lt;/code&gt;&lt;/strong&gt; because process memory multiplies with concurrency&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The full Terraform configuration, Python handler, seed script, and Makefile are in the &lt;a href="https://github.com/RDarrylR/lambda-managed-instances-similarity-engine" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/lambda-managed-instances.html" rel="noopener noreferrer"&gt;Lambda Managed Instances Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/lambda-managed-instances-python-runtime.html" rel="noopener noreferrer"&gt;Lambda Managed Instances - Python Runtime Guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/about-aws/whats-new/2026/03/lambda-32-gb-memory-16-vcpus/" rel="noopener noreferrer"&gt;32 GB Memory / 16 vCPU Announcement (March 2026)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/lambda-managed-instances-capacity-providers.html" rel="noopener noreferrer"&gt;Capacity Provider Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://docs.aws.amazon.com/nova/latest/userguide/nova-embeddings.html" rel="noopener noreferrer"&gt;Amazon Nova Multimodal Embeddings&lt;/a&gt; - Embedding model used in this project&lt;/li&gt;
&lt;li&gt;&lt;a href="https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/lambda_capacity_provider" rel="noopener noreferrer"&gt;Terraform aws_lambda_capacity_provider&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://darryl-ruggles.cloud/powertools-for-aws-lambda-best-practices-by-default/" rel="noopener noreferrer"&gt;Powertools for AWS Lambda Best Practices&lt;/a&gt; - Observability patterns used in this project&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://darryl-ruggles.cloud/elastic-container-service-ecs-my-default-choice-for-containers-on-aws/" rel="noopener noreferrer"&gt;Elastic Container Service - My Default Choice for Containers on AWS&lt;/a&gt; - ECS Fargate and Express Mode comparison&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://darryl-ruggles.cloud/serverless-data-processor-using-aws-lambda-step-functions-and-fargate-on-ecs-with-rust/" rel="noopener noreferrer"&gt;Serverless Data Processor&lt;/a&gt; - Step Functions with Lambda and Fargate&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Connect with me on&lt;/em&gt; &lt;a href="https://x.com/RDarrylR" rel="noopener noreferrer"&gt;X&lt;/a&gt;&lt;em&gt;,&lt;/em&gt; &lt;a href="https://bsky.app/profile/darrylruggles.bsky.social" rel="noopener noreferrer"&gt;Bluesky&lt;/a&gt;&lt;em&gt;,&lt;/em&gt; &lt;a href="https://www.linkedin.com/in/darryl-ruggles/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;&lt;em&gt;,&lt;/em&gt; &lt;a href="https://github.com/RDarrylR" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;em&gt;,&lt;/em&gt; &lt;a href="https://medium.com/@RDarrylR" rel="noopener noreferrer"&gt;Medium&lt;/a&gt;&lt;em&gt;,&lt;/em&gt; &lt;a href="https://dev.to/rdarrylr"&gt;Dev.to&lt;/a&gt;&lt;em&gt;, or the&lt;/em&gt; &lt;a href="https://community.aws/@darrylr" rel="noopener noreferrer"&gt;AWS Community&lt;/a&gt;&lt;em&gt;. Check out more of my projects at&lt;/em&gt; &lt;a href="https://darryl-ruggles.cloud" rel="noopener noreferrer"&gt;darryl-ruggles.cloud&lt;/a&gt; &lt;em&gt;and join the&lt;/em&gt; &lt;a href="https://www.believeinserverless.com/" rel="noopener noreferrer"&gt;Believe In Serverless&lt;/a&gt; &lt;em&gt;community.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>serverless</category>
      <category>lambda</category>
    </item>
  </channel>
</rss>
