<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Danilo Poccia</title>
    <description>The latest articles on DEV Community by Danilo Poccia (@danilop).</description>
    <link>https://dev.to/danilop</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F420454%2F3500d738-0a72-47d9-8d87-1858f0edd5f2.JPG</url>
      <title>DEV Community: Danilo Poccia</title>
      <link>https://dev.to/danilop</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/danilop"/>
    <language>en</language>
    <item>
      <title>How I Used Kiro to Optimize Its Own MCP Configuration</title>
      <dc:creator>Danilo Poccia</dc:creator>
      <pubDate>Mon, 12 Jan 2026 13:21:19 +0000</pubDate>
      <link>https://dev.to/aws/how-i-used-kiro-to-optimize-its-own-mcp-configuration-4mdg</link>
      <guid>https://dev.to/aws/how-i-used-kiro-to-optimize-its-own-mcp-configuration-4mdg</guid>
      <description>&lt;p&gt;I had a problem. My Kiro setup had accumulated 14 MCP servers over time: AWS tools, web automation, documentation servers, email integration. That's dozens of tools, all loading into context at the start of every conversation, whether I needed them or not.&lt;/p&gt;

&lt;p&gt;This is the hidden cost of extensibility. &lt;a href="https://modelcontextprotocol.io/" rel="noopener noreferrer"&gt;MCP&lt;/a&gt; servers give AI assistants access to external tools and services. Powerful, but each one adds to the context window. More tools mean more tokens consumed before you ask your first question, slower responses, and an AI that sometimes picks the wrong tool because it has too many options.&lt;/p&gt;

&lt;p&gt;So I asked Kiro to help me build a better solution.&lt;/p&gt;

&lt;h1&gt;Using Kiro CLI to Create Powers&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://kiro.dev/docs/powers/" rel="noopener noreferrer"&gt;Kiro Powers&lt;/a&gt; are the solution to MCP sprawl. They're self-contained packages that bundle MCP servers with documentation and best practices, loading on-demand based on keywords in your conversation. But here's the thing: while Powers currently only work in Kiro IDE (CLI support is coming), &lt;a href="https://kiro.dev/cli/" rel="noopener noreferrer"&gt;Kiro CLI&lt;/a&gt; is excellent at &lt;em&gt;creating&lt;/em&gt; them. So I used the CLI to analyze my MCP configuration and generate properly structured Power folders, then added them to Kiro IDE.&lt;/p&gt;

&lt;h1&gt;Getting Started&lt;/h1&gt;

&lt;p&gt;I used Kiro CLI with the &lt;a href="https://kiro.dev/pricing/#common-questions" rel="noopener noreferrer"&gt;Auto model&lt;/a&gt;, the default mode that blends frontier models with specialized ones for different tasks. For this kind of work (analyzing config files, researching documentation, generating structured output), Auto picks the right model for each step rather than using a single expensive model for everything.&lt;/p&gt;

&lt;p&gt;First, I copied my &lt;code&gt;mcp.json&lt;/code&gt; from &lt;code&gt;~/.kiro/settings/&lt;/code&gt; to a working folder:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir&lt;/span&gt; ~/optimize-powers
&lt;span class="nb"&gt;cp&lt;/span&gt; ~/.kiro/settings/mcp.json ~/optimize-powers/
&lt;span class="nb"&gt;cd&lt;/span&gt; ~/optimize-powers
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then I launched Kiro CLI in that folder and gave it this prompt:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"I have an mcp.json file with too many MCP servers. Help me group and transform them into Kiro Powers. Research carefully how Powers work online, including the folder structure and POWER.md frontmatter syntax. For each logical grouping, create a folder with the correct structure: POWER.md with proper frontmatter (name, displayName, description, keywords, mcpServers), an mcp.json with the server configurations, and a steering/ folder with best practices. Prepare a detailed plan and propose it to me before creating any folders."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That last sentence is important. I didn't want Kiro to just start creating files. I wanted a plan I could review and discuss.&lt;/p&gt;

&lt;h1&gt;What Kiro Proposed&lt;/h1&gt;

&lt;p&gt;Kiro analyzed my &lt;code&gt;mcp.json&lt;/code&gt;, researched the Powers documentation online, and came back with a consolidation plan. It identified logical groupings based on functionality:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Before:&lt;/strong&gt; 14 individual MCP servers, all loading at startup&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;After:&lt;/strong&gt; 7 Powers that load on-demand based on keywords:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;aws-development&lt;/strong&gt; bundles AWS API, documentation, serverless, CDK, and diagram tools&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;web-development&lt;/strong&gt; handles frontend frameworks and Playwright for testing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;web-research&lt;/strong&gt; uses Fetch for browsing documentation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;bedrock-agentcore&lt;/strong&gt; for agent production and deployment&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;strands-agents&lt;/strong&gt; for agent development with the Strands SDK&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;office-tools&lt;/strong&gt; covers email and calendar&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;amazon-aurora-dsql&lt;/strong&gt; for database-specific workflows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The key insight: these tools naturally cluster by workflow. When I'm building AWS infrastructure, I need the CDK server and AWS documentation, but not Playwright. When I'm testing a web app, I need browser automation, but not the architecture diagram tools.&lt;/p&gt;

&lt;h1&gt;The Conversation&lt;/h1&gt;

&lt;p&gt;After reviewing the plan, I had follow-up questions.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;"Can't Bedrock AgentCore and Strands be grouped into a single agentic AI power?"&lt;/em&gt; This led to consolidating them. Both are about building AI agents, just at different stages of the workflow.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;"What if I want to use Playwright for web browsing too, not just testing? Would it be duplicated across powers?"&lt;/em&gt; Kiro explained that Powers are namespaced to prevent conflicts, but this led to a better design for my workflows: consolidating into a single "web-research-browse-test" Power that combines Fetch and Playwright for both browsing and testing, separate from frontend development.&lt;/p&gt;

&lt;p&gt;Each question refined the outcome. The AI handled the implementation; I steered the design. The final result: 6 Powers instead of 14 scattered MCP servers.&lt;/p&gt;

&lt;p&gt;After finalizing the plan, I approved it: &lt;em&gt;"I approve your plan, build it!"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Kiro created the folder structure for each Power, wrote the &lt;code&gt;POWER.md&lt;/code&gt; files with appropriate keywords and onboarding instructions, configured the &lt;code&gt;mcp.json&lt;/code&gt; for each Power, and added steering files with best practices.&lt;/p&gt;

&lt;h1&gt;What a Kiro Power Looks Like&lt;/h1&gt;

&lt;p&gt;A Kiro Power is a self-contained package: documentation, best practices, and its own MCP server configuration bundled together. When the Power activates, its MCP servers become available. When it's not needed, they stay out of context.&lt;/p&gt;

&lt;p&gt;Here's the AWS Development Power that Kiro created:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws-development/
├── POWER.md
├── mcp.json
└── steering/
    ├── aws-api-workflows.md
    ├── serverless-patterns.md
    ├── cdk-best-practices.md
    └── architecture-diagrams.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;POWER.md&lt;/code&gt; frontmatter defines when the Power activates:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;aws-development"&lt;/span&gt;
&lt;span class="na"&gt;displayName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AWS&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Development"&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Comprehensive&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;AWS&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;development&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;toolkit..."&lt;/span&gt;
&lt;span class="na"&gt;keywords&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;aws"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cloud"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;serverless"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lambda"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cdk"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; 
           &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cloudformation"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;infrastructure"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;api"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; 
           &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;documentation"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;diagrams"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="na"&gt;mcpServers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;aws-api"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;aws-knowledge"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;aws-serverless"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; 
             &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;aws-diagram"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;aws-cdk"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;keywords&lt;/code&gt; array is the trigger mechanism. When Kiro sees any of these words in your conversation, it loads this Power automatically. The keyword matching is simple but effective: use words that match how you naturally talk about the workflow.&lt;/p&gt;
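&lt;p&gt;The &lt;code&gt;mcp.json&lt;/code&gt; that sits next to &lt;code&gt;POWER.md&lt;/code&gt; holds the actual server definitions. As a rough sketch only (the server name, command, and arguments below are illustrative placeholders, not my real configuration):&lt;/p&gt;

```json
{
  "mcpServers": {
    "aws-knowledge": {
      "command": "uvx",
      "args": ["example-aws-docs-mcp-server@latest"]
    }
  }
}
```

&lt;p&gt;Since the frontmatter's &lt;code&gt;mcpServers&lt;/code&gt; array references servers by name, the names presumably need to line up between the two files.&lt;/p&gt;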

&lt;h1&gt;Adding Powers to Kiro IDE&lt;/h1&gt;

&lt;p&gt;Once Kiro CLI created the Power folders, I added them to Kiro IDE one at a time. In the Powers panel, I clicked "Add power from Local Path" and selected the directory containing the &lt;code&gt;POWER.md&lt;/code&gt; file (e.g., &lt;code&gt;~/optimize-powers/aws-development/&lt;/code&gt;). Each Power appeared in the IDE's Powers list and now activates automatically when I use its keywords in conversation.&lt;/p&gt;

&lt;h1&gt;The Results&lt;/h1&gt;

&lt;p&gt;The difference is immediate. When I'm working on a web frontend, I don't have AWS and email tools cluttering the context. The AI makes better tool choices because it has fewer, more relevant options. Ask about "CDK stacks" and the AWS Development Power loads. Ask about "browser testing" and the web automation Power loads.&lt;/p&gt;

&lt;p&gt;What's interesting is the approach itself. I used Kiro CLI to analyze a configuration, research documentation I wasn't fully familiar with, propose a restructuring plan, discuss edge cases, and implement the folder structures. Then Kiro IDE activated them.&lt;/p&gt;

&lt;h1&gt;Try It Yourself&lt;/h1&gt;

&lt;p&gt;If your MCP configuration has grown unwieldy, the same approach works. Copy your &lt;code&gt;mcp.json&lt;/code&gt; to a working folder, open Kiro CLI there, and ask it to analyze your servers and propose groupings. Review the plan, discuss any concerns, refine it, then approve and let it create the Power folders. Finally, add each Power to Kiro IDE through the Powers panel.&lt;/p&gt;

&lt;p&gt;Powers don't require MCP servers. You can &lt;a href="https://kiro.dev/docs/powers/create/" rel="noopener noreferrer"&gt;create&lt;/a&gt; documentation-only Powers with steering files for specific frameworks or patterns. And they're portable: push to a GitHub repo and others can install them. The &lt;a href="https://github.com/kirodotdev/powers" rel="noopener noreferrer"&gt;community Powers repository&lt;/a&gt; has examples from Datadog, Figma, Stripe, and others.&lt;/p&gt;

&lt;p&gt;If you haven't tried Kiro yet, you can &lt;a href="https://kiro.dev/pricing/" rel="noopener noreferrer"&gt;get started for free&lt;/a&gt;.&lt;/p&gt;

&lt;h1&gt;What This Opens Up&lt;/h1&gt;

&lt;p&gt;I used an AI tool to restructure how that same AI tool accesses its capabilities. Kiro CLI analyzed the configuration, researched documentation, and generated the folder structures. Kiro IDE then activated them. The whole process took about 15 minutes.&lt;/p&gt;

&lt;p&gt;The same pattern applies beyond MCP setups: treat your AI tooling configuration as something that evolves. Start with defaults, use the tools, notice friction, then ask the AI to help reduce it. The configuration files aren't sacred. They're just another part of a codebase that can be refactored and improved. And once Powers come to the CLI, this workflow will get even simpler.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>aws</category>
      <category>devex</category>
      <category>kiro</category>
    </item>
    <item>
      <title>Exploring the OpenAI-Compatible APIs in Amazon Bedrock: A CLI Journey Through Project Mantle</title>
      <dc:creator>Danilo Poccia</dc:creator>
      <pubDate>Tue, 16 Dec 2025 12:59:57 +0000</pubDate>
      <link>https://dev.to/aws/exploring-the-openai-compatible-apis-in-amazon-bedrock-a-cli-journey-through-project-mantle-2114</link>
      <guid>https://dev.to/aws/exploring-the-openai-compatible-apis-in-amazon-bedrock-a-cli-journey-through-project-mantle-2114</guid>
      <description>&lt;p&gt;After &lt;a href="https://aws.amazon.com/about-aws/whats-new/2025/12/amazon-bedrock-responses-api-from-openai/" rel="noopener noreferrer"&gt;Amazon Bedrock introduced OpenAI-compatible application programming interfaces (APIs) through Project Mantle&lt;/a&gt;, I decided to explore firsthand what this meant in practice. There's nothing like actually calling endpoints and seeing responses to build real intuition. I needed a way to quickly experiment with both the Responses API and Chat Completions API, compare their behaviors, and understand when to use each one.&lt;/p&gt;

&lt;p&gt;That's why I put together &lt;a href="https://github.com/danilop/bedrock-mantle" rel="noopener noreferrer"&gt;bedrock-mantle&lt;/a&gt;, a command-line interface (CLI) that took shape as I tested these new endpoints. It's designed as an exploration tool—something you can fire up when you want to understand how stateful conversations work, test background processing for long-running tasks, or simply verify that your existing OpenAI software development kit (SDK) code will work with minimal changes.&lt;/p&gt;

&lt;p&gt;In this post, I'll walk through what makes these APIs different, show you how to use the CLI for hands-on exploration, and share some insights about when each API makes sense.&lt;/p&gt;

&lt;h1&gt;What Is Project Mantle?&lt;/h1&gt;

&lt;p&gt;Before diving into the APIs, it's worth understanding what's under the hood. Project Mantle is a distributed inference engine for large-scale model serving on Amazon Bedrock. It's designed to simplify onboarding new models while providing performant serverless inference with sophisticated quality of service controls.&lt;/p&gt;

&lt;p&gt;For developers, the practical benefit is twofold. First, Project Mantle provides out-of-the-box compatibility with OpenAI API specifications—existing code using the OpenAI SDK works with Bedrock models by changing the base URL and API key. Second, it introduces new capabilities like stateful conversation management and asynchronous inference that go beyond simple compatibility.&lt;/p&gt;

&lt;p&gt;The Responses API currently supports the OpenAI GPT OSS 20B and 120B models, with support for additional models coming. The Chat Completions API already works with all Bedrock models powered by Project Mantle.&lt;/p&gt;

&lt;h1&gt;Getting Started&lt;/h1&gt;

&lt;p&gt;The CLI is packaged as a Python tool and uses &lt;a href="https://docs.astral.sh/uv/" rel="noopener noreferrer"&gt;uv&lt;/a&gt; for dependency management. Installation takes one command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uv tool &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nb"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Configuration requires two environment variables pointing to the Mantle endpoint and your API key:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENAI_BASE_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;https://bedrock-mantle.us-east-1.api.aws/v1
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;your-amazon-bedrock-api-key
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can get your API key from the &lt;a href="https://docs.aws.amazon.com/bedrock/latest/userguide/api-keys.html" rel="noopener noreferrer"&gt;Amazon Bedrock console&lt;/a&gt;. With that configured, you're ready to explore.&lt;/p&gt;

&lt;h1&gt;Two APIs, Two Approaches to Conversation State&lt;/h1&gt;

&lt;p&gt;The heart of Project Mantle is the choice between two APIs: the Responses API and the Chat Completions API. They solve the same fundamental problem—getting responses from language models—but they handle conversation state very differently.&lt;/p&gt;

&lt;p&gt;All the code examples below assume you've set the &lt;code&gt;OPENAI_BASE_URL&lt;/code&gt; and &lt;code&gt;OPENAI_API_KEY&lt;/code&gt; environment variables as described in the Getting Started section. The OpenAI SDK reads these automatically.&lt;/p&gt;

&lt;p&gt;The Responses API maintains conversation state server-side. When you send a message, the server remembers the context automatically using a &lt;code&gt;previous_response_id&lt;/code&gt;. You don't need to send the full conversation history with each request because the server tracks it for you. This simplifies client code, reduces bandwidth (especially for long conversations), and makes tool use integration for agentic workflows more straightforward.&lt;/p&gt;

&lt;p&gt;Here's a basic request using the Responses API, taken from the &lt;a href="https://docs.aws.amazon.com/bedrock/latest/userguide/bedrock-mantle.html" rel="noopener noreferrer"&gt;official AWS documentation&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;responses&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;openai.gpt-oss-120b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hello! How can you help me today?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For multi-turn conversations, chain responses using &lt;code&gt;previous_response_id&lt;/code&gt;. Each response object includes an &lt;code&gt;id&lt;/code&gt; field that you pass to the next request:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# First turn
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;responses&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;openai.gpt-oss-120b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What is the capital of France?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# e.g., "resp_abc123..."
&lt;/span&gt;
&lt;span class="c1"&gt;# Second turn: pass the previous response id
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;responses&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;openai.gpt-oss-120b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What river runs through it?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="n"&gt;previous_response_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The server handles the history—you just chain the IDs.&lt;/p&gt;

&lt;p&gt;The Chat Completions API follows the traditional stateless pattern. You manage conversation history client-side and send the full context with each request. The server processes the request and returns a response without retaining any state between calls. This API also supports reasoning effort configuration, letting you control how much computation the model devotes to reasoning before it answers.&lt;/p&gt;

&lt;p&gt;Here's a basic chat completion request:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;completion&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;openai.gpt-oss-120b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful assistant.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hello!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;completion&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



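&lt;p&gt;The reasoning effort setting mentioned above travels as an ordinary request parameter. A minimal sketch (the &lt;code&gt;reasoning_effort&lt;/code&gt; value and the prompt are illustrative; check the Bedrock documentation for which effort levels each model honors):&lt;/p&gt;

```python
# Build the request as a plain dict so the parameters are easy to inspect.
# reasoning_effort is passed through like any other Chat Completions option;
# the value here ("low") is illustrative.
request = {
    "model": "openai.gpt-oss-120b",
    "messages": [
        {"role": "user", "content": "Summarize the Seine in one sentence."}
    ],
    "reasoning_effort": "low",
}

# With a configured client this would be sent as:
# completion = client.chat.completions.create(**request)
```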
&lt;p&gt;With Chat Completions, you build and maintain the &lt;code&gt;messages&lt;/code&gt; array yourself. Each request includes the complete conversation history, giving you full control over what context the model sees.&lt;/p&gt;
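&lt;p&gt;That bookkeeping is simple enough to sketch without touching the network. Each turn appends to the list, and the whole list goes out on every call (the helper below is illustrative, not part of the SDK):&lt;/p&gt;

```python
# Client-side conversation state for the stateless Chat Completions API.
# Every request must carry the full history; the server remembers nothing.

def with_user_turn(history, content):
    """Return a new history list with the next user message appended."""
    return history + [{"role": "user", "content": content}]

history = [{"role": "system", "content": "You are a helpful assistant."}]

# Turn 1: send history via client.chat.completions.create(messages=history),
# then record the assistant's reply so later turns can refer back to it.
history = with_user_turn(history, "What is the capital of France?")
history.append({"role": "assistant", "content": "The capital of France is Paris."})

# Turn 2: "it" only resolves because the earlier exchange is resent in full.
history = with_user_turn(history, "What river runs through it?")
```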

&lt;p&gt;Here's what a typical session looks like with the Responses API:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ bedrock-mantle chat --model openai.gpt-oss-120b

Starting chat session
  Model: openai.gpt-oss-120b
  API: Responses API
  Streaming: enabled
  Background: disabled

Type /quit or /q to exit, /clear to reset conversation
------------------------------------------------------------

You: What is the capital of France?
Assistant: The capital of France is Paris.

You: What river runs through it?
Assistant: The Seine River runs through Paris, flowing through the heart of the city.

You: /quit
Goodbye!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice how the second question ("What river runs through it?") works without explicitly mentioning Paris. Under the hood, with the Responses API, the server maintains the conversation context, so "it" resolves correctly. With the Chat Completions API, you'd need to include the previous exchange in your request to get the same behavior.&lt;/p&gt;

&lt;p&gt;To switch to the Chat Completions API, add the &lt;code&gt;--completions&lt;/code&gt; flag:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;bedrock-mantle chat &lt;span class="nt"&gt;--model&lt;/span&gt; openai.gpt-oss-120b &lt;span class="nt"&gt;--completions&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;Background Processing for Long-Running Tasks&lt;/h1&gt;

&lt;p&gt;Some tasks take time. Complex reasoning, extensive analysis, or multi-step processes might run for minutes rather than seconds. Keeping an HTTP connection open for that duration introduces reliability concerns—network timeouts, connection drops, and client resource consumption all become issues.&lt;/p&gt;

&lt;p&gt;The Responses API addresses this with asynchronous inference through background processing. When you enable background mode, requests are queued and processed asynchronously:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;bedrock-mantle chat &lt;span class="nt"&gt;--model&lt;/span&gt; openai.gpt-oss-120b &lt;span class="nt"&gt;--background&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The CLI submits your message and then polls for completion, showing progress as it waits. This pattern is useful for long-running inference workloads where you'd rather wait confidently than wonder whether your connection will survive.&lt;/p&gt;

&lt;p&gt;Behind the scenes, the CLI handles the polling loop:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Background mode: poll for completion
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;responses&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;previous_response_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;previous_response_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;background&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;in_progress&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;responses&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;retrieve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This approach translates well to production architectures where you might submit a request, receive a job ID, and check back later.&lt;/p&gt;

&lt;h1&gt;
  
  
  Choosing Between APIs
&lt;/h1&gt;

&lt;p&gt;The choice between APIs depends on your requirements. I've found it helpful to think about three dimensions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;State management&lt;/strong&gt; is the most obvious differentiator. If you want the server to track conversation context automatically, use the Responses API. If you need full control over what context gets sent (perhaps for privacy reasons, or because you're doing custom context management), use the Chat Completions API.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data retention&lt;/strong&gt; matters for compliance-sensitive applications. The Responses API stores data for approximately 30 days to support its stateful features. The Chat Completions API follows a zero data retention model—no conversation data is stored between requests.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Model support&lt;/strong&gt; varies between the APIs. The Responses API currently works with OpenAI GPT OSS models (20B and 120B parameters), while the Chat Completions API supports all Bedrock models powered by Project Mantle. You can check available models with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;bedrock-mantle list-models
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  Practical Exploration Patterns
&lt;/h1&gt;

&lt;p&gt;The CLI includes several options that make exploration more productive. Disabling streaming shows you the complete response structure rather than incremental chunks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;bedrock-mantle chat &lt;span class="nt"&gt;--model&lt;/span&gt; openai.gpt-oss-120b &lt;span class="nt"&gt;--no-stream&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Streaming is useful when you want to display responses as they arrive. Here's how streaming works with the Responses API:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;stream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;responses&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;openai.gpt-oss-120b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Tell me a story&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And with the Chat Completions API:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;stream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;openai.gpt-oss-120b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Tell me a story&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice the difference in how you process the stream. The Chat Completions API returns chunks with a &lt;code&gt;delta&lt;/code&gt; containing the incremental content, while the Responses API returns typed events.&lt;/p&gt;
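&lt;p&gt;A sketch of what handling those typed events can look like. The event name follows the OpenAI streaming specification (&lt;code&gt;response.output_text.delta&lt;/code&gt;); treat it as an assumption for Bedrock's implementation, and the simulated events below stand in for a real stream:&lt;/p&gt;

```python
from types import SimpleNamespace

def text_delta(event):
    """Return the incremental text from a Responses API stream event, or None.

    The "response.output_text.delta" event name follows the OpenAI streaming
    spec; treat it as an assumption for Bedrock's implementation.
    """
    if getattr(event, "type", "") == "response.output_text.delta":
        return event.delta
    return None

# Simulated events, standing in for `for event in stream:`
events = [
    SimpleNamespace(type="response.created"),
    SimpleNamespace(type="response.output_text.delta", delta="Once upon"),
    SimpleNamespace(type="response.output_text.delta", delta=" a time"),
    SimpleNamespace(type="response.completed"),
]
story = "".join(d for e in events if (d := text_delta(e)) is not None)
print(story)  # Once upon a time
```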

&lt;p&gt;Custom system prompts help you test how different personas or instructions affect behavior:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;bedrock-mantle chat &lt;span class="nt"&gt;--model&lt;/span&gt; openai.gpt-oss-120b &lt;span class="nt"&gt;--system&lt;/span&gt; &lt;span class="s2"&gt;"You are a helpful assistant who explains concepts simply"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;During a session, the &lt;code&gt;/status&lt;/code&gt; command shows your current configuration, and &lt;code&gt;/clear&lt;/code&gt; resets the conversation state—useful when you want to start fresh without restarting the CLI.&lt;/p&gt;

&lt;h1&gt;
  
  
  What I Learned Building This
&lt;/h1&gt;

&lt;p&gt;Building the CLI taught me some practical lessons about working with these APIs.&lt;/p&gt;

&lt;p&gt;First, the stateful nature of the Responses API changes how you think about error handling. If a request fails mid-conversation, you need to decide whether to retry with the same &lt;code&gt;previous_response_id&lt;/code&gt; or reset the conversation. The server's state might be consistent even if your client didn't receive the response.&lt;/p&gt;
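&lt;p&gt;One way to structure that decision in client code (a hypothetical &lt;code&gt;recover&lt;/code&gt; helper, not the CLI's actual logic):&lt;/p&gt;

```python
def recover(previous_response_id, error, is_retryable):
    """Decide how to continue a stateful Responses API conversation after a
    failed request. Hypothetical helper: on a retryable (e.g. network) error,
    the server-side state may still be intact, so retry against the same
    thread; otherwise reset and start a fresh conversation.
    """
    if is_retryable(error):
        return {"action": "retry", "previous_response_id": previous_response_id}
    return {"action": "reset", "previous_response_id": None}

decision = recover("resp_123", TimeoutError(), lambda e: isinstance(e, TimeoutError))
print(decision["action"])  # retry
```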

&lt;p&gt;Second, background processing introduces its own considerations. How often should you poll? How long should you wait before giving up? The CLI uses simple fixed-interval polling, but production code might implement exponential backoff to be more efficient.&lt;/p&gt;
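&lt;p&gt;A capped exponential backoff schedule could look like this (the delays are illustrative, not what the CLI uses):&lt;/p&gt;

```python
import itertools

def backoff_delays(base=1.0, factor=2.0, cap=30.0):
    """Yield poll delays that grow geometrically but never exceed the cap."""
    delay = base
    while True:
        yield min(delay, cap)
        delay *= factor

# First six delays a polling loop would sleep between retrieve() calls.
delays = list(itertools.islice(backoff_delays(), 6))
print(delays)  # [1.0, 2.0, 4.0, 8.0, 16.0, 30.0]
```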

&lt;p&gt;Third, streaming and non-streaming responses have different structures. If you're building tooling that works with both modes, you need to handle the response parsing accordingly.&lt;/p&gt;

&lt;h1&gt;
  
  
  The Migration Path
&lt;/h1&gt;

&lt;p&gt;If you're currently using the OpenAI APIs and considering a move to Bedrock, the migration requires few changes. The endpoint format, request structure, and response format follow the OpenAI specification. In many cases, changing two environment variables is enough to switch between providers.&lt;/p&gt;
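&lt;p&gt;With the OpenAI SDKs, those two variables are typically &lt;code&gt;OPENAI_BASE_URL&lt;/code&gt; and &lt;code&gt;OPENAI_API_KEY&lt;/code&gt;. The endpoint format below is an assumption; verify the exact URL and credential type for your region in the Bedrock documentation:&lt;/p&gt;

```shell
# Point OpenAI-SDK-based code at Bedrock's OpenAI-compatible endpoint.
# Endpoint format and credential variable are assumptions; check the Bedrock docs.
export OPENAI_BASE_URL="https://bedrock-runtime.us-east-1.amazonaws.com/openai/v1"
export OPENAI_API_KEY="$BEDROCK_API_KEY"  # a Bedrock API key, not an OpenAI one
```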

&lt;p&gt;That said, testing matters. The CLI gives you a low-friction way to verify behavior before you commit to any code changes. Run your typical prompts through both the Responses API and the Chat Completions API, observe the responses, and build confidence in how the migration could affect your application.&lt;/p&gt;

&lt;h1&gt;
  
  
  Try It Yourself
&lt;/h1&gt;

&lt;p&gt;The CLI is available on &lt;a href="https://github.com/danilop/bedrock-mantle" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; under the MIT license. Clone it, configure your credentials, and start exploring the tool and its code. The &lt;code&gt;info&lt;/code&gt; command shows you the current configuration, and &lt;code&gt;list-models&lt;/code&gt; tells you what's available in your region.&lt;/p&gt;

&lt;p&gt;Whether you're building applications that currently use the OpenAI APIs and you're curious about what migration to Bedrock would look like, or you're already using Bedrock and want to understand these new capabilities, the CLI provides a playground to build intuition before committing to architectural decisions.&lt;/p&gt;

&lt;p&gt;I'm curious to hear what patterns you discover as you explore. Are there specific use cases where the stateful Responses API simplifies your architecture significantly? Or do you find the control of the stateless Chat Completions API more valuable for your needs? Let me know in the comments what you're building with these capabilities.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;The complete code is available at &lt;a href="https://github.com/danilop/bedrock-mantle" rel="noopener noreferrer"&gt;github.com/danilop/bedrock-mantle&lt;/a&gt;. Contributions and feedback welcome.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>ai</category>
      <category>python</category>
      <category>openai</category>
    </item>
    <item>
      <title>Building a Semantic Storage for Humans and AI Agents</title>
      <dc:creator>Danilo Poccia</dc:creator>
      <pubDate>Fri, 05 Dec 2025 15:20:48 +0000</pubDate>
      <link>https://dev.to/aws/building-a-semantic-storage-for-humans-and-ai-agents-9h6</link>
      <guid>https://dev.to/aws/building-a-semantic-storage-for-humans-and-ai-agents-9h6</guid>
      <description>&lt;p&gt;This week at AWS re:Invent 2025, &lt;a href="https://aws.amazon.com/s3/features/vectors/" rel="noopener noreferrer"&gt;Amazon S3 Vectors&lt;/a&gt; reached general availability, bringing purpose-built vector storage directly into object storage. S3 Vectors now supports up to 2 billion vectors per index (40x the preview capacity), delivers query latencies around 100ms for frequent queries, and integrates with &lt;a href="https://aws.amazon.com/bedrock/" rel="noopener noreferrer"&gt;Amazon Bedrock&lt;/a&gt; Knowledge Bases and Amazon OpenSearch Service. About a month earlier, &lt;a href="https://docs.aws.amazon.com/nova/latest/userguide/modality-embeddings.html" rel="noopener noreferrer"&gt;Amazon Nova Multimodal Embeddings&lt;/a&gt; became available in Amazon Bedrock, providing a unified embedding model that handles text, images, audio, video, and documents through a single model.&lt;/p&gt;

&lt;p&gt;The combination of these two services creates an interesting opportunity: store any content in &lt;a href="https://aws.amazon.com/s3/" rel="noopener noreferrer"&gt;Amazon S3&lt;/a&gt;, generate embeddings with Nova, index them in S3 Vectors, and you have semantic search across everything—without managing vector database infrastructure.&lt;/p&gt;

&lt;p&gt;That idea became &lt;a href="https://github.com/danilop/semstash" rel="noopener noreferrer"&gt;SemStash&lt;/a&gt;, a semantic storage system that lets you store any content and find it using natural language. Instead of remembering exact file names or maintaining folder hierarchies, you describe what you're looking for: "the presentation about Q3 revenue" or "photos from the beach trip."&lt;/p&gt;

&lt;p&gt;In this post, I'll walk through how SemStash works, the architecture decisions behind it, and how you can use it both as a human tool and as persistent memory for AI agents.&lt;/p&gt;

&lt;h1&gt;
  
  
  The Core Idea
&lt;/h1&gt;

&lt;p&gt;The fundamental concept is straightforward: when you upload a file, SemStash stores it in S3 and generates a vector embedding using Amazon Nova. This embedding captures what the content &lt;em&gt;means&lt;/em&gt;, not just what it &lt;em&gt;contains&lt;/em&gt;. When you search, SemStash converts your query into an embedding and finds content with similar meaning.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkop53so0ezwe927m3l3r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkop53so0ezwe927m3l3r.png" alt="SemStash Architecture" width="784" height="310"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This architecture means you can search across different media types. Upload a photo of a sunset, then find it by searching for "evening sky with orange colors." Upload a meeting recording, then find it by asking for "discussion about the new product launch."&lt;/p&gt;
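&lt;p&gt;Under the hood, "similar meaning" is a geometric notion: embeddings that point in similar directions score close to 1.0 under cosine similarity. A toy illustration in plain Python (not SemStash code; the three-dimensional vectors are made up):&lt;/p&gt;

```python
import math

def cosine_similarity(a, b):
    """Score near 1.0 means same direction (similar meaning); near 0, unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" (real Nova embeddings have 256 to 3072 dimensions).
query = [0.9, 0.1, 0.0]        # "beach sunset"
sunset_photo = [0.8, 0.2, 0.1]
spreadsheet = [0.0, 0.1, 0.9]
print(cosine_similarity(query, sunset_photo) > cosine_similarity(query, spreadsheet))  # True
```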

&lt;h1&gt;
  
  
  Understanding the Building Blocks
&lt;/h1&gt;

&lt;p&gt;Before diving into the implementation, it helps to understand how each underlying technology works.&lt;/p&gt;

&lt;h2&gt;
  
  
  Amazon S3 Vectors
&lt;/h2&gt;

&lt;p&gt;S3 Vectors introduces vector buckets—a new bucket type specifically designed for storing and querying vector embeddings. Unlike regular S3 buckets that store objects, vector buckets organize data into vector indexes where you can run similarity queries. Each vector bucket can hold up to 10,000 indexes, and each index can store up to 2 billion vectors.&lt;/p&gt;

&lt;p&gt;The key operations are putting vectors (with optional metadata for filtering), querying for similar vectors, and managing the index lifecycle. Writes are strongly consistent, meaning you can query immediately after inserting. S3 Vectors handles the optimization of your vector data automatically as it evolves, maintaining performance without manual tuning.&lt;/p&gt;

&lt;p&gt;What makes S3 Vectors particularly interesting for this use case is the cost model. Traditional vector databases often require provisioned capacity, but S3 Vectors follows the S3 pattern: you pay for what you store and query, with no infrastructure to manage. For applications like SemStash where queries might be infrequent but storage needs to be durable and scalable, this works well.&lt;/p&gt;

&lt;h2&gt;
  
  
  Amazon Nova Multimodal Embeddings
&lt;/h2&gt;

&lt;p&gt;Embedding models convert content into numerical vectors that capture semantic meaning. What makes Nova Multimodal Embeddings different from earlier models is that it handles multiple content types through a single model, mapping them all into the same semantic space.&lt;/p&gt;

&lt;p&gt;This unified approach means that an embedding from a text description and an embedding from an image can be compared directly. You can search your photo library with a text query like "person smiling on beach" and find matching images, even though the query is text and the content is visual. The same applies to audio and video: search for "piano music" and find audio files containing piano, or search for "outdoor interview" and find video clips matching that description.&lt;/p&gt;

&lt;p&gt;Nova supports four embedding dimensions: 256, 384, 1024, and 3072. Higher dimensions capture more semantic nuance but require more storage. The model uses Matryoshka Representation Learning, which means the first N dimensions of a larger embedding work as a valid smaller embedding, giving you flexibility to balance precision against storage costs.&lt;/p&gt;
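&lt;p&gt;In practice, Matryoshka means you can store a large embedding and later derive a smaller one by truncating and re-normalizing. A sketch in plain Python (illustrative; Nova can also return the smaller dimension directly):&lt;/p&gt;

```python
import math

def truncate_embedding(vec, dims):
    """Keep the first `dims` dimensions and re-normalize to unit length,
    which Matryoshka Representation Learning makes semantically valid."""
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

full = [0.6, 0.8, 0.0, 0.0]          # stand-in for a 3072-dim embedding
small = truncate_embedding(full, 2)  # stand-in for a 256-dim embedding
print(small)  # approximately [0.6, 0.8]
```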

&lt;p&gt;For synchronous operations, Nova handles up to 8,192 tokens of text or 30 seconds of audio/video. Longer content can be processed asynchronously with automatic segmentation.&lt;/p&gt;

&lt;h1&gt;
  
  
  How SemStash Works
&lt;/h1&gt;

&lt;p&gt;The architecture separates content storage from semantic indexing:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5mhniqv06yxp9tam6hmy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5mhniqv06yxp9tam6hmy.png" alt="How SemStash Works" width="784" height="570"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The core library handles all AWS interactions. Five interfaces—CLI, Python API, &lt;a href="https://modelcontextprotocol.io/" rel="noopener noreferrer"&gt;Model Context Protocol&lt;/a&gt; (MCP) server, Web UI, and REST API—share this same core, ensuring consistent behavior across all access methods.&lt;/p&gt;

&lt;h2&gt;
  
  
  Storage Design
&lt;/h2&gt;

&lt;p&gt;Each stash consists of two S3 buckets: one standard bucket for your files and one vector bucket for embeddings. The content bucket stores original files with their metadata. The vector bucket uses S3 Vectors to store embeddings with matching keys, enabling fast similarity search.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4b33cd14jt1rp9gqa8hf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4b33cd14jt1rp9gqa8hf.png" alt="How S3 buckets and vector buckets are used" width="457" height="348"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This design keeps content and embeddings synchronized: when you delete a file, its embedding is also removed. The &lt;code&gt;check&lt;/code&gt; command verifies consistency, and &lt;code&gt;sync&lt;/code&gt; repairs any drift that might occur.&lt;/p&gt;

&lt;h2&gt;
  
  
  Content Type Handling
&lt;/h2&gt;

&lt;p&gt;Different content types require different processing before embedding:&lt;/p&gt;

&lt;p&gt;Text files (&lt;code&gt;.txt&lt;/code&gt;, &lt;code&gt;.md&lt;/code&gt;, JSON, HTML, CSV, XML) are embedded directly. The embedding captures the semantic meaning of the text content.&lt;/p&gt;

&lt;p&gt;Images (JPEG, PNG, GIF, WebP) are embedded visually. Nova understands the visual content, so you can search for "red car" or "person smiling" and find matching images.&lt;/p&gt;

&lt;p&gt;Audio files (MP3, WAV, FLAC, OGG) are processed for semantic content. You can search recordings by their spoken content or audio characteristics.&lt;/p&gt;

&lt;p&gt;Video content (MP4, WebM, MOV, MKV) is embedded considering both visual and audio elements. Search for "presentation with charts" or "outdoor interview" and find matching clips.&lt;/p&gt;

&lt;p&gt;Documents receive special handling. PDF files are rendered as images and embedded visually, preserving layout and graphics. Word documents, PowerPoint presentations, and Excel spreadsheets have their text extracted and embedded, making all their content searchable.&lt;/p&gt;

&lt;h1&gt;
  
  
  Using SemStash from the Command Line
&lt;/h1&gt;

&lt;p&gt;The CLI provides the primary human interface. Install it with uv:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uv tool &lt;span class="nb"&gt;install &lt;/span&gt;git+https://github.com/danilop/semstash.git
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Creating a stash sets up the S3 bucket and vector index:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;semstash init my-stash
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Uploading follows a path model similar to a filesystem. Every piece of content has a path starting with &lt;code&gt;/&lt;/code&gt;, and the trailing slash determines whether you're specifying a folder or an exact path:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Upload to root (file keeps its original name)&lt;/span&gt;
semstash my-stash upload vacation-photo.jpg /

&lt;span class="c"&gt;# Upload to a folder (trailing slash = folder)&lt;/span&gt;
semstash my-stash upload meeting-notes.txt /notes/

&lt;span class="c"&gt;# Upload with tags for organization&lt;/span&gt;
semstash my-stash upload &lt;span class="k"&gt;*&lt;/span&gt;.jpg /photos/ &lt;span class="nt"&gt;--tag&lt;/span&gt; vacation &lt;span class="nt"&gt;--tag&lt;/span&gt; 2024
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Searching uses natural language:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;semstash my-stash query &lt;span class="s2"&gt;"beach sunset"&lt;/span&gt;
semstash my-stash query &lt;span class="s2"&gt;"financial projections for next year"&lt;/span&gt;
semstash my-stash query &lt;span class="s2"&gt;"action items from last meeting"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Results are ranked by semantic similarity. You can filter by tags or path:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;semstash my-stash query &lt;span class="s2"&gt;"sunset"&lt;/span&gt; &lt;span class="nt"&gt;--tag&lt;/span&gt; photos
semstash my-stash query &lt;span class="s2"&gt;"meeting notes"&lt;/span&gt; &lt;span class="nt"&gt;--path&lt;/span&gt; /notes/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  The Python API
&lt;/h1&gt;

&lt;p&gt;For programmatic access, the same functionality is available through a Python library:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;semstash&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SemStash&lt;/span&gt;

&lt;span class="c1"&gt;# Create and initialize storage
&lt;/span&gt;&lt;span class="n"&gt;stash&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SemStash&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;my-stash&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;stash&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;init&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Upload content to root
&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;stash&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;upload&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;photo.jpg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;target&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;vacation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;beach&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Stored at: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# /photo.jpg
&lt;/span&gt;
&lt;span class="c1"&gt;# Upload to a folder (preserves filename)
&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;stash&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;upload&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;notes.txt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;target&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/docs/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Stored at: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# /docs/notes.txt
&lt;/span&gt;
&lt;span class="c1"&gt;# Query semantically
&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;stash&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sunset on beach&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; - &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  Download: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Query with path filter
&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;stash&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;meeting notes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/docs/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Get content metadata and URL
&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;stash&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/photo.jpg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Type: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content_type&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;, Size: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;file_size&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Browse a folder
&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;stash&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;browse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/docs/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content_type&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Download content locally
&lt;/span&gt;&lt;span class="n"&gt;stash&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;download&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/photo.jpg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;./local-copy.jpg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Delete when done
&lt;/span&gt;&lt;span class="n"&gt;stash&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;delete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/photo.jpg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The API supports all the same operations as the CLI: &lt;code&gt;init()&lt;/code&gt;, &lt;code&gt;open()&lt;/code&gt;, &lt;code&gt;upload()&lt;/code&gt;, &lt;code&gt;query()&lt;/code&gt;, &lt;code&gt;get()&lt;/code&gt;, &lt;code&gt;download()&lt;/code&gt;, &lt;code&gt;delete()&lt;/code&gt;, &lt;code&gt;browse()&lt;/code&gt;, &lt;code&gt;check()&lt;/code&gt;, &lt;code&gt;sync()&lt;/code&gt;, and &lt;code&gt;destroy()&lt;/code&gt;.&lt;/p&gt;

&lt;h1&gt;
  
  
  Web Interface
&lt;/h1&gt;

&lt;p&gt;For browser-based access, SemStash includes a web interface that provides a visual way to interact with your semantic storage:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;semstash web
&lt;span class="c"&gt;# Open http://localhost:8000/ui/&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The web interface makes SemStash accessible without command-line experience. The dashboard at &lt;code&gt;/ui/&lt;/code&gt; shows storage statistics and provides quick actions. The upload page at &lt;code&gt;/ui/upload&lt;/code&gt; supports drag-and-drop file uploads with target path specification. Browse pages at &lt;code&gt;/ui/browse/{path}&lt;/code&gt; offer paginated content lists with folder navigation. The search page at &lt;code&gt;/ui/search&lt;/code&gt; provides semantic search with relevance scores displayed alongside results. Content pages at &lt;code&gt;/ui/content/{path}&lt;/code&gt; show previews, metadata, and download/delete options.&lt;/p&gt;

&lt;p&gt;The browse interface lets you navigate your content by folder structure:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj3i490l1nyk37jmo9udy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj3i490l1nyk37jmo9udy.png" alt="Browse content with folder navigation" width="800" height="402"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Semantic search results display with relevance scores, making it clear how well each result matches your query:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F623qahk4airmpg33r09s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F623qahk4airmpg33r09s.png" alt="Semantic search with relevance scores" width="800" height="472"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Configure the server with environment variables:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;SEMSTASH_BUCKET&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;my-stash
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;SEMSTASH_HOST&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;0.0.0.0     &lt;span class="c"&gt;# Optional: bind address&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;SEMSTASH_PORT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;8000        &lt;span class="c"&gt;# Optional: port number&lt;/span&gt;
semstash web
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  REST API
&lt;/h1&gt;

&lt;p&gt;The same server that hosts the web interface exposes a REST API for programmatic HTTP access. Interactive documentation is available at &lt;code&gt;/docs&lt;/code&gt;. Key endpoints:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;POST   /init              Create new storage
POST   /open              Open existing storage
POST   /upload            Upload files (multipart form with target path)
GET    /query?q=...       Semantic search (supports path= filter)
GET    /content/{path}    Get metadata and download URL
DELETE /content/{path}    Remove content
GET    /browse/{path}     List stored content at path
GET    /stats             Storage statistics
GET    /check             Consistency check
POST   /sync              Repair inconsistencies
DELETE /destroy           Remove storage (irreversible)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
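&lt;p&gt;As a minimal sketch of how a client could target these endpoints, the following uses only the Python standard library and builds the requests without sending them. It assumes the server runs at the default &lt;code&gt;http://localhost:8000&lt;/code&gt;; the query parameters mirror the endpoint list above but are otherwise illustrative.&lt;/p&gt;

```python
from urllib.parse import urlencode, quote
from urllib.request import Request

BASE = "http://localhost:8000"  # default semstash web address (assumption)

# Semantic search against /query, with the optional path= filter.
params = urlencode({"q": "sunset on beach", "path": "/photos/"})
search = Request(f"{BASE}/query?{params}", method="GET")

# Remove a stored object via DELETE /content/{path}.
delete = Request(f"{BASE}/content{quote('/photo.jpg')}", method="DELETE")

print(search.full_url)
print(delete.get_method())
```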




&lt;h1&gt;
  
  
  MCP Server for AI Agents
&lt;/h1&gt;

&lt;p&gt;The &lt;a href="https://modelcontextprotocol.io/" rel="noopener noreferrer"&gt;Model Context Protocol&lt;/a&gt; (MCP) server gives AI assistants persistent semantic memory. Start it with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;semstash mcp
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For MCP-compatible assistants, add to your configuration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"semstash"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"semstash"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"mcp"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"env"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"SEMSTASH_BUCKET"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"my-agent-memory"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The MCP server exposes tools for uploading content, querying semantically, browsing stored items, and managing the stash. Agents can save information they discover and retrieve it in future conversations.&lt;/p&gt;

&lt;h2&gt;
  
  
  Using SemStash with Strands Agents
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://strandsagents.com/" rel="noopener noreferrer"&gt;Strands Agents&lt;/a&gt; provides first-class support for MCP, making it straightforward to give agents access to SemStash as persistent memory. Here's how to connect a Strands agent to the SemStash MCP server:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;mcp&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;stdio_client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;StdioServerParameters&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands.tools.mcp&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MCPClient&lt;/span&gt;

&lt;span class="c1"&gt;# Create the MCP client for SemStash
&lt;/span&gt;&lt;span class="n"&gt;semstash_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MCPClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;stdio_client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nc"&gt;StdioServerParameters&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;command&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;semstash&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mcp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;env&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SEMSTASH_BUCKET&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent-memory&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="c1"&gt;# Use the MCP client with an agent
&lt;/span&gt;&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;semstash_client&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;semstash_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;list_tools_sync&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# The agent can now store and retrieve information semantically
&lt;/span&gt;    &lt;span class="nf"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Save this information: The quarterly report deadline is March 15th&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Later, the agent can recall it
&lt;/span&gt;    &lt;span class="nf"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;When is the quarterly report due?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The agent doesn't need to know the exact phrasing or keywords used when information was stored. The semantic search finds relevant content based on meaning, so asking about "quarterly report due date" will find content stored with "quarterly report deadline."&lt;/p&gt;
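&lt;p&gt;The mechanism behind this is vector similarity: phrases with related meanings produce embeddings that point in similar directions. The toy 3-dimensional vectors below stand in for real Nova embeddings (which have hundreds or thousands of dimensions), but the comparison works the same way.&lt;/p&gt;

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of vector lengths.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings: "deadline" and "due date" point in nearly the
# same direction, while "beach photo" points elsewhere.
deadline = [0.9, 0.1, 0.0]
due_date = [0.8, 0.2, 0.1]
beach    = [0.0, 0.2, 0.9]

print(cosine(deadline, due_date) > cosine(deadline, beach))  # related meanings score higher
```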

&lt;p&gt;For agents that need to combine SemStash with other tools, you can include the MCP client alongside other tool providers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands.tools.mcp&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MCPClient&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands_tools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;calculator&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;current_time&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;mcp&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;stdio_client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;StdioServerParameters&lt;/span&gt;

&lt;span class="n"&gt;semstash_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MCPClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;stdio_client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nc"&gt;StdioServerParameters&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;command&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;semstash&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mcp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;env&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SEMSTASH_BUCKET&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent-memory&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;semstash_client&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;mcp_tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;semstash_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;list_tools_sync&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c1"&gt;# Combine MCP tools with other tools
&lt;/span&gt;    &lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;calculator&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;current_time&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;mcp_tools&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Agent can use all tools together
&lt;/span&gt;    &lt;span class="nf"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What time is it, and do I have any meetings scheduled today?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This pattern lets agents build knowledge over time. An agent working on a research project can save findings as it discovers them, then recall relevant information when answering questions or generating reports.&lt;/p&gt;

&lt;h1&gt;
  
  
  Configuration and Tuning
&lt;/h1&gt;

&lt;p&gt;SemStash works with sensible defaults but supports customization through environment variables or a configuration file.&lt;/p&gt;

&lt;p&gt;These environment variables configure the web server, MCP server, and Python API. The CLI takes the bucket name as a command argument instead.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;SEMSTASH_BUCKET&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;my-stash        &lt;span class="c"&gt;# Bucket name (for web/MCP/Python API)&lt;/span&gt;
&lt;span class="nv"&gt;SEMSTASH_REGION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;us-east-1       &lt;span class="c"&gt;# AWS region&lt;/span&gt;
&lt;span class="nv"&gt;SEMSTASH_DIMENSION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;3072         &lt;span class="c"&gt;# Embedding dimension (256, 384, 1024, 3072)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
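&lt;p&gt;To illustrate how these variables resolve, here is a small sketch of the lookup-with-defaults pattern, using the documented defaults. This is illustrative only, not SemStash's actual configuration code.&lt;/p&gt;

```python
import os

# Simulate an environment where only the bucket is set, so the
# documented defaults apply for region and dimension.
os.environ["SEMSTASH_BUCKET"] = "my-stash"
os.environ.pop("SEMSTASH_REGION", None)
os.environ.pop("SEMSTASH_DIMENSION", None)

bucket = os.environ.get("SEMSTASH_BUCKET")                     # required for web/MCP/Python API
region = os.environ.get("SEMSTASH_REGION", "us-east-1")        # documented default
dimension = int(os.environ.get("SEMSTASH_DIMENSION", "3072"))  # documented default

print(bucket, region, dimension)
```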



&lt;p&gt;Or through a configuration file (&lt;code&gt;semstash.toml&lt;/code&gt; or &lt;code&gt;.semstash.toml&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[aws]&lt;/span&gt;
&lt;span class="py"&gt;region&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"us-east-1"&lt;/span&gt;

&lt;span class="nn"&gt;[embeddings]&lt;/span&gt;
&lt;span class="py"&gt;dimension&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3072&lt;/span&gt;

&lt;span class="nn"&gt;[output]&lt;/span&gt;
&lt;span class="py"&gt;format&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"table"&lt;/span&gt;  &lt;span class="c"&gt;# or "json"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The embedding dimension is the main tuning parameter. Higher dimensions (3072) capture more semantic nuance, while lower dimensions (256, 384, 1024) reduce storage costs with some accuracy trade-off. The dimension is set when you create a stash and cannot be changed afterward—when you open an existing stash, SemStash automatically uses its configured dimension.&lt;/p&gt;
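&lt;p&gt;To make the storage side of the trade-off concrete, here is back-of-the-envelope math assuming 4-byte float32 components per vector. Actual S3 Vectors storage and billing may differ; the point is the linear scaling with dimension.&lt;/p&gt;

```python
BYTES_PER_FLOAT = 4  # assuming float32 components
vectors = 1_000_000

# Raw vector payload per million stored items, for each supported dimension.
for dim in (256, 384, 1024, 3072):
    mb = dim * BYTES_PER_FLOAT * vectors / 1_000_000
    print(f"{dim:>4} dims: {mb:,.0f} MB per million vectors")
```

A 3072-dimension stash holds roughly 12x the vector data of a 256-dimension one for the same content.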

&lt;h1&gt;
  
  
  AWS Requirements
&lt;/h1&gt;

&lt;p&gt;SemStash requires AWS credentials with permissions for S3 (creating and managing buckets, uploading and downloading objects), S3 Vectors (creating indexes, storing and querying vectors), and Bedrock (invoking the Nova embeddings model).&lt;/p&gt;

&lt;p&gt;The default region is &lt;code&gt;us-east-1&lt;/code&gt;. Check &lt;a href="https://builder.aws.com/build/capabilities" rel="noopener noreferrer"&gt;AWS regional availability&lt;/a&gt; to confirm that Amazon Bedrock, S3, and S3 Vectors are supported in any other region you want to use.&lt;/p&gt;

&lt;h1&gt;
  
  
  Maintenance
&lt;/h1&gt;

&lt;p&gt;A few commands help keep your stash healthy:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Verify content and embeddings are synchronized&lt;/span&gt;
semstash my-stash check

&lt;span class="c"&gt;# Repair any inconsistencies&lt;/span&gt;
semstash my-stash &lt;span class="nb"&gt;sync&lt;/span&gt;

&lt;span class="c"&gt;# See storage statistics&lt;/span&gt;
semstash my-stash stats

&lt;span class="c"&gt;# Permanently remove a stash (irreversible)&lt;/span&gt;
semstash my-stash destroy &lt;span class="nt"&gt;--force&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The check command reports orphaned embeddings (vectors without content) or missing embeddings (content without vectors). The sync command repairs these issues by removing orphans and regenerating missing embeddings.&lt;/p&gt;
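&lt;p&gt;Conceptually, the check reduces to a set difference between stored content paths and indexed vector paths. The sketch below uses in-memory sets as stand-ins for the real S3 and S3 Vectors listings; the paths are made up for illustration.&lt;/p&gt;

```python
# Stand-ins for "what is stored in S3" and "what is indexed in S3 Vectors".
content_paths = {"/docs/notes.txt", "/photo.jpg", "/report.pdf"}
vector_paths = {"/docs/notes.txt", "/photo.jpg", "/old-draft.txt"}

missing_embeddings = content_paths - vector_paths   # content without vectors
orphaned_embeddings = vector_paths - content_paths  # vectors without content

print("missing:", sorted(missing_embeddings))
print("orphaned:", sorted(orphaned_embeddings))

# A sync-style repair would delete the orphans and re-embed the missing paths.
```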

&lt;h1&gt;
  
  
  What I Learned Building This
&lt;/h1&gt;

&lt;p&gt;Working with S3 Vectors and Nova Multimodal Embeddings together highlighted a few things.&lt;/p&gt;

&lt;p&gt;The unified semantic space that Nova provides enables searches that would be difficult to express otherwise. Searching for text and finding images based on meaning, or finding video clips from audio descriptions, opens up workflows that don't fit the traditional keyword-search model. I found myself uploading content without worrying about file organization, trusting that I could describe what I needed later.&lt;/p&gt;

&lt;p&gt;S3 Vectors fits naturally into applications where you need durable, scalable vector storage without managing infrastructure. The serverless model—pay for what you store and query—aligns well with applications where usage patterns might be bursty or unpredictable. The GA release this week at re:Invent brought significant improvements: 2 billion vectors per index, ~100ms latencies for frequent queries, and integration with Bedrock Knowledge Bases.&lt;/p&gt;

&lt;p&gt;Building for multiple interfaces (CLI, Python API, Web UI, REST API, MCP server) from a shared core was an early decision that turned out to be the right one. Each interface serves a different use case, but they all exercise the same underlying logic, which made testing more straightforward and behavior consistent.&lt;/p&gt;

&lt;p&gt;The most interesting use case that emerged was using SemStash as memory for AI agents through the MCP server. Agents can accumulate knowledge over time—saving information they discover and retrieving it in future conversations—without the application developer building custom storage infrastructure. This pattern of "semantic memory" for agents feels like it has broader applications beyond what I've implemented here.&lt;/p&gt;

&lt;p&gt;The code is available on &lt;a href="https://github.com/danilop/semstash" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; under the MIT license. I'd be curious to hear how others use it, particularly for the agent memory use case.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>aws</category>
      <category>python</category>
      <category>mcp</category>
    </item>
    <item>
      <title>Strands Agents now speaks TypeScript: A side-by-side guide</title>
      <dc:creator>Danilo Poccia</dc:creator>
      <pubDate>Thu, 04 Dec 2025 15:14:46 +0000</pubDate>
      <link>https://dev.to/aws/strands-agents-now-speaks-typescript-a-side-by-side-guide-12b3</link>
      <guid>https://dev.to/aws/strands-agents-now-speaks-typescript-a-side-by-side-guide-12b3</guid>
      <description>&lt;p&gt;The &lt;a href="https://strandsagents.com/" rel="noopener noreferrer"&gt;Strands Agents SDK&lt;/a&gt;, which was released as open source in May 2025, recently added TypeScript support as a &lt;strong&gt;public preview&lt;/strong&gt;. Developers who prefer Python have had access to Strands since launch. Now TypeScript developers can build AI agents using the same model-driven approach, with full type safety and modern async patterns.&lt;/p&gt;

&lt;p&gt;This article compares the Python and TypeScript implementations side by side, building a practical example along the way.&lt;/p&gt;

&lt;p&gt;Because the Strands Agents TypeScript SDK is currently in &lt;strong&gt;public preview&lt;/strong&gt;, APIs may change as the SDK is refined, and some features available in the Strands Agents Python SDK are not yet implemented. The core functionality—agents, tools, MCP integration, streaming—works as expected, but this is an early release. Feedback and contributions are welcome via the &lt;a href="https://github.com/strands-agents/sdk-typescript" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt;.&lt;/p&gt;

&lt;h1&gt;
  
  
  The Model-Driven Approach
&lt;/h1&gt;

&lt;p&gt;Many agent frameworks are workflow-driven: you define explicit chains of actions where the agent does A, then B, then C. You're essentially programming the agent's behavior in advance. This works well for predictable tasks but becomes brittle when dealing with novel situations.&lt;/p&gt;

&lt;p&gt;Strands Agents takes a different approach. Instead of prescribing steps, you give the agent a goal and a set of tools and let the LLM figure out how to accomplish the task. The model's reasoning capabilities—its ability to plan, reflect, and adapt—drive the behavior. This is what Strands calls the "model-driven" approach.&lt;/p&gt;

&lt;p&gt;At the core of every Strands agent is the &lt;strong&gt;agent loop&lt;/strong&gt;: the model receives context (conversation history, system prompt, tool descriptions), decides what to do next, optionally calls tools, observes the results, and repeats until it produces a final response.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz5tv85k95n3qni0m7whh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz5tv85k95n3qni0m7whh.png" alt="AI agent loop" width="800" height="207"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For details on how this works, see the &lt;a href="https://strandsagents.com/latest/documentation/docs/user-guide/concepts/agents/agent-loop/" rel="noopener noreferrer"&gt;Agent Loop documentation&lt;/a&gt;.&lt;/p&gt;

&lt;h1&gt;
  
  
  Building a Task Planning Agent
&lt;/h1&gt;

&lt;p&gt;Let's build something practical: an agent that helps break down a project into tasks. We'll give it two tools—one to add tasks and one to list them—and let it figure out how to help the user.&lt;/p&gt;

&lt;h2&gt;
  
  
  Defining Tools
&lt;/h2&gt;

&lt;p&gt;The two Strands SDKs take different approaches to tool definition that reflect each language's idioms.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Python&lt;/strong&gt; uses the &lt;code&gt;@tool&lt;/code&gt; decorator with docstrings and type hints:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt;

&lt;span class="n"&gt;tasks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

&lt;span class="nd"&gt;@tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;add_task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;priority&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;medium&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Add a task to the plan.

    Args:
        description: What needs to be done
        priority: Priority level (high, medium, low)
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;tasks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;priority&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;priority&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Added: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; [&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;priority&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;]&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="nd"&gt;@tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;list_tasks&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Show all tasks in the current plan.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;tasks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;No tasks yet.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;- [&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;priority&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;upper&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;] &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; 
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;tasks&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Strands extracts the tool specification from the function signature, type hints, and docstring automatically.&lt;/p&gt;
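&lt;p&gt;To get a feel for what that extraction yields, here is a minimal conceptual sketch (not the SDK's actual implementation) that derives a tool specification from a function's signature and docstring using the standard library:&lt;/p&gt;

```python
# Conceptual sketch (NOT the SDK's real code) of deriving a tool spec
# from a plain Python function's signature and docstring.
import inspect

def add_task(description: str, priority: str = "medium") -> str:
    """Add a task to the plan."""
    return f"Added: {description} [{priority}]"

def derive_spec(func):
    """Build a minimal JSON-Schema-like spec from signature and docstring."""
    sig = inspect.signature(func)
    properties = {}
    required = []
    for name, param in sig.parameters.items():
        properties[name] = {"type": "string"}  # simplified: assumes str hints
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default value means the LLM must supply it
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func),
        "inputSchema": {"type": "object", "properties": properties, "required": required},
    }

spec = derive_spec(add_task)
print(spec["name"], spec["inputSchema"]["required"])
```

&lt;p&gt;The point is that the function itself is the single source of truth: the name, parameters, defaults, and documentation all flow into the specification the model sees.&lt;/p&gt;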

&lt;p&gt;&lt;strong&gt;TypeScript&lt;/strong&gt; uses &lt;a href="https://zod.dev/" rel="noopener noreferrer"&gt;Zod&lt;/a&gt; schemas:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;tool&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@strands-agents/sdk&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;zod&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;tasks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Array&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;priority&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;addTask&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;add_task&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Add a task to the plan.&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;inputSchema&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;object&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;describe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;What needs to be done&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="na"&gt;priority&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="k"&gt;default&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;medium&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;describe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Priority level (high, medium, low)&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}),&lt;/span&gt;
    &lt;span class="na"&gt;callback&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;tasks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s2"&gt;`Added: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;description&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; [&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;priority&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;]`&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;listTasks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;list_tasks&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Show all tasks in the current plan.&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;inputSchema&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;object&lt;/span&gt;&lt;span class="p"&gt;({}),&lt;/span&gt;
    &lt;span class="na"&gt;callback&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;tasks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;No tasks yet.&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;tasks&lt;/span&gt;
            &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;t&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s2"&gt;`- [&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;priority&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toUpperCase&lt;/span&gt;&lt;span class="p"&gt;()}&lt;/span&gt;&lt;span class="s2"&gt;] &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;description&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Zod is a TypeScript-first schema validation library that solves two problems at once: runtime validation and type inference. TypeScript's type system only works at compile time, but when your agent receives input from an LLM, you need runtime validation. Zod validates the data and automatically infers TypeScript types from your schema—define once, get both.&lt;/p&gt;

&lt;h2&gt;
  
  
  Creating and Running the Agent
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Python:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a planning assistant. Help users break down projects into actionable tasks.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;add_task&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;list_tasks&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Help me plan a surprise birthday party for next Saturday&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;TypeScript:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;systemPrompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;You are a planning assistant. Help users break down projects into actionable tasks.&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;addTask&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;listTasks&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Help me plan a surprise birthday party for next Saturday&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The agent will reason through what's needed—venue, guest list, food, decorations, timeline—call &lt;code&gt;add_task&lt;/code&gt; multiple times with appropriate priorities, then call &lt;code&gt;list_tasks&lt;/code&gt; to show the plan. You didn't program that sequence; the model figured it out.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sample Output
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;I've created a plan for the surprise birthday party:

- [HIGH] Confirm venue and time for Saturday
- [HIGH] Create guest list and send invitations ASAP
- [HIGH] Arrange for someone to bring the birthday person to the venue
- [MEDIUM] Order or bake birthday cake
- [MEDIUM] Plan food and drinks menu
- [MEDIUM] Buy decorations (balloons, banner, etc.)
- [LOW] Create a party playlist
- [LOW] Plan games or activities
- [LOW] Arrange for someone to take photos

Given the short timeline, I've prioritized the items that need immediate action. 
Would you like me to break any of these down further?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  How the Two Languages Differ
&lt;/h1&gt;

&lt;p&gt;The most visible difference is in tool definition. Python extracts schemas from docstrings and type hints—you write documentation as you normally would, and Strands uses it. TypeScript relies on Zod schemas, which provide both runtime validation and compile-time type inference in a single definition.&lt;/p&gt;

&lt;p&gt;Invocation patterns also differ slightly. In Python, you call the agent directly with &lt;code&gt;agent("prompt")&lt;/code&gt;, and async is optional. In TypeScript, you use &lt;code&gt;await agent.invoke("prompt")&lt;/code&gt;, and async is the default model throughout. Both approaches describe the same thing to the LLM—they just do it in ways idiomatic to each language.&lt;/p&gt;

&lt;h2&gt;
  
  
  Streaming Responses
&lt;/h2&gt;

&lt;p&gt;For interactive applications, both Strands SDKs support streaming so users see output as it's generated.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Python:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stream_async&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Plan a weekend hiking trip&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;flush&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;TypeScript:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;await &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Plan a weekend hiking trip&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;text&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;stdout&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Model Context Protocol (MCP)
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://modelcontextprotocol.io" rel="noopener noreferrer"&gt;Model Context Protocol&lt;/a&gt; lets you connect agents to external tools and services. Both Strands SDKs have built-in MCP support.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Python:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;mcp&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;stdio_client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;StdioServerParameters&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands.tools.mcp&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MCPClient&lt;/span&gt;

&lt;span class="n"&gt;mcp_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MCPClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;stdio_client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nc"&gt;StdioServerParameters&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;command&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;uvx&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;awslabs.aws-documentation-mcp-server@latest&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;mcp_client&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;mcp_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;list_tools_sync&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="nf"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;How do I configure an S3 bucket?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;TypeScript:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;McpClient&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;StdioClientTransport&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@strands-agents/sdk&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;mcpClient&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;McpClient&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;transport&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;StdioClientTransport&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;uvx&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;args&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;awslabs.aws-documentation-mcp-server@latest&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;mcpClient&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;How do I configure an S3 bucket?&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;See the &lt;a href="https://strandsagents.com/latest/documentation/docs/user-guide/concepts/tools/mcp-tools/" rel="noopener noreferrer"&gt;MCP Tools documentation&lt;/a&gt; for more transport options.&lt;/p&gt;

&lt;h1&gt;
  
  
  Beyond the Basics
&lt;/h1&gt;

&lt;p&gt;The example above covers the core workflow, but both Strands SDKs offer more. Long-running conversations can exhaust the model's context window, so both provide conversation managers that automatically maintain a sliding window of recent messages. The Strands Agents Python SDK goes further with a summarizing conversation manager that compresses old messages rather than discarding them—useful when you need to preserve context over very long interactions.&lt;/p&gt;
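&lt;p&gt;The sliding-window idea can be sketched in a few lines of plain Python (a conceptual illustration of the behavior, not the SDK's conversation manager API):&lt;/p&gt;

```python
# Conceptual sketch of sliding-window conversation management:
# keep only the most recent N messages so the context stays bounded.
from collections import deque

class SlidingWindow:
    def __init__(self, max_messages: int):
        # deque with maxlen silently drops the oldest entry when full
        self.messages = deque(maxlen=max_messages)

    def add(self, role: str, content: str):
        self.messages.append({"role": role, "content": content})

    def context(self):
        return list(self.messages)

window = SlidingWindow(max_messages=3)
for i in range(5):
    window.add("user", f"message {i}")

print(len(window.context()))  # only the 3 most recent messages survive
```

&lt;p&gt;A summarizing manager would instead compress the evicted messages into a summary message rather than dropping them outright.&lt;/p&gt;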

&lt;p&gt;For complex problems where a single agent isn't enough, the Strands Agents Python SDK provides multi-agent patterns. You can wrap specialized agents as tools that an orchestrator calls, create swarms where agents hand off tasks to each other autonomously, or define explicit graphs with deterministic execution order. These patterns aren't yet available in the TypeScript SDK. See the &lt;a href="https://strandsagents.com/latest/documentation/docs/user-guide/concepts/multi-agent/multi-agent-patterns/" rel="noopener noreferrer"&gt;Multi-Agent Patterns documentation&lt;/a&gt; for details.&lt;/p&gt;
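&lt;p&gt;The agents-as-tools pattern can be illustrated with stubbed functions (in a real system each stub would wrap a specialized Strands agent; the names here are hypothetical stand-ins):&lt;/p&gt;

```python
# Conceptual sketch of "agents as tools": each specialist is wrapped in a
# plain function that an orchestrator can invoke like any other tool.
def research_agent(query: str) -> str:
    # Stand-in for a specialized agent; a real one would call an LLM.
    return f"findings for: {query}"

def writer_agent(notes: str) -> str:
    # Stand-in for a second specialist that turns notes into a draft.
    return f"draft based on ({notes})"

def orchestrator(task: str) -> str:
    # The orchestrator decides which specialist to call and in what order.
    notes = research_agent(task)
    return writer_agent(notes)

print(orchestrator("birthday party venues"))
```

&lt;p&gt;In the real pattern the orchestrator is itself an agent, so the routing decision is made by the model rather than hard-coded as it is here.&lt;/p&gt;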

&lt;p&gt;The Strands Agents Python SDK also supports structured output through Pydantic models, letting you constrain the agent's response to a specific schema. This is particularly useful when you need to parse the agent's output programmatically.&lt;/p&gt;
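&lt;p&gt;The value of structured output is easiest to see in miniature. This sketch uses a standard-library dataclass instead of the SDK's Pydantic integration, and the JSON reply is invented for illustration:&lt;/p&gt;

```python
# Conceptual sketch of structured output: parse a model's JSON reply into
# a typed object so downstream code relies on fields, not free-form text.
import json
from dataclasses import dataclass

@dataclass
class TaskSummary:
    total: int
    high_priority: int

raw_reply = '{"total": 9, "high_priority": 3}'  # imagine the LLM produced this
summary = TaskSummary(**json.loads(raw_reply))

print(summary.total, summary.high_priority)
```

&lt;p&gt;With Pydantic the same idea gains validation and coercion for free, and the schema itself is sent to the model to constrain what it generates.&lt;/p&gt;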

&lt;h1&gt;
  
  
  Feature Comparison
&lt;/h1&gt;

&lt;p&gt;The table below summarizes what's available in each SDK. The Strands Agents Python SDK is stable with the full feature set, and the Strands Agents TypeScript SDK is in preview with core functionality available.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Strands Python (stable)&lt;/th&gt;
&lt;th&gt;Strands TypeScript (preview)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Core agent loop&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Custom tools&lt;/td&gt;
&lt;td&gt;✅ (decorator)&lt;/td&gt;
&lt;td&gt;✅ (Zod)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MCP integration&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Amazon Bedrock&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OpenAI&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Anthropic API&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Ollama&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Streaming&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Conversation management&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Structured output&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-agent (Swarm/Graph)&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Community tools package&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Session persistence&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OpenTelemetry traces&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h1&gt;
  
  
  When to Choose Which
&lt;/h1&gt;

&lt;p&gt;The choice between SDKs often comes down to your existing stack and requirements. If your application is already in JavaScript or TypeScript and you're comfortable with preview-stage software, the Strands Agents TypeScript SDK gives you the core agent functionality with compile-time type safety and modern async patterns. It works in both Node.js and browser environments.&lt;/p&gt;

&lt;p&gt;If you need the full feature set—multi-agent orchestration, structured output, more model providers, or the &lt;a href="https://github.com/strands-agents/tools" rel="noopener noreferrer"&gt;community tools package&lt;/a&gt;—the Strands Agents Python SDK is the more complete option for now.&lt;/p&gt;

&lt;h1&gt;
  
  
  Getting Started
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;Python:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uv init my-agent &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;cd &lt;/span&gt;my-agent
uv add strands-agents strands-agents-tools
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;TypeScript:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir &lt;/span&gt;my-agent &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;cd &lt;/span&gt;my-agent
npm init &lt;span class="nt"&gt;-y&lt;/span&gt;
npm &lt;span class="nb"&gt;install&lt;/span&gt; @strands-agents/sdk zod
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Both SDKs default to Amazon Bedrock as the model provider, which requires valid AWS credentials. If you're new to AWS, the &lt;a href="https://aws.amazon.com/free/" rel="noopener noreferrer"&gt;AWS Free Tier&lt;/a&gt; provides $100 in credits at sign-up plus up to $100 more for completing onboarding activities, one of which involves using Amazon Bedrock. The free plan lasts six months, and you won't be charged unless you upgrade. See the Strands Agents &lt;a href="https://strandsagents.com/latest/documentation/docs/user-guide/quickstart/" rel="noopener noreferrer"&gt;Quickstart guide&lt;/a&gt; for setup details.&lt;/p&gt;
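&lt;p&gt;If you already have an AWS access key, the standard environment variables are one way to supply credentials (the region is an example; replace the placeholder values with your own):&lt;/p&gt;

```shell
export AWS_ACCESS_KEY_ID=...       # your access key ID
export AWS_SECRET_ACCESS_KEY=...   # your secret access key
export AWS_REGION=us-east-1        # example region with Bedrock support
```

&lt;p&gt;Any credential source the AWS SDKs understand, such as &lt;code&gt;aws configure&lt;/code&gt; profiles, works as well.&lt;/p&gt;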

&lt;h1&gt;
  
  
  Resources
&lt;/h1&gt;

&lt;p&gt;The &lt;a href="https://strandsagents.com/latest/documentation/docs/" rel="noopener noreferrer"&gt;Strands Agents documentation&lt;/a&gt; is the best starting point, with guides for both SDKs. The source code is available on GitHub for both the &lt;a href="https://github.com/strands-agents/sdk-python" rel="noopener noreferrer"&gt;Python SDK&lt;/a&gt; and the &lt;a href="https://github.com/strands-agents/sdk-typescript" rel="noopener noreferrer"&gt;TypeScript SDK&lt;/a&gt;. For Python, the &lt;a href="https://github.com/strands-agents/tools" rel="noopener noreferrer"&gt;community tools package&lt;/a&gt; provides ready-to-use tools, and the &lt;a href="https://github.com/strands-agents/samples" rel="noopener noreferrer"&gt;samples repository&lt;/a&gt; has complete example agents.&lt;/p&gt;

&lt;h1&gt;
  
  
  Wrapping Up
&lt;/h1&gt;

&lt;p&gt;The Strands Agents TypeScript SDK brings the model-driven approach to the JavaScript ecosystem. While the Python SDK currently remains the fully featured option, the TypeScript preview delivers the core experience: define tools, create an agent, and let the model reason through problems.&lt;/p&gt;

&lt;p&gt;Because the Strands Agents TypeScript SDK is still in preview, now is a good time to experiment, provide feedback, and help shape its development.&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>ai</category>
      <category>typescript</category>
      <category>python</category>
    </item>
    <item>
      <title>Modernizing Python Projects: Converting requirements.txt to uv in One Command</title>
      <dc:creator>Danilo Poccia</dc:creator>
      <pubDate>Thu, 06 Nov 2025 14:22:22 +0000</pubDate>
      <link>https://dev.to/danilop/modernizing-python-projects-converting-requirementstxt-to-uv-in-one-command-5ali</link>
      <guid>https://dev.to/danilop/modernizing-python-projects-converting-requirementstxt-to-uv-in-one-command-5ali</guid>
      <description>&lt;p&gt;When cloning Python projects that I want to test or use for a demo, I often find out that the repository uses a &lt;code&gt;requirements.txt&lt;/code&gt; file instead of a modern &lt;code&gt;pyproject.toml&lt;/code&gt;. While &lt;code&gt;requirements.txt&lt;/code&gt; has served the Python community well for years, I've come to appreciate the speed and simplicity of &lt;a href="https://docs.astral.sh/uv/" rel="noopener noreferrer"&gt;uv&lt;/a&gt;, a fast Python package manager that uses the standardized &lt;code&gt;pyproject.toml&lt;/code&gt; format. But converting between these formats manually? That can be tedious and error-prone, especially when dealing with complex dependency specifications.&lt;/p&gt;

&lt;p&gt;That's why I built &lt;a href="https://github.com/danilop/requirements-to-uv" rel="noopener noreferrer"&gt;requirements-to-uv&lt;/a&gt;: a command-line tool that automatically converts Python projects from &lt;code&gt;requirements.txt&lt;/code&gt; to uv-managed &lt;code&gt;pyproject.toml&lt;/code&gt; with a single command. Whether you're modernizing your own projects or quickly setting up cloned repositories, this tool handles the conversion details so you can focus on actually working with the code.&lt;/p&gt;

&lt;p&gt;In this post, I'll walk you through what makes this conversion non-trivial, show you how to use the tool in seconds, and explain some of the intelligent logic that makes it work reliably across different project structures.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Move from requirements.txt to uv?
&lt;/h2&gt;

&lt;p&gt;Before diving into the conversion process, it's worth understanding why this matters. The &lt;code&gt;requirements.txt&lt;/code&gt; format has been the standard for dependency management in Python for many years, but it has limitations. For example, there's no standardized way (apart from file names) to separate development dependencies from production ones. And different tools interpret the format slightly differently, leading to inconsistencies.&lt;/p&gt;

&lt;p&gt;Enter &lt;code&gt;pyproject.toml&lt;/code&gt;: a standardized format introduced in &lt;a href="https://peps.python.org/pep-0518/" rel="noopener noreferrer"&gt;PEP 518&lt;/a&gt; and extended by &lt;a href="https://peps.python.org/pep-0621/" rel="noopener noreferrer"&gt;PEP 621&lt;/a&gt; to consolidate project metadata and dependencies in one place. When combined with uv, you get lightning-fast dependency resolution and installation. The uv tool provides a consistent development workflow across projects and uses lock files for truly reproducible environments.&lt;/p&gt;

&lt;p&gt;The transition from &lt;code&gt;requirements.txt&lt;/code&gt; to &lt;code&gt;pyproject.toml&lt;/code&gt; with uv isn't just about following trends—it's about faster builds, more reliable environments, and better project organization.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Challenge: The requirements.txt File Is More Complex Than It Looks
&lt;/h2&gt;

&lt;p&gt;At first glance, converting a &lt;code&gt;requirements.txt&lt;/code&gt; file to &lt;code&gt;pyproject.toml&lt;/code&gt; seems straightforward. Just copy the package names and versions, right? Not quite. The &lt;code&gt;requirements.txt&lt;/code&gt; files you can find in the wild are surprisingly complex, and a naive conversion would lose important information or break dependencies entirely.&lt;/p&gt;

&lt;p&gt;Consider what a real-world requirements file might contain:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Standard packages with version specifiers
requests&amp;gt;=2.28.0
flask[async]&amp;gt;=3.0.0

# Git dependencies with specific branches
git+https://github.com/user/repo.git@main#egg=mypackage
-e git+ssh://git@github.com/user/another.git@develop

# Local path dependencies
-e ./local-package
../another-package

# Environment markers for conditional installation
pytest&amp;gt;=8.0.0 ; python_version &amp;gt;= "3.8"

# Poetry-style version constraints that aren't valid in pyproject.toml
django^4.2.0

# Package extras
celery[redis,msgpack]&amp;gt;=5.3.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each of these patterns requires different handling in the &lt;code&gt;pyproject.toml&lt;/code&gt; format. Git dependencies need to be split into a regular dependency entry and a separate &lt;code&gt;[tool.uv.sources]&lt;/code&gt; section that specifies the repository URL and branch. Poetry-style caret (&lt;code&gt;^&lt;/code&gt;) constraints need to be converted to the equivalent range syntax. Local paths require special source declarations. Environment markers need to be preserved exactly.&lt;/p&gt;
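
&lt;p&gt;To make one of these conversions concrete, the caret constraint follows simple range arithmetic. Here's a rough sketch of the idea, not the tool's actual source (the function name &lt;code&gt;caret_to_range&lt;/code&gt; is mine):&lt;/p&gt;

```python
def caret_to_range(version: str) -> str:
    """Convert the version from a caret constraint like '^4.2.0' to a range.

    Caret semantics: allow changes that keep the leftmost non-zero
    component fixed, so ^4.2.0 becomes '>=4.2.0,<5.0.0'.
    """
    parts = [int(p) for p in version.split(".")]
    upper = parts[:]
    for i, value in enumerate(upper):
        if value != 0:
            # Bump the leftmost non-zero component and zero out the rest.
            upper[i] = value + 1
            upper[i + 1:] = [0] * (len(upper) - i - 1)
            break
    else:  # all components are zero, e.g. '^0.0.0'
        upper[-1] = 1
    bound = ".".join(str(p) for p in upper)
    return f">={version},<{bound}"

print(caret_to_range("4.2.0"))  # >=4.2.0,<5.0.0
print(caret_to_range("0.3.1"))  # >=0.3.1,<0.4.0
```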

&lt;p&gt;Beyond parsing complexity, there's also the challenge of project structure. Many repositories have multiple requirements files: &lt;code&gt;requirements.txt&lt;/code&gt; for production, &lt;code&gt;requirements-dev.txt&lt;/code&gt; for development tools, &lt;code&gt;requirements-test.txt&lt;/code&gt; for testing frameworks, and so on. A proper conversion should detect these patterns and organize them into the appropriate dependency groups in &lt;code&gt;pyproject.toml&lt;/code&gt;.&lt;/p&gt;
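
&lt;p&gt;That detection can be sketched with a filename pattern. This is illustrative only; the tool's actual matching rules may be richer:&lt;/p&gt;

```python
import re

def group_for(filename: str):
    """Map a requirements file name to a dependency group (None = main deps)."""
    match = re.fullmatch(r"requirements[-_.]?(\w*)\.txt", filename)
    if match is None:
        return None  # not a requirements file this sketch recognizes
    return match.group(1) or None  # empty suffix means the main requirements.txt

print(group_for("requirements.txt"))       # None
print(group_for("requirements-dev.txt"))   # dev
print(group_for("requirements-test.txt"))  # test
```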

&lt;h2&gt;
  
  
  Intelligent Metadata Detection
&lt;/h2&gt;

&lt;p&gt;One of the tool's most useful features is automatic project metadata detection. When you run &lt;code&gt;req2uv&lt;/code&gt; in a Python project directory, it doesn't just convert dependencies—it tries to build a complete, valid &lt;code&gt;pyproject.toml&lt;/code&gt; by gathering information from multiple sources.&lt;/p&gt;

&lt;p&gt;For the project name, the tool starts with the current directory name and normalizes it according to Python packaging standards (replacing spaces and special characters with hyphens). For the version, it searches for version declarations in &lt;code&gt;__init__.py&lt;/code&gt; files, checks &lt;code&gt;setup.py&lt;/code&gt; if present, looks at git tags, and falls back to &lt;code&gt;0.1.0&lt;/code&gt; if nothing else is found. The Python version requirement is detected from &lt;code&gt;.python-version&lt;/code&gt; files, setup.py classifiers, or defaults to the currently running Python version. The description comes from the first line of your README file, and author information is pulled from your git configuration.&lt;/p&gt;
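
&lt;p&gt;The name normalization step resembles PEP 503 name normalization, with hyphens as the canonical separator. A minimal, simplified sketch (my own, not the tool's code):&lt;/p&gt;

```python
import re

def normalize_project_name(raw: str) -> str:
    """Lowercase the name and collapse runs of spaces, dots, underscores into hyphens."""
    name = re.sub(r"[-_.\s]+", "-", raw.strip())
    return name.strip("-").lower()

print(normalize_project_name("My Cool_Project"))  # my-cool-project
```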

&lt;p&gt;This intelligent detection means you're not starting with a minimal skeleton file—you get a properly structured &lt;code&gt;pyproject.toml&lt;/code&gt; that actually describes your project.&lt;/p&gt;

&lt;h2&gt;
  
  
  Smart Merging with Existing Files
&lt;/h2&gt;

&lt;p&gt;Not every project starts from scratch. Sometimes you're modernizing a project that already has a partial &lt;code&gt;pyproject.toml&lt;/code&gt; file, perhaps created manually or by another tool. The requirements-to-uv tool handles this scenario carefully.&lt;/p&gt;

&lt;p&gt;When a &lt;code&gt;pyproject.toml&lt;/code&gt; already exists, the tool merges new information without destroying what's there. It preserves existing metadata and configuration sections, appends new dependencies to existing lists (detecting and warning about duplicates), and creates a backup file (&lt;code&gt;pyproject.toml.backup&lt;/code&gt;) before making changes. This merge logic uses a sophisticated approach that understands the structure of TOML files and dependency declarations, ensuring that manual customizations aren't lost during the conversion.&lt;/p&gt;
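
&lt;p&gt;The duplicate detection in that merge can be sketched by comparing normalized package names. This is a simplified stand-in for the tool's merge logic:&lt;/p&gt;

```python
import re

def package_name(spec: str) -> str:
    """Extract the normalized package name from a dependency specifier."""
    return re.split(r"[\[<>=!~;,\s]", spec, maxsplit=1)[0].lower().replace("_", "-")

def merge_deps(existing, new):
    merged, duplicates = list(existing), []
    seen = {package_name(dep) for dep in existing}
    for dep in new:
        if package_name(dep) in seen:
            duplicates.append(dep)  # warn and keep the existing entry
        else:
            merged.append(dep)
            seen.add(package_name(dep))
    return merged, duplicates

merged, dups = merge_deps(["requests==2.28.0"], ["Requests==2.31.0", "flask"])
print(merged)  # ['requests==2.28.0', 'flask']
print(dups)    # ['Requests==2.31.0']
```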

&lt;h2&gt;
  
  
  Handling Edge Cases and Limitations
&lt;/h2&gt;

&lt;p&gt;Python packaging has accumulated many special features over the years, and not all of them translate cleanly to the modern &lt;code&gt;pyproject.toml&lt;/code&gt; format. The tool handles these cases transparently while keeping you informed.&lt;/p&gt;

&lt;p&gt;Package hashes (the &lt;code&gt;--hash=sha256:...&lt;/code&gt; format) aren't supported in &lt;code&gt;pyproject.toml&lt;/code&gt; because uv uses lock files for reproducibility instead. The tool strips these out but generates a comment explaining the change. Custom package indexes specified with &lt;code&gt;--index-url&lt;/code&gt; or &lt;code&gt;--extra-index-url&lt;/code&gt; also can't be stored directly in &lt;code&gt;pyproject.toml&lt;/code&gt;. The tool adds a comment with the original URL so you can configure it through uv's CLI or configuration file instead.&lt;/p&gt;
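
&lt;p&gt;Conceptually, that cleanup looks something like the following sketch (the function name and note wording are mine, not the tool's):&lt;/p&gt;

```python
def clean_requirement_line(line: str):
    """Drop --hash options and pull out index URLs as explanatory notes."""
    tokens = line.split()
    kept, notes = [], []
    skip_next = False
    for i, tok in enumerate(tokens):
        if skip_next:
            skip_next = False
            continue
        if tok.startswith("--hash"):
            notes.append("hash removed; uv lock files handle reproducibility")
            if tok == "--hash":  # value given as a separate token
                skip_next = True
        elif tok in ("--index-url", "--extra-index-url"):
            notes.append(f"configure {tokens[i + 1]} through uv instead")
            skip_next = True
        else:
            kept.append(tok)
    return " ".join(kept), notes

print(clean_requirement_line("requests==2.28.0 --hash=sha256:abc123"))
```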

&lt;p&gt;SSH git URLs present an interesting challenge. While GitHub, GitLab, and Bitbucket SSH URLs can be automatically converted to their HTTPS equivalents, URLs from other hosts generate warnings since the conversion might not be straightforward. For these cases, you may need to adjust the generated file manually.&lt;/p&gt;
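
&lt;p&gt;For the well-known hosts, the rewrite is mechanical. A minimal sketch, assuming URLs of the form &lt;code&gt;git+ssh://git@HOST/OWNER/REPO.git&lt;/code&gt;:&lt;/p&gt;

```python
import re

KNOWN_HOSTS = ("github.com", "gitlab.com", "bitbucket.org")

def ssh_to_https(url: str):
    """Rewrite an SSH git URL to HTTPS when the host mapping is unambiguous."""
    match = re.fullmatch(r"git\+ssh://git@([^/]+)/(.+)", url)
    if match is None or match.group(1) not in KNOWN_HOSTS:
        return None  # unknown host: warn and leave the URL for manual review
    return f"git+https://{match.group(1)}/{match.group(2)}"

print(ssh_to_https("git+ssh://git@github.com/user/another.git@develop"))
# git+https://github.com/user/another.git@develop
print(ssh_to_https("git+ssh://git@git.internal.example/team/repo.git"))
# None
```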

&lt;p&gt;The tool handles these limitations gracefully: it does the conversion, preserves as much information as possible, explains what couldn't be directly translated, and provides guidance on how to handle special cases in the uv ecosystem.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting Started in 30 Seconds
&lt;/h2&gt;

&lt;p&gt;The tool is designed to get you working quickly. Installation is straightforward using uv itself:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install as a global uv tool&lt;/span&gt;
uv tool &lt;span class="nb"&gt;install &lt;/span&gt;git+https://github.com/danilop/requirements-to-uv.git
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then navigate to any Python project with a &lt;code&gt;requirements.txt&lt;/code&gt; file and run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;req2uv
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The tool automatically detects your project structure, finds all requirements files, gathers metadata, and generates a complete &lt;code&gt;pyproject.toml&lt;/code&gt;. It runs in interactive mode by default, showing you what it found and asking for confirmation before writing files. Once the conversion is complete, you can immediately use uv to install dependencies:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uv &lt;span class="nb"&gt;sync&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For CI/CD pipelines or scripts, you can use non-interactive mode:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;req2uv &lt;span class="nt"&gt;--non-interactive&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can also preview what the tool would do without making changes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;req2uv &lt;span class="nt"&gt;--dry-run&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Real-World Usage Patterns
&lt;/h2&gt;

&lt;p&gt;After using this tool across various projects, certain patterns have proven particularly valuable. When modernizing existing projects, I start with a dry run to see what the tool will generate. This helps catch any issues before committing to changes. The tool creates a backup of any existing &lt;code&gt;pyproject.toml&lt;/code&gt;, but I also like to commit my current state to git first, just to be safe.&lt;/p&gt;

&lt;p&gt;For repositories with multiple requirements files, the automatic detection and categorization saves significant time. The tool recognizes common patterns like &lt;code&gt;requirements-dev.txt&lt;/code&gt;, &lt;code&gt;requirements-test.txt&lt;/code&gt;, and &lt;code&gt;requirements-docs.txt&lt;/code&gt;, and organizes them into appropriate dependency groups. This structure aligns well with uv's dependency group feature, making it easy to install just the dependencies you need for a particular task.&lt;/p&gt;

&lt;p&gt;When cloning open-source projects that still use the older format, I typically run &lt;code&gt;req2uv&lt;/code&gt; as my first step after cloning. This lets me work with the project using my preferred tools without needing to maintain both formats or deal with inconsistencies between pip and uv behavior.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture: Simple and Focused
&lt;/h2&gt;

&lt;p&gt;The tool is built with a clear focus on doing one thing well. It uses the &lt;a href="https://github.com/pypa/packaging" rel="noopener noreferrer"&gt;packaging&lt;/a&gt; library for parsing requirements and handling version specifiers according to Python packaging standards. &lt;a href="https://click.palletsprojects.com/" rel="noopener noreferrer"&gt;Click&lt;/a&gt; provides the command-line interface with automatic help generation and parameter validation. &lt;a href="https://github.com/Textualize/rich" rel="noopener noreferrer"&gt;Rich&lt;/a&gt; handles terminal formatting for readable output and progress indication. And &lt;a href="https://github.com/tmbo/questionary" rel="noopener noreferrer"&gt;questionary&lt;/a&gt; powers the interactive prompts when running in interactive mode.&lt;/p&gt;

&lt;p&gt;The core logic separates concerns cleanly: a parser module handles &lt;code&gt;requirements.txt&lt;/code&gt; parsing and normalization, a detector module gathers project metadata from various sources, and a generator module creates valid TOML structures. This modular design makes the tool maintainable and makes it easier to extend with new features.&lt;/p&gt;

&lt;h2&gt;
  
  
  Best Practices for Conversion
&lt;/h2&gt;

&lt;p&gt;Through working with various project structures, a few practices have emerged as particularly helpful. Before running the conversion, it's worth reviewing your requirements files to ensure they're current and removing any commented-out dependencies that you no longer need. This gives you a clean starting point.&lt;/p&gt;

&lt;p&gt;After conversion, I recommend reviewing the generated &lt;code&gt;pyproject.toml&lt;/code&gt;, especially the &lt;code&gt;[tool.uv.sources]&lt;/code&gt; section if you have git or path dependencies. While the tool handles most cases automatically, some scenarios—like private git repositories or unusual URL patterns—might need manual adjustment.&lt;/p&gt;

&lt;p&gt;It's also helpful to test the converted dependencies immediately by running &lt;code&gt;uv sync&lt;/code&gt; and verifying that your application still works as expected. This catches any edge cases early while the conversion process is still fresh in your mind.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrapping Up
&lt;/h2&gt;

&lt;p&gt;Converting Python projects from &lt;code&gt;requirements.txt&lt;/code&gt; to modern &lt;code&gt;pyproject.toml&lt;/code&gt; with uv support doesn't have to be a manual chore. The requirements-to-uv tool handles the complexity of parsing various dependency formats, intelligently detects project metadata, and generates complete, valid project files.&lt;/p&gt;

&lt;p&gt;Whether you're modernizing your own projects or quickly setting up repositories you've cloned, this tool helps you move to a faster, more standardized Python development workflow. The complete code is available on &lt;a href="https://github.com/danilop/requirements-to-uv" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;, and contributions are welcome.&lt;/p&gt;

&lt;p&gt;Give it a try the next time you encounter a Python project still using &lt;code&gt;requirements.txt&lt;/code&gt;—you might be surprised how much smoother your workflow becomes with modern Python tooling.&lt;/p&gt;

</description>
      <category>python</category>
      <category>uv</category>
      <category>programming</category>
    </item>
    <item>
      <title>Never Forget a Thing: Building AI Agents with Hybrid Memory Using Strands Agents</title>
      <dc:creator>Danilo Poccia</dc:creator>
      <pubDate>Fri, 31 Oct 2025 10:49:20 +0000</pubDate>
      <link>https://dev.to/aws/never-forget-a-thing-building-ai-agents-with-hybrid-memory-using-strands-agents-2g66</link>
      <guid>https://dev.to/aws/never-forget-a-thing-building-ai-agents-with-hybrid-memory-using-strands-agents-2g66</guid>
      <description>&lt;p&gt;When using (and building) AI agents, I kept running into the same frustrating problem: as conversations grew longer, my agents would either lose important details from earlier in the conversation or hit context limits and crash. The standard solution—a sort of aggressive summarization—worked for maintaining context flow, but it created a new problem: those summaries were lossy. Important details, specific numbers, exact quotes, and nuanced context could vanish into their generalizations.&lt;/p&gt;

&lt;p&gt;I needed something better: a memory system that could maintain conversation flow through intelligent summarization while preserving the ability to retrieve exact historical messages when needed. After researching the broad topic of context engineering, I built a proof-of-concept &lt;a href="https://github.com/danilop/strands-agents-semantic-summarizing-conversation-manager" rel="noopener noreferrer"&gt;Semantic Summarizing Conversation Manager&lt;/a&gt;: a hybrid memory system for &lt;a href="https://strandsagents.com/" rel="noopener noreferrer"&gt;Strands Agents&lt;/a&gt; that combines the efficiency of summarization with the precision of semantic search.&lt;/p&gt;

&lt;p&gt;In this post, I'll show you how this system improves on the memory problem, walk you through its architecture, and demonstrate how it can upgrade your AI agents from forgetful assistants into (more) reliable partners with perfect recall.&lt;/p&gt;

&lt;h1&gt;
  
  
  The Memory Problem: Summarization vs Recall
&lt;/h1&gt;

&lt;p&gt;Before diving into the solution, let's understand why this problem exists. AI agents typically manage conversation context in one of the following ways:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Keep Everything&lt;/strong&gt;: Store all messages in the active context. This works great for short conversations but inevitably hits model context limits. When you reach that limit, the agent has to do something to reduce the size of its context.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Summarization&lt;/strong&gt;: When context gets full, summarize older messages into a compressed form. This process, also known as compacting, maintains conversation flow and prevents context overflow, but summaries are inherently lossy. Ask "What was the exact number I mentioned earlier?" and the agent might recall "you discussed some statistics" but not the actual value. A possible mitigation is to maintain hierarchical levels of summarization and retrieve the appropriate level based on the specific request.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sliding Window&lt;/strong&gt;: Keep only the N most recent messages, discarding older ones entirely. Simple and memory-efficient, but loses all historical context beyond the window. The agent literally forgets everything from earlier in the conversation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Proactive Memory Curation&lt;/strong&gt;: A variation of automatic summarization is to actively control the process: for example, triggering summarization not when the context is full but when something happens in the agent lifecycle, such as the completion of a specific task. This works because summarization is applied to a bounded context (the task), so the rest of the workflow only needs a compact record of that task's internals.&lt;/p&gt;

&lt;p&gt;Each approach has fundamental trade-offs. You can have context efficiency or perfect recall, but not both.&lt;/p&gt;
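
&lt;p&gt;As a toy illustration of how stark the sliding-window trade-off is, a bounded deque shows it in a few lines: once the window is full, older messages are silently discarded:&lt;/p&gt;

```python
from collections import deque

window = deque(maxlen=3)  # keep only the 3 most recent messages
for i in range(1, 6):
    window.append(f"message {i}")

# Messages 1 and 2 are gone; the agent has no way to recall them.
print(list(window))  # ['message 3', 'message 4', 'message 5']
```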

&lt;h1&gt;
  
  
  Hybrid Memory: The Best of Both Worlds
&lt;/h1&gt;

&lt;p&gt;The Semantic Summarizing Conversation Manager takes a different approach: it combines summarization for active context management with semantic search for precise historical recall. Here's how it works:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Normal Operation&lt;/strong&gt;: Messages flow through the conversation as usual. The agent sees the full context and responds naturally.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0qs5sm9metggy1lz40ab.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0qs5sm9metggy1lz40ab.png" alt="Before summarization" width="800" height="179"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Context Overflow&lt;/strong&gt;: When the context gets too long, the system performs three parallel operations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Creates a summary of older messages for the active conversation, maintaining flow&lt;/li&gt;
&lt;li&gt;Stores the exact messages in memory using &lt;a href="https://strandsagents.com/latest/documentation/docs/user-guide/concepts/agents/state/#agent-state" rel="noopener noreferrer"&gt;Strands Agents' key-value state&lt;/a&gt; for later retrieval&lt;/li&gt;
&lt;li&gt;Indexes those messages in a semantic (vector based) search engine for intelligent lookup&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Query Time&lt;/strong&gt;: When new messages arrive, a &lt;a href="https://strandsagents.com/latest/documentation/docs/user-guide/concepts/agents/hooks/" rel="noopener noreferrer"&gt;Strands Agents hook&lt;/a&gt; automatically searches for relevant historical messages, includes surrounding context for better understanding, and prepends this context to the user's message if relevant matches are found.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpy6tl9spxm19mhzbkkkb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpy6tl9spxm19mhzbkkkb.png" alt="After summarization" width="800" height="393"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The agent gets three types of memory working together: the active conversation with summaries (for context flow), the archived exact messages (for precision), and the semantic index (for intelligent retrieval). This hybrid approach means the agent never loses information, but also never overwhelms the model with excessive context.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvvmqf1yv86dxxc77i4ez.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvvmqf1yv86dxxc77i4ez.png" alt="Architecture" width="800" height="732"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Why This Architecture Makes Sense
&lt;/h1&gt;

&lt;p&gt;Here's a crucial insight that makes this hybrid approach viable: the amount of RAM available to an agent is typically orders of magnitude larger than the model's context window.&lt;/p&gt;

&lt;p&gt;Consider a typical deployment: a modern language model might have a context window of up to 1 million tokens (roughly 750,000 words or about 4MB of text). Meanwhile, even a small &lt;a href="https://aws.amazon.com/pm/lambda/" rel="noopener noreferrer"&gt;AWS Lambda&lt;/a&gt; function has at least 128MB of memory, and container deployments often have several gigabytes. That's roughly 30x to 1,000x more storage capacity than context capacity.&lt;/p&gt;

&lt;p&gt;This disparity is fundamental to how language models work. Context windows are constrained by the quadratic attention mechanism—doubling the context quadruples the computation. But RAM? RAM is relatively cheap and abundant in comparison. You can store thousands of conversation messages and tool results in a few megabytes, along with their embeddings for semantic search, and still use less than 1% of available memory.&lt;/p&gt;

&lt;p&gt;The implication: you don't need to delete information just because it doesn't fit in the model's context window. Store it, index it, and retrieve it intelligently when needed. The bottleneck isn't storage—it's attention. This hybrid architecture respects that constraint while leveraging the abundant storage available to modern agents.&lt;/p&gt;

&lt;p&gt;This is why the semantic conversation manager can confidently store exact messages indefinitely (with optional limits for safety) while keeping only the most relevant information in the active context. We're playing to the strengths of the underlying hardware: use the model's limited context for reasoning and generation, use RAM for comprehensive storage and retrieval.&lt;/p&gt;

&lt;h1&gt;
  
  
  Architecture: Three Components Working in Harmony
&lt;/h1&gt;

&lt;p&gt;The system consists of three main components that integrate seamlessly with Strands Agents:&lt;/p&gt;

&lt;h2&gt;
  
  
  Component 1: SemanticSummarizingConversationManager
&lt;/h2&gt;

&lt;p&gt;This is the core conversation manager that extends Strands' base conversation management with semantic capabilities. It maintains the active conversation window, triggers summarization when context overflows, stores exact messages with semantic indexing, manages memory limits by message count or total memory usage, and provides real-time memory usage statistics.&lt;/p&gt;

&lt;p&gt;The key innovation here is that summarization and archival happen atomically. When messages get summarized, they're simultaneously preserved and indexed, ensuring nothing is ever lost.&lt;/p&gt;

&lt;h2&gt;
  
  
  Component 2: SemanticMemoryHook
&lt;/h2&gt;

&lt;p&gt;This hook integrates with Strands' lifecycle system to provide automatic context enrichment. It subscribes to the MessageAddedEvent, searches semantic memory when new messages arrive, retrieves relevant historical messages with surrounding context, and prepends the enriched context to user messages naturally.&lt;/p&gt;

&lt;p&gt;The hook uses Strands' elegant event system, keeping the memory logic completely separate from your agent's main code. Your agent doesn't need to know anything about memory management—it just works.&lt;/p&gt;

&lt;h2&gt;
  
  
  Component 3: SemanticSearch Engine
&lt;/h2&gt;

&lt;p&gt;The search engine powers intelligent retrieval using sentence transformers for initial embedding, cross-encoder reranking for precision, configurable relevance thresholds, and persistent index storage.&lt;/p&gt;

&lt;p&gt;I chose a two-stage retrieval approach because it provides the best balance of speed and accuracy. The sentence transformer quickly narrows down candidates, then the cross-encoder reranks for precision. This combination ensures the agent finds truly relevant messages, not just keyword matches.&lt;/p&gt;
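
&lt;p&gt;To make the two-stage idea concrete, here's a toy version with stand-in scoring functions. A real deployment would use a sentence transformer for the first stage and a cross-encoder for the second; the functions below only mimic their roles:&lt;/p&gt;

```python
def cheap_score(query: str, doc: str) -> float:
    """Stage 1 stand-in: word-overlap (Jaccard) instead of embedding similarity."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q.intersection(d)) / len(q.union(d))

def rerank_score(query: str, doc: str) -> float:
    """Stage 2 stand-in: a cross-encoder would jointly score (query, doc) pairs."""
    bonus = 1.0 if query.lower() in doc.lower() else 0.0
    return cheap_score(query, doc) + bonus

def retrieve(query, docs, first_stage_k=3, top_k=1):
    # The cheap pass narrows the candidates; the expensive pass ranks survivors.
    candidates = sorted(docs, key=lambda d: cheap_score(query, d), reverse=True)
    candidates = candidates[:first_stage_k]
    return sorted(candidates, key=lambda d: rerank_score(query, d), reverse=True)[:top_k]

docs = [
    "the shared number is 42",
    "we talked about numbers in general",
    "lunch is at noon",
]
print(retrieve("shared number", docs))  # ['the shared number is 42']
```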

&lt;h1&gt;
  
  
  Setting Up Hybrid Memory
&lt;/h1&gt;

&lt;p&gt;Let's build an agent with semantic memory. This implementation is a &lt;strong&gt;prototype&lt;/strong&gt; designed to demonstrate the hybrid memory concept. While functional and tested, it's intended for experimentation and learning rather than production deployment without further development and testing. The setup is straightforward:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands_semantic_memory&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;SemanticSummarizingConversationManager&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;SemanticMemoryHook&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;conv_manager&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SemanticSummarizingConversationManager&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;embedding_model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;all-MiniLM-L12-v2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;semantic_memory_hook&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SemanticMemoryHook&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;us.amazon.nova-lite-v1:0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
              &lt;span class="n"&gt;conversation_manager&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;conv_manager&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
              &lt;span class="n"&gt;hooks&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;semantic_memory_hook&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it! Your agent now has hybrid memory. Use it normally:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Store information
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Our shared number is 42. This is confidential, don&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;t include it in any summary.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# ... many messages later, after summarization ...
&lt;/span&gt;
&lt;span class="c1"&gt;# Retrieve exact information
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What was our shared number?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# The hook finds the archived message and includes it automatically
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  Understanding the Parameters
&lt;/h1&gt;

&lt;p&gt;The configuration parameters give you fine-grained control over memory behavior:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;summary_ratio&lt;/strong&gt; (0.1-0.8): Determines what percentage of messages to summarize when context overflows. Lower values create shorter summaries but trigger overflow more frequently. I find 0.7 (70%) provides a good balance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;preserve_recent_messages&lt;/strong&gt;: Messages that never get summarized. These stay in the active conversation no matter what. I typically use 10-20 to maintain recent context flow.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;message_context_radius&lt;/strong&gt;: When retrieving a relevant message, how many surrounding messages to include. A radius of 2 means you get 2 messages before and 2 after the match. This prevents a match from being retrieved in isolation when the surrounding conversation carries crucial meaning.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;semantic_search_top_k&lt;/strong&gt;: Number of relevant messages to retrieve. More isn't always better—too many matches can overwhelm the context. I start with 3 and adjust based on testing and evaluations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;semantic_search_min_score&lt;/strong&gt;: The cross-encoder relevance threshold (default: -2.0). Higher values are more selective, lower values cast a wider net. The default provides balanced precision and recall.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;max_num_archived_messages&lt;/strong&gt;: Optional limit on stored messages. When exceeded, oldest messages are removed. Useful for long-running agents to prevent unbounded growth.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;max_memory_archived_messages&lt;/strong&gt;: Optional limit on total memory usage (in bytes). Includes both message content and embeddings. When exceeded, oldest archived messages are removed to stay within budget.&lt;/p&gt;

&lt;p&gt;These last two parameters are particularly important for production deployments where long-term memory constraints matter. You can use either, both, or neither depending on your needs.&lt;/p&gt;
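&lt;p&gt;Putting these parameters together, a fuller configuration might look like the sketch below. The parameter names come from the descriptions above, but the exact constructor signature and the specific values are assumptions to adapt to your workload.&lt;/p&gt;

```python
# Sketch of a memory-bounded configuration (assumed constructor signature;
# parameter names taken from the descriptions above, values are illustrative).
conv_manager = SemanticSummarizingConversationManager(
    embedding_model="all-MiniLM-L12-v2",
    summary_ratio=0.7,               # summarize 70% of messages on overflow
    preserve_recent_messages=15,     # never summarize the newest 15 messages
    message_context_radius=2,        # include 2 messages before/after a match
    semantic_search_top_k=3,         # retrieve up to 3 relevant messages
    semantic_search_min_score=-2.0,  # cross-encoder relevance threshold
    max_num_archived_messages=5000,  # cap the number of stored messages
    max_memory_archived_messages=256 * 1024 * 1024,  # cap total memory (bytes)
)
```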

&lt;h1&gt;
  
  
  How It Works: A Complete Example
&lt;/h1&gt;

&lt;p&gt;Let me show you the system in action. The included demo creates an agent, stores a secret that shouldn't appear in summaries, builds conversation history, triggers summarization, and then demonstrates semantic retrieval.&lt;/p&gt;

&lt;p&gt;When you run the demo with &lt;code&gt;uv run main.py&lt;/code&gt;, you'll see the complete flow:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Initial Conversation (20 messages)&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[ 0] user: Our shared number is 700. This is confidential - don't include it in any summary...
[ 1] assistant: Understood. I'll keep our shared number confidential...
[ 2] user: Tell me about recursive functions and data structures.
...
[19] assistant: Recursion is when a function calls itself...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;After Summarization (9 messages)&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[ 0] user: ## Conversation Summary
* Topic 1: Explanation of recursion
* Topic 2: Arrays
* Topic 3: Linked Lists
[Note: The shared number is NOT in the summary ✅]

[ 1] user: What are sorting algorithms?
...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice that the summary preserves the conversation flow (discussing recursion and data structures) while excluding the confidential information. The agent can continue having coherent conversations about algorithms without the secret cluttering the context.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Semantic Retrieval Finds Everything&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;🔍 Query: 'What was our shared secret number?'
Search completed in 66.7ms (reranked from 9 candidates)
✅ Found 4 relevant messages in semantic memory

• Secret '700' retrievable: ✅ YES
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The semantic search quickly finds the archived message, even though it's not in the active conversation. The system automatically enriches the query:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Based on our previous conversation, these earlier exchanges may be relevant:

---Previous Context---
[Message 0, user]: Our shared number is 700. This is confidential – don't include it in any summary...
[Message 1, assistant]: Understood. I'll keep our shared number confidential...
---End Previous Context---

Current question: What was our shared number?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The agent sees both the original messages (with surrounding context from the radius parameter) and the current query. This natural enrichment happens automatically. The agent code doesn't change at all.&lt;/p&gt;
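&lt;p&gt;The enrichment step itself is straightforward string assembly. The sketch below mirrors the format shown in the demo output; the actual hook's formatting may differ in detail.&lt;/p&gt;

```python
# Illustrative sketch of the prompt enrichment shown above: prepend retrieved
# archived messages (with their indices and roles) to the current question.
def enrich_query(current_question, retrieved):
    """retrieved: list of (index, role, text) tuples from semantic search."""
    if not retrieved:
        return current_question
    lines = [
        "Based on our previous conversation, these earlier exchanges may be relevant:",
        "",
        "---Previous Context---",
    ]
    for idx, role, text in retrieved:
        lines.append(f"[Message {idx}, {role}]: {text}")
    lines += ["---End Previous Context---", "", f"Current question: {current_question}"]
    return "\n".join(lines)
```

With no retrieved messages, the query passes through unchanged, so the agent pays no token cost when memory has nothing relevant to add.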

&lt;h1&gt;
  
  
  Memory Usage Monitoring
&lt;/h1&gt;

&lt;p&gt;The conversation manager includes built-in memory monitoring, essential for production deployments:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Get detailed statistics
&lt;/span&gt;&lt;span class="n"&gt;stats&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conversation_manager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_memory_usage_stats&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Messages stored: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;message_count&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Total memory: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;total_memory&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; bytes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Message memory: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;message_memory&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; bytes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Embedding memory: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;embedding_memory&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; bytes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Get human-readable summary
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conversation_manager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_memory_usage_summary&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This visibility is crucial when tuning your memory limits. You can see exactly how much memory your agents are using and adjust the configuration accordingly.&lt;/p&gt;

&lt;h1&gt;
  
  
  Deployment Considerations
&lt;/h1&gt;

&lt;p&gt;Before deploying this prototype in any real setting, several factors need careful evaluation and likely additional development:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Memory Limits&lt;/strong&gt;: Set appropriate limits based on your deployment environment. A Lambda function with 3GB memory needs tighter constraints than a long-running container. Use both message count and memory size limits to prevent unbounded growth.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Embedding Model&lt;/strong&gt;: The system uses sentence transformers by default, which runs locally. For production, consider your latency and throughput requirements. Local models add no API costs but use CPU resources. You might want to experiment with different embedding models for your specific use case.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Index Persistence&lt;/strong&gt;: The semantic index persists to disk, enabling warm starts. This means restarted agents can immediately search historical messages without rebuilding the index. Make sure your deployment environment has writable storage (or modify the code to use a different persistence backend).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Context Radius Tuning&lt;/strong&gt;: Start with a radius of 2 and adjust based on testing. Larger radii provide more context but use more tokens. Monitor your context usage to find the sweet spot for your domain.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Search Threshold&lt;/strong&gt;: The default min_score of -2.0 works well for general use, but you might need to tune it. If you're getting too many irrelevant matches, increase it. If you're missing relevant context, decrease it. Log the scores during development to understand what works for your data.&lt;/p&gt;
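&lt;p&gt;During development, a small helper that logs each candidate's score next to the threshold decision makes this tuning concrete. The function below is an illustrative sketch, not part of the library.&lt;/p&gt;

```python
# Sketch of threshold tuning: log every cross-encoder score alongside the
# keep/drop decision, then return only candidates at or above min_score.
def filter_by_score(scored_candidates, min_score=-2.0):
    """scored_candidates: list of (score, message) pairs from the reranker."""
    kept = [(s, m) for s, m in scored_candidates if s >= min_score]
    for score, msg in scored_candidates:
        print(f"score={score:+.2f} kept={score >= min_score} text={msg[:40]!r}")
    return kept
```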

&lt;h1&gt;
  
  
  Intelligent Overlap Handling
&lt;/h1&gt;

&lt;p&gt;The system automatically merges overlapping message ranges. If semantic search finds messages 5-7 and messages 6-9 as relevant, it merges them into a single range 5-9 rather than duplicating messages 6 and 7. This prevents token waste and maintains a cleaner context presentation.&lt;/p&gt;

&lt;p&gt;This improves context quality because the agent sees a coherent narrative flow rather than confusing duplicated messages.&lt;/p&gt;
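&lt;p&gt;The merging step is standard interval merging over inclusive (start, end) message ranges, sketched here for illustration:&lt;/p&gt;

```python
# Merge overlapping inclusive (start, end) message ranges, e.g.
# ranges 5-7 and 6-9 become the single range 5-9.
def merge_ranges(ranges):
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous range: extend it instead of duplicating.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```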

&lt;h1&gt;
  
  
  Real-World Use Cases
&lt;/h1&gt;

&lt;p&gt;This hybrid memory architecture excels in several scenarios:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Customer Support&lt;/strong&gt;: Keep the last few exchanges in active context for natural flow, but retrieve exact past conversations when a customer references an earlier issue or order number.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Personal Assistants&lt;/strong&gt;: Maintain recent context for ongoing tasks while being able to recall specific details from weeks or months ago. "What was that restaurant you recommended last month?"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Technical Documentation Bots&lt;/strong&gt;: Summarize long technical discussions while preserving the ability to retrieve exact code snippets, error messages, or configuration values.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Educational Tutors&lt;/strong&gt;: Remember the student's learning journey, including specific questions they asked and concepts they struggled with, even across multiple sessions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data Analysis Agents&lt;/strong&gt;: Maintain conversation flow while being able to recall exact numbers, queries, or insights from earlier in a long analysis session.&lt;/p&gt;

&lt;p&gt;The common thread: any agent that needs both conversational coherence and precise recall benefits from this architecture.&lt;/p&gt;

&lt;h1&gt;
  
  
  What Makes This Different
&lt;/h1&gt;

&lt;p&gt;You might be wondering how this compares to other memory solutions. Several approaches exist in the agent ecosystem, but they typically choose one strategy:&lt;/p&gt;

&lt;p&gt;Some frameworks use hierarchical summarization, creating summaries of summaries. This manages context well but makes precise recall even harder—information gets compressed multiple times.&lt;/p&gt;

&lt;p&gt;Some implement retrieval-augmented generation (RAG) where the agent explicitly calls a memory retrieval tool. This gives the agent control but requires it to decide when to search, adding cognitive overhead.&lt;/p&gt;

&lt;p&gt;The Semantic Summarizing Conversation Manager combines automatic summarization for context flow with automatic semantic retrieval for precision. The agent doesn't need to manage memory—it just works. The hook system in Strands makes this possible through its elegant event architecture.&lt;/p&gt;

&lt;h1&gt;
  
  
  What's Next
&lt;/h1&gt;

&lt;p&gt;This hybrid memory system balances efficiency with precision and automatic behavior with configurability. As a prototype, this system demonstrates the core concepts but would benefit from additional hardening, testing, and optimization before production use.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://github.com/danilop/strands-agents-semantic-summarizing-conversation-manager" rel="noopener noreferrer"&gt;complete prototype code is available on GitHub&lt;/a&gt;. I've included comprehensive documentation, the working demo, and modular components you can adapt for your needs.&lt;/p&gt;

&lt;p&gt;I'm particularly interested in feedback on parameter tuning for different domains. What works well for customer support might not work for technical documentation. If you use this system, I'd love to hear about your configuration choices and what you learned.&lt;/p&gt;

&lt;p&gt;Ready to improve your agents' memory? Clone the repo, run the demo, and see hybrid memory in action. Your agents (and your users) will thank you for it.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>aws</category>
      <category>opensource</category>
      <category>python</category>
    </item>
    <item>
      <title>Visualizing AI Agent Memory: Building a Web Browser for Amazon Bedrock AgentCore Memory</title>
      <dc:creator>Danilo Poccia</dc:creator>
      <pubDate>Fri, 19 Sep 2025 11:20:07 +0000</pubDate>
      <link>https://dev.to/aws/visualizing-ai-agent-memory-building-a-web-browser-for-amazon-bedrock-agentcore-memory-3571</link>
      <guid>https://dev.to/aws/visualizing-ai-agent-memory-building-a-web-browser-for-amazon-bedrock-agentcore-memory-3571</guid>
      <description>&lt;p&gt;When building and testing multiple AI agent frameworks with &lt;a href="https://aws.amazon.com/bedrock/agentcore/" rel="noopener noreferrer"&gt;Amazon Bedrock AgentCore&lt;/a&gt;, I realized I needed a tool to visualize and explore what my agents were actually remembering. AgentCore Memory provides powerful capabilities for managing both short-term conversation context and long-term knowledge extraction, but debugging memory patterns meant diving into AWS CLI commands or writing custom scripts just to see what was stored. I needed a way to quickly browse, search, and understand the memory patterns my agents were creating.&lt;/p&gt;

&lt;p&gt;That's why I built &lt;a href="https://github.com/danilop/agentcore-memory-browser" rel="noopener noreferrer"&gt;AgentCore Memory Browser&lt;/a&gt;: a web interface that makes it simple to explore and interact with Amazon Bedrock AgentCore Memory resources. Whether you're debugging an agent's memory extraction or simply curious about what your agents are learning over time, this tool provides the visibility you need.&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/0tDpugivB4U"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;In this post, I'll walk you through the AgentCore Memory Browser's capabilities, show you how to set it up in minutes, and demonstrate how it can accelerate your agent development workflow. This tool complements the multi-framework journey I've been documenting in my &lt;a href="https://dev.to/aws/building-production-ready-ai-agents-a-multi-framework-journey-with-amazon-bedrock-agentcore-p32"&gt;main AgentCore blog series&lt;/a&gt;, providing essential visibility for any agent implementation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Build a Memory Browser?
&lt;/h2&gt;

&lt;p&gt;Working with AI agents in production requires understanding not just what they say, but what they remember. &lt;a href="https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/memory.html" rel="noopener noreferrer"&gt;Amazon Bedrock AgentCore Memory&lt;/a&gt; provides sophisticated memory management with multiple strategies for extracting and storing different types of information. AgentCore Memory can capture user preferences and settings, store factual information extracted from conversations, create condensed summaries of sessions, and maintain the raw conversation history for context.&lt;/p&gt;

&lt;p&gt;When an agent isn't behaving as expected, or when you want to understand its memory patterns, you need visibility into these memory stores. The AWS CLI provides the raw capability, but switching between terminal commands while developing breaks your flow. I needed something more intuitive—a tool that could show me at a glance what each memory strategy was storing, let me search through records, and help me understand how my agents were using memory across different sessions and actors.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Features That Accelerate Development
&lt;/h2&gt;

&lt;p&gt;The AgentCore Memory Browser provides real-time exploration of all your AgentCore Memory resources with live data pulled directly from both control plane and data plane APIs. You can see memory status, configurations, and strategies at a glance.&lt;/p&gt;

&lt;p&gt;Each memory strategy gets its own dedicated interface with operations tailored to its purpose. Whether you're working with user preferences, semantic facts, or session summaries, the browser adapts to show relevant operations and namespace patterns. AgentCore Memory uses namespace templates with placeholders, and when a strategy defines a namespace that contains a &lt;code&gt;{memoryStrategyId}&lt;/code&gt;, the browser automatically fills in the strategy ID portion while keeping the field editable so that you can substitute the actor and session values. This makes it easy to explore specific user or session data without having to type the full namespace path each time.&lt;/p&gt;
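&lt;p&gt;The pre-filling behavior amounts to substituting the placeholder the browser already knows while leaving the others editable. A minimal sketch (the template path shown is illustrative, not an actual AgentCore namespace):&lt;/p&gt;

```python
# Sketch of namespace pre-filling: substitute the known strategy ID while
# leaving actor/session placeholders in place for the user to edit.
def prefill_namespace(template, strategy_id):
    return template.replace("{memoryStrategyId}", strategy_id)
```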

&lt;p&gt;The browser provides three core operations for each strategy. You can list events to view the sequence of events for specific sessions and actors, helping you understand the temporal flow of your agent's interactions. You can browse all memory records in a namespace with pagination support for large datasets. And you can retrieve memory using natural language queries, taking advantage of AgentCore's semantic search capabilities.&lt;/p&gt;

&lt;p&gt;The developer-friendly UI includes quick copy buttons for Memory IDs, ARNs, and namespace values, saving you from manual selection and copying. The auto-expanding JSON viewer with syntax highlighting makes it easy to inspect complex memory structures. The browser remembers your namespace edits during a session, so you don't have to re-enter actor and session IDs repeatedly. And all user content is HTML-escaped to prevent injection attacks, ensuring security even when browsing untrusted memory content.&lt;/p&gt;

&lt;h2&gt;
  
  
  Installation in Under a Minute
&lt;/h2&gt;

&lt;p&gt;To get started quickly, I've packaged the AgentCore Memory Browser as a Python tool that can be installed globally using &lt;a href="https://docs.astral.sh/uv/" rel="noopener noreferrer"&gt;uv&lt;/a&gt;, the fast Python package manager.&lt;/p&gt;

&lt;p&gt;Before installation, ensure you have Python 3.13 or higher and AWS CLI configured with appropriate credentials. You'll need AWS IAM permissions for &lt;code&gt;bedrock-agentcore-control:ListMemories&lt;/code&gt; and &lt;code&gt;GetMemory&lt;/code&gt; operations, as well as &lt;code&gt;bedrock-agentcore:ListEvents&lt;/code&gt;, &lt;code&gt;ListMemoryRecords&lt;/code&gt;, and &lt;code&gt;RetrieveMemoryRecords&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;You can install directly from GitHub with a single command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uv tool &lt;span class="nb"&gt;install &lt;/span&gt;git+https://github.com/danilop/agentcore-memory-browser.git
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then run it from anywhere:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;agentcore-memory-browser
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The application automatically opens in your default browser at &lt;a href="http://localhost:8000" rel="noopener noreferrer"&gt;http://localhost:8000&lt;/a&gt; (you can pass a different port on the command line).&lt;/p&gt;

&lt;p&gt;If you want to modify the tool or contribute to development, you can clone the repository, install dependencies with &lt;code&gt;uv sync&lt;/code&gt;, and run the application with &lt;code&gt;uv run agentcore-memory-browser&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture: Clean Separation of Concerns
&lt;/h2&gt;

&lt;p&gt;The AgentCore Memory Browser follows a clean, modular architecture. The backend is built with FastAPI, providing a modern, async-capable web framework. It uses two AWS service clients: the AgentCore control plane to list and describe memory resources, and the data plane to perform operations like listing events, browsing records, and executing semantic searches.&lt;/p&gt;

&lt;p&gt;The frontend uses Bootstrap for responsive design and vanilla JavaScript for interactivity, with no complex build process required. The interface is organized into a sidebar for memory selection with metadata preview, a main content area with a tabbed interface for each memory strategy, operation panels with dedicated forms for each memory operation, and a results display featuring a syntax-highlighted JSON tree viewer.&lt;/p&gt;

&lt;h2&gt;
  
  
  Real-World Usage Patterns
&lt;/h2&gt;

&lt;p&gt;After using the Memory Browser while developing agents, certain patterns have proven most valuable in my workflow: debugging memory extraction as events are processed by strategies, understanding how an agent's knowledge evolves over time, and optimizing semantic memory searches.&lt;/p&gt;

&lt;h2&gt;
  
  
  Best Practices
&lt;/h2&gt;

&lt;p&gt;After extensive use, certain practices have emerged as particularly helpful. First, configure your AWS environment properly by setting your default AWS Region and verifying credentials with &lt;code&gt;aws sts get-caller-identity&lt;/code&gt;. This ensures the browser can connect to your AgentCore Memory resources without issues.&lt;/p&gt;

&lt;p&gt;Also, making good use of the copy buttons saves time when you need to reference memory IDs or ARNs in your code.&lt;/p&gt;

&lt;p&gt;What makes AgentCore Memory particularly useful is how it simplifies handling both short-term and long-term memory for AI agents. Short-term memory captures the immediate context of conversations, while long-term memory extracts and preserves important facts, preferences, and patterns that persist across sessions. The Memory Browser gives you a window into both, helping you understand how your agents build knowledge over time and how they use that knowledge to provide more personalized and contextually aware responses. This visibility helps you build agents that maintain coherent, efficient behavior while learning and adapting from their interactions.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>aws</category>
      <category>bedrock</category>
      <category>agentcore</category>
    </item>
    <item>
      <title>Token Counting Meets Amazon Bedrock</title>
      <dc:creator>Danilo Poccia</dc:creator>
      <pubDate>Tue, 16 Sep 2025 16:30:07 +0000</pubDate>
      <link>https://dev.to/aws/token-counting-meets-amazon-bedrock-4dk5</link>
      <guid>https://dev.to/aws/token-counting-meets-amazon-bedrock-4dk5</guid>
      <description>&lt;p&gt;When working with large language models through &lt;a href="https://aws.amazon.com/bedrock" rel="noopener noreferrer"&gt;Amazon Bedrock&lt;/a&gt;, understanding token consumption can help managing costs and staying within model limits. While the Bedrock console provides token counts after each API call, developers need a way to measure tokens before sending requests, especially when building applications that process large volumes of text or require precise truncation.&lt;/p&gt;

&lt;p&gt;Amazon Bedrock offers a &lt;a href="https://docs.aws.amazon.com/bedrock/latest/userguide/count-tokens.html" rel="noopener noreferrer"&gt;CountTokens API&lt;/a&gt; that provides exact token measurements for the supported models, currently:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Anthropic Claude 4 Sonnet&lt;/li&gt;
&lt;li&gt;Anthropic Claude 4 Opus&lt;/li&gt;
&lt;li&gt;Anthropic Claude 3.7 Sonnet&lt;/li&gt;
&lt;li&gt;Anthropic Claude 3.5 Sonnet&lt;/li&gt;
&lt;li&gt;Anthropic Claude 3.5 Haiku&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;However, integrating this API into development workflows means getting the request syntax right and implementing efficient algorithms when truncation is needed. This is where &lt;a href="https://github.com/danilop/ttok4bedrock" rel="noopener noreferrer"&gt;ttok4bedrock&lt;/a&gt; comes in: a command-line tool and Python library that makes token counting as simple as it should be.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Count tokens (default: Claude Sonnet 4)&lt;/span&gt;
ttok4bedrock &lt;span class="s2"&gt;"Hello, world!"&lt;/span&gt;
&lt;span class="c"&gt;# Output: 11&lt;/span&gt;

&lt;span class="c"&gt;# Count from stdin&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Count these tokens"&lt;/span&gt; | ttok4bedrock
&lt;span class="nb"&gt;cat &lt;/span&gt;document.txt | ttok4bedrock

&lt;span class="c"&gt;# Truncate to N tokens&lt;/span&gt;
ttok4bedrock &lt;span class="nt"&gt;-t&lt;/span&gt; 100 &lt;span class="s2"&gt;"Very long text..."&lt;/span&gt;
&lt;span class="nb"&gt;cat &lt;/span&gt;large.txt | ttok4bedrock &lt;span class="nt"&gt;-t&lt;/span&gt; 100 &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; truncated.txt

&lt;span class="c"&gt;# Use specific Bedrock model (full model ID)&lt;/span&gt;
ttok4bedrock &lt;span class="nt"&gt;-m&lt;/span&gt; anthropic.claude-3-5-sonnet-20241022-v2:0 &lt;span class="s2"&gt;"Text"&lt;/span&gt;
ttok4bedrock &lt;span class="nt"&gt;-m&lt;/span&gt; anthropic.claude-3-7-sonnet-20250219-v1:0 &lt;span class="s2"&gt;"Text"&lt;/span&gt;

&lt;span class="c"&gt;# Specify AWS region (uses default if not specified)&lt;/span&gt;
ttok4bedrock &lt;span class="nt"&gt;--aws-region&lt;/span&gt; us-west-2 &lt;span class="s2"&gt;"Text"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Standing on the Shoulders of Giants
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://x.com/simonw" rel="noopener noreferrer"&gt;Simon Willison&lt;/a&gt;'s &lt;a href="https://github.com/simonw/ttok" rel="noopener noreferrer"&gt;ttok&lt;/a&gt; has become a standard tool for token counting with OpenAI models, valued for its simplicity and versatility. Rather than creating something entirely new, I built &lt;code&gt;ttok4bedrock&lt;/code&gt; as a drop-in replacement that maintains complete compatibility with &lt;code&gt;ttok&lt;/code&gt;'s interface while leveraging Bedrock's native CountTokens API.&lt;/p&gt;

&lt;p&gt;The goal was straightforward: preserve the developer experience that made &lt;code&gt;ttok&lt;/code&gt; successful while adapting to Bedrock's requirements. This means you can switch from &lt;code&gt;ttok "Count my tokens"&lt;/code&gt; to &lt;code&gt;ttok4bedrock "Count my tokens"&lt;/code&gt; without changing your scripts or learning new commands. The tool automatically handles AWS authentication using the standard &lt;code&gt;boto3&lt;/code&gt; credential chain and can work with any AWS Region.&lt;/p&gt;

&lt;h2&gt;
  
  
  Solving the Truncation Challenge
&lt;/h2&gt;

&lt;p&gt;One of the most requested features in token counting tools is intelligent truncation—cutting text to fit within a specific token limit. This is not straightforward if you can only count tokens.&lt;/p&gt;

&lt;p&gt;The truncation algorithm I implemented uses an adaptive approach that minimizes API calls while achieving exact results. It begins by analyzing text characteristics such as punctuation density and word length to estimate the character-to-token ratio. Through iterative refinement using linear interpolation, it finds the precise character boundary where the token count meets your target. The algorithm typically converges in 3-5 API calls for most texts, with built-in caching to eliminate redundant API requests.&lt;/p&gt;
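&lt;p&gt;The loop below sketches that adaptive approach. Here &lt;code&gt;count_tokens&lt;/code&gt; is a stand-in for any exact token counter (such as a call to the CountTokens API); the interpolation step and the call cap follow the description above, but this is not the actual ttok4bedrock implementation.&lt;/p&gt;

```python
# Sketch of adaptive truncation: find the longest prefix of `text` whose token
# count is at most `target`, using linear interpolation on the estimated
# characters-per-token ratio and a self-imposed cap on counter calls.
def truncate_to_tokens(text, target, count_tokens, max_calls=20):
    lo, hi = 0, len(text)  # character bounds: prefix of lo fits, hi may not
    best = ""
    for _ in range(max_calls):
        n = count_tokens(text[:hi])
        if n <= target:
            return text[:hi]            # the whole remaining candidate fits
        ratio = hi / max(n, 1)          # estimated characters per token
        cut = min(hi - 1, max(lo + 1, int(target * ratio)))
        if count_tokens(text[:cut]) <= target:
            lo, best = cut, text[:cut]  # fits: move the lower bound up
        else:
            hi = cut                    # too long: move the upper bound down
        if hi - lo <= 1:
            return best
    return best
```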

&lt;p&gt;For developers, this means you can pipe any text through the tool with a token limit and get perfectly truncated output: &lt;code&gt;cat large_document.txt | ttok4bedrock -t 1000 &amp;gt; truncated.txt&lt;/code&gt;. The truncation is exact, not approximate, ensuring you maximize the content within your token budget.&lt;/p&gt;

&lt;p&gt;The tool includes self-imposed limits to prevent runaway API usage, capping truncation attempts at 20 API calls. In practice, this limit is rarely reached, but it provides a safety net against unexpected edge cases or malformed input.&lt;/p&gt;

&lt;h2&gt;
  
  
  Handling AWS Integration Properly
&lt;/h2&gt;

&lt;p&gt;Working with AWS services requires attention to authentication and configuration patterns. The tool respects the standard AWS credential chain, working seamlessly whether you're using environment variables, AWS profiles, IAM roles on EC2, or any other standard authentication method. Region selection follows the same precedence rules as other AWS tools, checking command-line arguments, environment variables, and configuration files in that order.&lt;/p&gt;

&lt;p&gt;The tool requires minimal IAM permissions—just &lt;code&gt;bedrock:CountTokens&lt;/code&gt; on the foundation model resources. This follows the principle of least privilege while keeping setup simple.&lt;/p&gt;

&lt;h2&gt;
  
  
  Technical Details That Matter
&lt;/h2&gt;

&lt;p&gt;An interesting quirk of the Amazon Bedrock CountTokens API is that it wraps text in message structures, adding approximately 7 tokens of overhead. This overhead is invisible to callers but affects the count, which can confuse developers who expect the raw token count of their text. The &lt;code&gt;ttok4bedrock&lt;/code&gt; library automatically detects and subtracts this overhead, returning the intuitive result developers expect.&lt;/p&gt;
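&lt;p&gt;The idea behind the overhead correction can be sketched as follows. The counter below is a stand-in for the real API, and the function names are illustrative; the 7-token constant mirrors the approximate envelope overhead described above.&lt;/p&gt;

```python
MESSAGE_OVERHEAD = 7  # approximate tokens added by the message envelope

def raw_api_count(text: str) -> int:
    # Stand-in for the CountTokens call: envelope overhead + content tokens
    return MESSAGE_OVERHEAD + len(text.split())

def calibrate_overhead() -> int:
    # Counting an empty message isolates the envelope's contribution
    return raw_api_count("")

def count_text_tokens(text: str) -> int:
    # Subtract the calibrated overhead to report the raw text count
    return raw_api_count(text) - calibrate_overhead()
```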

&lt;p&gt;Model selection is explicit: &lt;code&gt;ttok4bedrock -m anthropic.claude-3-5-sonnet-20241022-v2:0 "Your text here"&lt;/code&gt;. Claude 4 Sonnet is the default if no model is specified.&lt;/p&gt;

&lt;h2&gt;
  
  
  Integration Patterns for Developers
&lt;/h2&gt;

&lt;p&gt;For Python developers, the library offers the same API as &lt;code&gt;ttok&lt;/code&gt;, making migration trivial. Import &lt;code&gt;ttok4bedrock&lt;/code&gt; as &lt;code&gt;ttok&lt;/code&gt;, and your existing code continues to work with Bedrock models for the functionality it provides (token counting and truncation).&lt;/p&gt;

&lt;p&gt;The CLI tool fits naturally into Unix-style pipelines, accepting input from stdin and outputting to stdout. This design enables powerful compositions with other text processing tools, making it easy to integrate token counting into existing workflows. Whether you're building a document processing pipeline or analyzing prompt efficiency, the tool adapts to your needs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Practical Applications
&lt;/h2&gt;

&lt;p&gt;Token counting might seem like a utility concern, but it enables important optimizations in production systems. Accurately measuring tokens before API calls helps with prompt and context engineering, allowing developers to maximize the information within model context windows. For applications that process user-generated content, pre-flight token counting prevents errors and improves user experience by providing immediate feedback about text length.&lt;/p&gt;

&lt;p&gt;The truncation capability is particularly valuable for RAG (Retrieval-Augmented Generation) systems where you need to fit retrieved documents within prompt limits. Instead of crude character-based cutting that might break mid-word or mid-sentence, the tool provides clean truncation at exact token boundaries.&lt;/p&gt;
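&lt;p&gt;For example, a RAG pipeline might pack retrieved passages into a shared budget, truncating only the passage that overflows. The sketch below uses a word counter in place of real token counts and hypothetical names.&lt;/p&gt;

```python
def count_tokens(text: str) -> int:
    # Word counter standing in for the real token counter
    return len(text.split())

def pack_passages(passages, budget: int):
    """Fit retrieved passages into a shared token budget, truncating the
    first passage that overflows at a clean token boundary."""
    packed, used = [], 0
    for passage in passages:
        n = count_tokens(passage)
        if used + n <= budget:
            packed.append(passage)
            used += n
        else:
            remaining = budget - used
            if remaining > 0:
                packed.append(" ".join(passage.split()[:remaining]))
            break
    return packed
```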

&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;

&lt;p&gt;Installation is straightforward using &lt;code&gt;uv&lt;/code&gt;, the fast Python package installer. After cloning the repository and running &lt;code&gt;uv sync&lt;/code&gt;, you're ready to count tokens.&lt;/p&gt;

&lt;p&gt;For teams already using &lt;code&gt;ttok&lt;/code&gt; in their workflows, migration is as simple as aliasing &lt;code&gt;ttok4bedrock&lt;/code&gt; to &lt;code&gt;ttok&lt;/code&gt;. The identical command-line interface means existing scripts, documentation, and muscle memory all transfer seamlessly.&lt;/p&gt;

&lt;p&gt;The next time you're working with Claude models on Amazon Bedrock and need to count or truncate tokens, give &lt;code&gt;ttok4bedrock&lt;/code&gt; a try.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>aws</category>
      <category>bedrock</category>
      <category>programming</category>
    </item>
    <item>
      <title>Building Production-Ready AI Agents with LangGraph and Amazon Bedrock AgentCore</title>
      <dc:creator>Danilo Poccia</dc:creator>
      <pubDate>Mon, 15 Sep 2025 14:43:37 +0000</pubDate>
      <link>https://dev.to/aws/building-production-ready-ai-agents-with-langgraph-and-amazon-bedrock-agentcore-4h5k</link>
      <guid>https://dev.to/aws/building-production-ready-ai-agents-with-langgraph-and-amazon-bedrock-agentcore-4h5k</guid>
      <description>&lt;p&gt;In this fifth and final deep dive of our &lt;a href="https://dev.to/aws/building-production-ready-ai-agents-a-multi-framework-journey-with-amazon-bedrock-agentcore-p32"&gt;multi-framework series&lt;/a&gt;, I'll show you how to build a production-ready AI agent using &lt;a href="https://www.langchain.com/langgraph" rel="noopener noreferrer"&gt;LangGraph&lt;/a&gt; and deploy it using &lt;a href="https://aws.amazon.com/bedrock/agentcore/" rel="noopener noreferrer"&gt;Amazon Bedrock AgentCore&lt;/a&gt;. The complete code for this implementation, along with examples for other frameworks, is available on GitHub at &lt;a href="https://github.com/danilop/agentcore-multi-framework-examples" rel="noopener noreferrer"&gt;agentcore-multi-framework-examples&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;LangGraph takes a different approach to agent workflows. Rather than linear chains of prompts or simple tool loops, LangGraph models agent behavior as a state graph where nodes perform actions and edges define transitions. This graph-based approach enables control flows with cycles, conditional branching, and human-in-the-loop interactions—capabilities that work well with the AgentCore persistent memory system.&lt;/p&gt;

&lt;p&gt;LangGraph makes state management explicit—every node in the graph receives the current state, transforms it, and passes it along. Having explicit state makes debugging easier and helps integrate with AgentCore Memory for state persistence across sessions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setting Up the Development Environment
&lt;/h2&gt;

&lt;p&gt;I'll begin by navigating to the LangGraph project in our repository:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;agentcore-multi-framework-examples/agentcore-lang-graph
uv &lt;span class="nb"&gt;sync
source&lt;/span&gt; .venv/bin/activate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The project builds on the LangChain ecosystem with these dependencies:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;langgraph&lt;/span&gt;                &lt;span class="c1"&gt;# Graph-based agent framework
&lt;/span&gt;&lt;span class="n"&gt;langchain&lt;/span&gt;                &lt;span class="c1"&gt;# Core LangChain library
&lt;/span&gt;&lt;span class="n"&gt;langchain&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;aws&lt;/span&gt;            &lt;span class="c1"&gt;# AWS integrations including Bedrock
&lt;/span&gt;&lt;span class="n"&gt;langchain&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;tavily&lt;/span&gt;         &lt;span class="c1"&gt;# Tavily search integration
&lt;/span&gt;&lt;span class="n"&gt;bedrock&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;agentcore&lt;/span&gt;        &lt;span class="c1"&gt;# AgentCore SDK
&lt;/span&gt;&lt;span class="n"&gt;bedrock&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;agentcore&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;starter&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;toolkit&lt;/span&gt;  &lt;span class="c1"&gt;# Deployment tools
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Understanding the LangGraph State Machine Architecture
&lt;/h2&gt;

&lt;p&gt;LangGraph introduces a fundamentally different way of building agents through its &lt;a href="https://langchain-ai.github.io/langgraph/tutorials/introduction/" rel="noopener noreferrer"&gt;StateGraph&lt;/a&gt; abstraction. Instead of imperatively calling tools or chaining prompts, I define a graph where each node represents a step in the agent's reasoning process.&lt;/p&gt;

&lt;h3&gt;
  
  
  Defining Agent State
&lt;/h3&gt;

&lt;p&gt;The foundation of any LangGraph agent is its state definition. I use a TypedDict to define what information flows through the graph:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Annotated&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing_extensions&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;TypedDict&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.graph.message&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;add_messages&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;State&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TypedDict&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Annotated&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;add_messages&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;a href="https://langchain-ai.github.io/langgraph/reference/graphs/#langgraph.graph.message.add_messages" rel="noopener noreferrer"&gt;add_messages&lt;/a&gt; annotation is special—it tells LangGraph to append new messages to the list rather than replacing it. This creates a growing conversation history as the graph executes, maintaining context throughout the workflow.&lt;/p&gt;
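&lt;p&gt;Conceptually, the reducer behaves like an append. (This is a simplification: the real &lt;code&gt;add_messages&lt;/code&gt; also assigns message IDs and replaces messages whose IDs match, which is what enables editing and resuming conversations.)&lt;/p&gt;

```python
def append_messages(existing: list, updates: list) -> list:
    # Simplified reducer: new messages are appended, never overwritten
    return existing + updates

state = {"messages": []}
state["messages"] = append_messages(
    state["messages"], [{"role": "user", "content": "hi"}]
)
state["messages"] = append_messages(
    state["messages"], [{"role": "assistant", "content": "hello"}]
)
```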

&lt;h3&gt;
  
  
  Building the Graph
&lt;/h3&gt;

&lt;p&gt;The graph construction follows a declarative pattern that clearly shows the agent's decision flow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.graph&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;StateGraph&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;START&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;END&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.prebuilt&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ToolNode&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tools_condition&lt;/span&gt;

&lt;span class="c1"&gt;# Initialize the graph with our State type
&lt;/span&gt;&lt;span class="n"&gt;graph_builder&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;StateGraph&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;State&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Configure the LLM with tools
&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;init_chat_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;us.anthropic.claude-3-7-sonnet-20250219-v1:0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;model_provider&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bedrock_converse&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;llm_with_tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;bind_tools&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Define the chatbot node
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;chatbot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;State&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;llm_with_tools&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])]}&lt;/span&gt;

&lt;span class="c1"&gt;# Add nodes to the graph
&lt;/span&gt;&lt;span class="n"&gt;graph_builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chatbot&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chatbot&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;tool_node&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ToolNode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;graph_builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tools&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool_node&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;a href="https://python.langchain.com/docs/how_to/chat_models_universal_init/" rel="noopener noreferrer"&gt;init_chat_model&lt;/a&gt; function provides a unified interface for initializing chat models across providers. By specifying &lt;code&gt;model_provider="bedrock_converse"&lt;/code&gt;, I'm using Amazon Bedrock's Converse API, which provides consistent behavior across different foundation models.&lt;/p&gt;

&lt;h3&gt;
  
  
  Conditional Edges and Control Flow
&lt;/h3&gt;

&lt;p&gt;The control flow in LangGraph is defined through its edges:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Add conditional edge from chatbot
&lt;/span&gt;&lt;span class="n"&gt;graph_builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_conditional_edges&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chatbot&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools_condition&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tools&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tools&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;END&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;END&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Add edge from tools back to chatbot
&lt;/span&gt;&lt;span class="n"&gt;graph_builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tools&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chatbot&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Add edge from START to chatbot
&lt;/span&gt;&lt;span class="n"&gt;graph_builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;START&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chatbot&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Compile the graph
&lt;/span&gt;&lt;span class="n"&gt;graph&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;graph_builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compile&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;a href="https://langchain-ai.github.io/langgraph/reference/prebuilt/#langgraph.prebuilt.tool_node.tools_condition" rel="noopener noreferrer"&gt;tools_condition&lt;/a&gt; is a pre-built function that examines the chatbot's output. If the model called a tool, it routes to the tools node; otherwise, it ends the conversation. This creates a loop where the agent can make multiple tool calls before providing a final answer.&lt;/p&gt;
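&lt;p&gt;In simplified form, that routing decision looks like the function below. The real &lt;code&gt;tools_condition&lt;/code&gt; inspects LangChain message objects rather than plain dictionaries; this sketch only shows the shape of the branch.&lt;/p&gt;

```python
END = "__end__"  # sentinel mirroring langgraph.graph.END

def route_after_chatbot(state: dict) -> str:
    # Route to the tools node if the last message requested a tool call,
    # otherwise end the conversation
    last_message = state["messages"][-1]
    if last_message.get("tool_calls"):
        return "tools"
    return END
```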

&lt;h2&gt;
  
  
  Integrating Tavily Search
&lt;/h2&gt;

&lt;p&gt;For this implementation, I've integrated &lt;a href="https://www.tavily.com/" rel="noopener noreferrer"&gt;Tavily Search&lt;/a&gt; as the primary tool, demonstrating how LangGraph agents can access real-time information:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_tavily&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;TavilySearch&lt;/span&gt;

&lt;span class="n"&gt;tool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;TavilySearch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_results&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Tavily provides an AI-optimized search API that returns clean, relevant snippets rather than full web pages. This reduces the overhead of parsing HTML when agents need current information. The integration is seamless—LangGraph automatically handles tool invocation and result incorporation into the conversation flow.&lt;/p&gt;

&lt;h2&gt;
  
  
  AgentCore Runtime and Memory Integration
&lt;/h2&gt;

&lt;p&gt;The integration with &lt;a href="https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/what-is-bedrock-agentcore.html" rel="noopener noreferrer"&gt;AgentCore Runtime&lt;/a&gt; provides the production infrastructure for the LangGraph agent. Let me explain how the entrypoint function processes requests and manages memory.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Entrypoint Function
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;bedrock_agentcore&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BedrockAgentCoreApp&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;bedrock_agentcore.runtime.context&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RequestContext&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BedrockAgentCoreApp&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="nd"&gt;@app.entrypoint&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;RequestContext&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Main entrypoint with AgentCore memory integration.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;LangGraph invocation started&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;a href="https://aws.github.io/bedrock-agentcore-starter-toolkit/user-guide/runtime/quickstart.html#creating-the-entrypoint" rel="noopener noreferrer"&gt;@entrypoint decorator&lt;/a&gt; marks this function as the handler for incoming requests. The function receives:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;payload&lt;/strong&gt;: Contains the request data, including the user's prompt&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;context&lt;/strong&gt;: A &lt;a href="https://aws.github.io/bedrock-agentcore-starter-toolkit/user-guide/runtime/context.html" rel="noopener noreferrer"&gt;RequestContext&lt;/a&gt; object providing session management&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Memory Enhancement Before Graph Execution
&lt;/h3&gt;

&lt;p&gt;Before executing the graph, I retrieve relevant memories and add them to the input:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;    &lt;span class="c1"&gt;# Extract parameters with context priority for session_id
&lt;/span&gt;    &lt;span class="n"&gt;actor_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;actor_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;DEFAULT_ACTOR_ID&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;session_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;session_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;DEFAULT_SESSION_ID&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;No prompt found in input&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Enhance prompt with AgentCore memory context
&lt;/span&gt;    &lt;span class="n"&gt;memory_context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;memory_manager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_memory_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;enhanced_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;memory_context&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;Current user message: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;memory_context&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;

    &lt;span class="c1"&gt;# Create messages for LangGraph
&lt;/span&gt;    &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;enhanced_prompt&lt;/span&gt;&lt;span class="p"&gt;}]}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The memory context retrieval uses two AgentCore Memory APIs:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;a href="https://aws.github.io/bedrock-agentcore-starter-toolkit/user-guide/memory/quickstart.html" rel="noopener noreferrer"&gt;get_last_k_turns&lt;/a&gt; to load conversation history&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/long-term-memory.html" rel="noopener noreferrer"&gt;RetrieveMemories&lt;/a&gt; to search for relevant memories&lt;/li&gt;
&lt;/ol&gt;
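&lt;p&gt;A hypothetical sketch of how the two results might be merged into a single context block (the turn and memory lists below are stubs for the API responses, and the function name is illustrative):&lt;/p&gt;

```python
def build_memory_context(recent_turns, retrieved_memories) -> str:
    # Combine short-term history and long-term memories into one preamble
    sections = []
    if recent_turns:
        sections.append("Recent conversation:\n" + "\n".join(recent_turns))
    if retrieved_memories:
        sections.append(
            "Relevant memories:\n"
            + "\n".join(f"- {m}" for m in retrieved_memories)
        )
    return "\n\n".join(sections)
```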

&lt;h3&gt;
  
  
  Executing the Graph and Storing Results
&lt;/h3&gt;

&lt;p&gt;After the graph processes the enriched input, I store the conversation and return the result:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Invoke the LangGraph
&lt;/span&gt;    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;response_message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;

    &lt;span class="c1"&gt;# Store conversation in AgentCore memory
&lt;/span&gt;    &lt;span class="n"&gt;memory_manager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;store_conversation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Store original prompt, not enhanced version
&lt;/span&gt;        &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;response_message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;result&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;response_message&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;store_conversation()&lt;/code&gt; method calls the &lt;a href="https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/memory-getting-started.html" rel="noopener noreferrer"&gt;create_event&lt;/a&gt; API, which:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Stores the raw conversation&lt;/li&gt;
&lt;li&gt;Triggers memory strategies to extract preferences, facts, and summaries&lt;/li&gt;
&lt;li&gt;Makes these insights available for future retrievals&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I store the original user prompt, not the enhanced version with memory context. This prevents recursive memory expansion where each retrieval would include previous retrievals, keeping the memory focused on actual conversation content.&lt;/p&gt;

&lt;p&gt;The function returns a dictionary that AgentCore Runtime automatically serializes to JSON for the HTTP response.&lt;/p&gt;

&lt;h2&gt;
  
  
  Advanced Graph Patterns
&lt;/h2&gt;

&lt;p&gt;While our implementation uses a simple tool-calling pattern, LangGraph enables more complex workflows that work well with AgentCore Memory.&lt;/p&gt;

&lt;h3&gt;
  
  
  Human-in-the-Loop Workflows
&lt;/h3&gt;

&lt;p&gt;LangGraph's &lt;a href="https://langchain-ai.github.io/langgraph/how-tos/human_in_the_loop/breakpoints/" rel="noopener noreferrer"&gt;interrupt and resume&lt;/a&gt; capabilities allow building agents that pause for human input:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.checkpoint.memory&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MemorySaver&lt;/span&gt;

&lt;span class="c1"&gt;# Add checkpointing for state persistence
&lt;/span&gt;&lt;span class="n"&gt;checkpointer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MemorySaver&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;graph&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;graph_builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;checkpointer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;checkpointer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Add interrupt before critical decisions
&lt;/span&gt;&lt;span class="n"&gt;graph_builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;human_approval&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;human_approval_node&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;graph_builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chatbot&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;human_approval&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;interrupt_before&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;human_approval&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Combined with AgentCore Memory, this enables workflows where agents remember not just conversations but also approval patterns and human feedback over time.&lt;/p&gt;
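&lt;p&gt;The &lt;code&gt;human_approval_node&lt;/code&gt; referenced in the snippet is not shown; a minimal placeholder might look like this (the state keys are illustrative):&lt;/p&gt;

```python
def human_approval_node(state):
    """Placeholder approval node.

    In production this node would surface the pending action to a
    reviewer while the graph is paused by the interrupt; here it just
    records the last message as the action awaiting approval.
    The state keys are illustrative, not part of any published API.
    """
    last_message = state["messages"][-1]
    return {"action_awaiting_approval": last_message}
```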

&lt;h3&gt;
  
  
  Multi-Agent Collaboration
&lt;/h3&gt;

&lt;p&gt;LangGraph excels at orchestrating multiple agents working together:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Define specialized agents as subgraphs
&lt;/span&gt;&lt;span class="n"&gt;research_agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_research_graph&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;analysis_agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_analysis_graph&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Compose them in a parent graph
&lt;/span&gt;&lt;span class="n"&gt;parent_builder&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;StateGraph&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;State&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;parent_builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;research&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;research_agent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;parent_builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;analysis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;analysis_agent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Route based on task type
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;route_to_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;research&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;research&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;analysis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;parent_builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_conditional_edges&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;START&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;route_to_agent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each subgraph can maintain its own memory namespace in AgentCore, enabling specialized agents with distinct knowledge bases while sharing conversation context.&lt;/p&gt;
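&lt;p&gt;One hypothetical namespacing scheme (the path format is my own convention, not an AgentCore API):&lt;/p&gt;

```python
def memory_namespace(agent_name, actor_id):
    """Build a per-agent memory namespace path.

    Each specialized agent gets its own branch so retrievals stay
    scoped to that agent's knowledge base; the path format is a
    made-up convention for illustration.
    """
    return f"/agents/{agent_name}/actors/{actor_id}"
```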

&lt;h3&gt;
  
  
  Parallel Execution
&lt;/h3&gt;

&lt;p&gt;LangGraph supports &lt;a href="https://langchain-ai.github.io/langgraph/how-tos/branching/" rel="noopener noreferrer"&gt;parallel node execution&lt;/a&gt; for concurrent processing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.graph&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;END&lt;/span&gt;

&lt;span class="c1"&gt;# Add nodes that can run in parallel
&lt;/span&gt;&lt;span class="n"&gt;graph_builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search_web&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;search_web_node&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;graph_builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search_memory&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;search_memory_node&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;graph_builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;synthesize&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;synthesize_node&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Both searches run in parallel
&lt;/span&gt;&lt;span class="n"&gt;graph_builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chatbot&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search_web&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;graph_builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chatbot&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search_memory&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Both complete before synthesis
&lt;/span&gt;&lt;span class="n"&gt;graph_builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search_web&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;synthesize&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;graph_builder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search_memory&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;synthesize&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This pattern combines web search with memory retrieval, letting the agent gather information from multiple sources simultaneously.&lt;/p&gt;
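&lt;p&gt;One caveat: when parallel branches write to the same state key, LangGraph needs a reducer annotation on that key to merge the concurrent updates. A minimal sketch using &lt;code&gt;operator.add&lt;/code&gt; for list concatenation:&lt;/p&gt;

```python
from operator import add
from typing import Annotated, TypedDict

class ParallelState(TypedDict):
    # The reducer (operator.add) tells LangGraph to concatenate
    # concurrent writes instead of treating them as a conflict.
    results: Annotated[list, add]

# The merge LangGraph would apply when both branches write "results":
merged = add(["web result"], ["memory result"])
```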

&lt;h2&gt;
  
  
  Deploying the Agent
&lt;/h2&gt;

&lt;p&gt;The deployment process leverages the same AgentCore Starter Toolkit workflow used throughout this series, with some LangGraph-specific considerations.&lt;/p&gt;

&lt;h3&gt;
  
  
  Configuration and Local Testing
&lt;/h3&gt;

&lt;p&gt;First, I configure the agent for AgentCore:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;agentcore configure &lt;span class="nt"&gt;-n&lt;/span&gt; langgraphagent &lt;span class="nt"&gt;-e&lt;/span&gt; main.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For local testing with Tavily search, I need to provide the API key:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;agentcore launch &lt;span class="nt"&gt;--local&lt;/span&gt; &lt;span class="nt"&gt;--env&lt;/span&gt; &lt;span class="nv"&gt;TAVILY_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&amp;lt;YOUR_TAVILY_API_KEY&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This launches the containerized agent locally. Testing with memory context:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;agentcore invoke &lt;span class="nt"&gt;--local&lt;/span&gt; &lt;span class="s1"&gt;'{"prompt": "AI multi-agent architectures - Also, what did I say about fruit?"}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The agent performs a web search for current information about AI architectures while also retrieving the stored memory about fruit preferences, demonstrating the power of combining real-time data with persistent context.&lt;/p&gt;

&lt;h3&gt;
  
  
  Production Deployment
&lt;/h3&gt;

&lt;p&gt;Deploying to AWS requires passing the Tavily API key as an environment variable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;agentcore launch &lt;span class="nt"&gt;--env&lt;/span&gt; &lt;span class="nv"&gt;TAVILY_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&amp;lt;YOUR_TAVILY_API_KEY&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;AgentCore securely stores the API key in AWS Secrets Manager and injects it into the agent's runtime environment. This keeps credentials protected, never exposed in code or configuration files.&lt;/p&gt;
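&lt;p&gt;Inside the agent, the key arrives as a plain environment variable. A small guard (the helper name is mine) fails fast with a clearer message than a Tavily authentication error:&lt;/p&gt;

```python
import os

def require_tavily_key():
    """Read TAVILY_API_KEY, failing fast with an actionable message."""
    key = os.environ.get("TAVILY_API_KEY")
    if not key:
        raise RuntimeError(
            "TAVILY_API_KEY is not set; pass it with 'agentcore launch --env'"
        )
    return key
```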

&lt;p&gt;Monitoring the deployment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws logs &lt;span class="nb"&gt;tail&lt;/span&gt; /aws/bedrock-agentcore/runtimes/&amp;lt;AGENT_ID_ENDPOINT_ID&amp;gt; &lt;span class="nt"&gt;--follow&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The structured logging from both LangGraph and AgentCore makes it easy to trace the graph execution flow and debug any issues.&lt;/p&gt;

&lt;h2&gt;
  
  
  Observability and Debugging
&lt;/h2&gt;

&lt;p&gt;LangGraph provides excellent visibility into agent execution through its graph structure. I can even visualize the graph by running this code locally:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Generate a Mermaid diagram of the graph
&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_graph&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;draw_mermaid_png&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;graph_diagram.png&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;wb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This visualization helps understand the agent's decision flow and is invaluable for debugging complex workflows.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tracing Graph Execution
&lt;/h3&gt;

&lt;p&gt;LangGraph's execution is fully traceable through &lt;a href="https://docs.smith.langchain.com/" rel="noopener noreferrer"&gt;LangSmith&lt;/a&gt; or custom callbacks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.callbacks&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;StdOutCallbackHandler&lt;/span&gt;

&lt;span class="c1"&gt;# Add tracing to graph execution
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;callbacks&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;StdOutCallbackHandler&lt;/span&gt;&lt;span class="p"&gt;()]}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Combined with CloudWatch Logs and AgentCore's built-in observability, this provides complete visibility from infrastructure to application logic.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cleaning Up Resources
&lt;/h2&gt;

&lt;p&gt;To delete the resources created by &lt;code&gt;agentcore launch&lt;/code&gt;, I use the &lt;code&gt;agentcore&lt;/code&gt; command in the Python virtual environment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;agentcore destroy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This command deletes the AgentCore agent, the ECR images, the CodeBuild project, and the IAM roles used by the agent and by CodeBuild.&lt;/p&gt;

&lt;p&gt;To delete the memory, including all stored events, the strategies, and the memories extracted from the events, I look up the memory ID in the &lt;code&gt;../config/memory-config.json&lt;/code&gt; file and use the AWS CLI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws bedrock-agentcore-control delete-memory &lt;span class="nt"&gt;--memory-id&lt;/span&gt; &amp;lt;MEMORY_ID&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Production Considerations
&lt;/h2&gt;

&lt;p&gt;Running LangGraph agents in production with AgentCore has taught me several important lessons.&lt;/p&gt;

&lt;h3&gt;
  
  
  State Management Strategies
&lt;/h3&gt;

&lt;p&gt;LangGraph's explicit state management requires careful consideration of what to persist. While the graph maintains state during execution, AgentCore Memory handles cross-session persistence. I've found it best to keep graph state focused on the current task while using AgentCore Memory for long-term context.&lt;/p&gt;

&lt;h2&gt;
  
  
  Comparing Graph-Based to Traditional Approaches
&lt;/h2&gt;

&lt;p&gt;After implementing agents with multiple frameworks, I can see where LangGraph's graph-based approach works best. Traditional agent frameworks often struggle with complex, multi-step workflows that require backtracking or parallel processing. LangGraph makes these patterns natural and explicit.&lt;/p&gt;

&lt;p&gt;The graph structure also makes it easier to implement safety constraints. By controlling edges and adding validation nodes, I can make agents follow approved workflows even when using powerful foundation models. This matters in production environments where predictability is as important as capability.&lt;/p&gt;
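&lt;p&gt;For example, a validation node can gate tool calls against an allow-list before they execute (the names and state shape are illustrative):&lt;/p&gt;

```python
# Illustrative allow-list of tools the workflow may invoke
ALLOWED_TOOLS = {"search_web", "search_memory"}

def validation_node(state):
    """Reject any pending tool call that is not on the allow-list."""
    blocked = [
        call["name"]
        for call in state.get("pending_tool_calls", [])
        if call["name"] not in ALLOWED_TOOLS
    ]
    if blocked:
        raise ValueError(f"Blocked tool calls: {blocked}")
    return state
```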

&lt;p&gt;However, this power comes with complexity. Simple question-answering agents might be overengineered as graphs. The key is choosing the right tool for the job—LangGraph for complex workflows, simpler frameworks for straightforward tasks.&lt;/p&gt;

&lt;h2&gt;
  
  
  What We've Learned
&lt;/h2&gt;

&lt;p&gt;This LangGraph implementation completes our journey through five different agent frameworks, all unified through Amazon Bedrock AgentCore. The graph-based approach offers unique advantages for complex workflows while maintaining the same production-grade memory and deployment infrastructure we've used throughout the series.&lt;/p&gt;

&lt;p&gt;AgentCore's flexibility enables each framework to use its strengths while providing consistent operational excellence. Whether you're building simple tool-calling agents with Strands, multi-agent systems with CrewAI, type-safe applications with Pydantic AI, data-centric agents with LlamaIndex, or complex workflows with LangGraph, AgentCore handles the production challenges so you can focus on agent logic.&lt;/p&gt;

&lt;p&gt;The shared memory architecture we've used across all frameworks shows the benefits of standardization. By creating a common memory interface, we've enabled true portability—agents built with different frameworks can share memories and even hand off conversations to each other.&lt;/p&gt;

&lt;h2&gt;
  
  
  Next Steps and Future Directions
&lt;/h2&gt;

&lt;p&gt;While this series focused on Runtime and Memory, AgentCore offers additional services that further enhance production deployments. Future explorations could include using AgentCore Identity for inbound and outbound agent authentication and authorization, implementing AgentCore Gateways to transform existing APIs and AWS Lambda functions into MCP servers, or leveraging AgentCore Monitoring for advanced observability.&lt;/p&gt;

&lt;p&gt;The complete code for all five frameworks is available on &lt;a href="https://github.com/danilop/agentcore-multi-framework-examples" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;. I encourage you to explore the implementations, experiment with combining different frameworks, and build your own production-ready agents.&lt;/p&gt;

&lt;p&gt;The AI agent ecosystem is evolving rapidly, with new tools and patterns emerging constantly. What remains constant is the need for production-grade infrastructure that can adapt to these changes. Amazon Bedrock AgentCore provides that foundation, enabling you to experiment with cutting-edge agent technologies while maintaining enterprise-ready deployment and operations.&lt;/p&gt;

&lt;p&gt;Thank you for joining me on this multi-framework journey. Now it's your turn to build something amazing!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>aws</category>
      <category>bedrock</category>
      <category>agentcore</category>
    </item>
    <item>
      <title>Building Production-Ready AI Agents with LlamaIndex and Amazon Bedrock AgentCore</title>
      <dc:creator>Danilo Poccia</dc:creator>
      <pubDate>Mon, 15 Sep 2025 14:37:28 +0000</pubDate>
      <link>https://dev.to/aws/building-production-ready-ai-agents-with-llamaindex-and-amazon-bedrock-agentcore-1fm3</link>
      <guid>https://dev.to/aws/building-production-ready-ai-agents-with-llamaindex-and-amazon-bedrock-agentcore-1fm3</guid>
      <description>&lt;p&gt;In this fourth deep dive of our &lt;a href="https://dev.to/aws/building-production-ready-ai-agents-a-multi-framework-journey-with-amazon-bedrock-agentcore-p32"&gt;multi-framework series&lt;/a&gt;, I'll show you how to build a production-ready AI agent using &lt;a href="https://www.llamaindex.ai/" rel="noopener noreferrer"&gt;LlamaIndex&lt;/a&gt; and deploy it using &lt;a href="https://aws.amazon.com/bedrock/agentcore/" rel="noopener noreferrer"&gt;Amazon Bedrock AgentCore&lt;/a&gt;. The complete code for this implementation, along with examples for other frameworks, is available on GitHub at &lt;a href="https://github.com/danilop/agentcore-multi-framework-examples" rel="noopener noreferrer"&gt;agentcore-multi-framework-examples&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;LlamaIndex takes a data-centric approach to agent development. While other frameworks focus primarily on orchestration and tool calling, LlamaIndex specializes in connecting agents with diverse data sources and building RAG (Retrieval-Augmented Generation) pipelines. This is useful when your agent needs to reason over documents, databases, or APIs while maintaining production-grade memory persistence through AgentCore.&lt;/p&gt;

&lt;p&gt;LlamaIndex provides specialized components for every aspect of data processing—from ingestion and chunking to embedding and retrieval. This modular design works well with AgentCore's memory system, allowing me to build agents that process external data while maintaining searchable conversation histories.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setting Up the Development Environment
&lt;/h2&gt;

&lt;p&gt;I'll start by navigating to the LlamaIndex project within our multi-framework repository:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;agentcore-multi-framework-examples/agentcore-llama-index
uv &lt;span class="nb"&gt;sync
source&lt;/span&gt; .venv/bin/activate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The project uses several LlamaIndex packages for different capabilities:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;llama&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;core&lt;/span&gt;         &lt;span class="c1"&gt;# Core agent and data framework
&lt;/span&gt;&lt;span class="n"&gt;llama&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;llms&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;bedrock&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;converse&lt;/span&gt;  &lt;span class="c1"&gt;# Amazon Bedrock LLM integration
&lt;/span&gt;&lt;span class="n"&gt;llama&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;bedrock&lt;/span&gt;     &lt;span class="c1"&gt;# Bedrock embeddings for semantic search
&lt;/span&gt;&lt;span class="n"&gt;llama&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;wikipedia&lt;/span&gt;        &lt;span class="c1"&gt;# Wikipedia data source
&lt;/span&gt;&lt;span class="n"&gt;llama&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;requests&lt;/span&gt;        &lt;span class="c1"&gt;# Web request capabilities
&lt;/span&gt;&lt;span class="n"&gt;bedrock&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;agentcore&lt;/span&gt;                 &lt;span class="c1"&gt;# AgentCore SDK
&lt;/span&gt;&lt;span class="n"&gt;bedrock&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;agentcore&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;starter&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;toolkit&lt;/span&gt; &lt;span class="c1"&gt;# Deployment tools
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Understanding the LlamaIndex Agent Architecture
&lt;/h2&gt;

&lt;p&gt;LlamaIndex recently introduced the &lt;a href="https://docs.llamaindex.ai/en/stable/module_guides/deploying/agents/" rel="noopener noreferrer"&gt;FunctionAgent&lt;/a&gt;, a streamlined agent implementation that focuses on tool use and conversation flow. Unlike the earlier ReActAgent, FunctionAgent leverages the native function-calling capabilities of modern LLMs, resulting in more reliable and efficient agent behavior.&lt;/p&gt;

&lt;p&gt;The architecture I've built combines three key LlamaIndex concepts:&lt;/p&gt;

&lt;h3&gt;
  
  
  Global Settings Configuration
&lt;/h3&gt;

&lt;p&gt;LlamaIndex uses a &lt;a href="https://docs.llamaindex.ai/en/stable/module_guides/supporting_modules/settings/" rel="noopener noreferrer"&gt;Settings&lt;/a&gt; object to configure global defaults for LLMs, embeddings, and processing parameters. This centralized configuration provides consistency across all components:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;initialize_llamaindex_settings&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Initialize LlamaIndex global settings.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="c1"&gt;# Configure LLM
&lt;/span&gt;    &lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BedrockConverse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;us.amazon.nova-pro-v1:0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Configure embeddings
&lt;/span&gt;    &lt;span class="n"&gt;embed_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BedrockEmbedding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;amazon.titan-embed-text-v2:0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;region_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;us-east-1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Set global settings
&lt;/span&gt;    &lt;span class="n"&gt;Settings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;
    &lt;span class="n"&gt;Settings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;embed_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;embed_model&lt;/span&gt;
    &lt;span class="n"&gt;Settings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chunk_size&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;512&lt;/span&gt;
    &lt;span class="n"&gt;Settings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chunk_overlap&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;a href="https://docs.llamaindex.ai/en/stable/examples/llm/bedrock_converse/" rel="noopener noreferrer"&gt;BedrockConverse&lt;/a&gt; integration uses the Amazon Bedrock Converse API, which provides a unified interface across different foundation models. This means I can easily switch between Claude, Llama, or Amazon Nova models without changing my agent code.&lt;/p&gt;
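&lt;p&gt;For example, switching models is just a different ID string passed to &lt;code&gt;BedrockConverse&lt;/code&gt; (the IDs below are examples; availability varies by AWS Region):&lt;/p&gt;

```python
# Example Converse API model IDs (illustrative; check your Region):
MODEL_IDS = {
    "nova-pro": "us.amazon.nova-pro-v1:0",
    "claude-3-5-sonnet": "anthropic.claude-3-5-sonnet-20240620-v1:0",
}

# The rest of the agent code is unchanged; only the ID passed to
# BedrockConverse(model=...) differs.
chosen_model = MODEL_IDS["nova-pro"]
```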

&lt;h3&gt;
  
  
  Tool Integration
&lt;/h3&gt;

&lt;p&gt;LlamaIndex tools follow a consistent interface through the &lt;a href="https://docs.llamaindex.ai/en/stable/module_guides/deploying/agents/tools/" rel="noopener noreferrer"&gt;FunctionTool&lt;/a&gt; abstraction. I've created a comprehensive tool suite that combines custom tools with pre-built integrations:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;create_llamaindex_tools&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;memory_manager&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;FunctionTool&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Create a list of tools for the LlamaIndex agent.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="c1"&gt;# Basic function tools
&lt;/span&gt;    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;extend&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
        &lt;span class="n"&gt;FunctionTool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_defaults&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;calculator&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;calculator&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Perform basic mathematical calculations&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;FunctionTool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_defaults&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;text_analyzer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text_analyzer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
            &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Analyze text and provide statistics&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="c1"&gt;# Add external tools
&lt;/span&gt;    &lt;span class="n"&gt;wikipedia_spec&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;WikipediaToolSpec&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;wikipedia_tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;wikipedia_spec&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_tool_list&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;extend&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;wikipedia_tools&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;a href="https://docs.llamaindex.ai/en/stable/module_guides/deploying/agents/tools/llamahub_tools_guide/" rel="noopener noreferrer"&gt;ToolSpec&lt;/a&gt; pattern allows entire tool suites to be packaged and shared. The Wikipedia and Requests tools come from LlamaHub, the LlamaIndex community tool repository, demonstrating how easy it is to extend agent capabilities.&lt;/p&gt;
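&lt;p&gt;The same pattern works for your own tool suites: a spec class groups related functions, and &lt;code&gt;to_tool_list()&lt;/code&gt; exposes them to the agent in one call. A plain-Python stand-in of the idea (the &lt;code&gt;MathToolSpec&lt;/code&gt; class is a hypothetical example, not part of the article's code):&lt;/p&gt;

```python
# Plain-Python stand-in for the ToolSpec pattern: a spec class groups
# related functions, and to_tool_list() exposes them as (name, callable)
# pairs that an agent could register in a single call.
import inspect

class MathToolSpec:
    def add(self, a: float, b: float) -> float:
        """Add two numbers."""
        return a + b

    def multiply(self, a: float, b: float) -> float:
        """Multiply two numbers."""
        return a * b

    def to_tool_list(self):
        """Collect all public methods of the spec as tools."""
        return [
            (name, fn)
            for name, fn in inspect.getmembers(self, inspect.ismethod)
            if not name.startswith("_") and name != "to_tool_list"
        ]

tools = MathToolSpec().to_tool_list()
print([name for name, _ in tools])
```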

&lt;h3&gt;
  
  
  Memory-Enhanced Context
&lt;/h3&gt;

&lt;p&gt;Integrating AgentCore Memory with LlamaIndex requires dynamically updating the agent's system prompt with retrieved memories, rather than simply appending them to each user prompt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Enhance the user input with memory context if available
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;memory_manager&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;memory_context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;memory_manager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_memory_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Update agent's system prompt if enhanced
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;memory_context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;enhanced_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;get_system_prompt&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;Relevant context from previous interactions:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;memory_context&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;agent_instance&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update_prompts&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system_prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;enhanced_prompt&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  AgentCore Runtime Integration
&lt;/h2&gt;

&lt;p&gt;The integration with &lt;a href="https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/what-is-bedrock-agentcore.html" rel="noopener noreferrer"&gt;AgentCore Runtime&lt;/a&gt; starts with the entrypoint function that receives requests and returns responses. Let me explain how each component works:&lt;/p&gt;

&lt;h3&gt;
  
  
  The Entrypoint Function
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;bedrock_agentcore&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BedrockAgentCoreApp&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;bedrock_agentcore.runtime.context&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RequestContext&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BedrockAgentCoreApp&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="nd"&gt;@app.entrypoint&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;RequestContext&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;AgentCore entrypoint for LlamaIndex agent invocation.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;a href="https://aws.github.io/bedrock-agentcore-starter-toolkit/user-guide/runtime/quickstart.html#creating-the-entrypoint" rel="noopener noreferrer"&gt;@entrypoint decorator&lt;/a&gt; marks this function as the handler that AgentCore Runtime invokes when your agent receives a request. The function receives two parameters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;payload&lt;/strong&gt;: A dictionary containing the request data, including the user's prompt&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;context&lt;/strong&gt;: A &lt;a href="https://aws.github.io/bedrock-agentcore-starter-toolkit/user-guide/runtime/context.html" rel="noopener noreferrer"&gt;RequestContext&lt;/a&gt; object that provides session information managed by AgentCore&lt;/li&gt;
&lt;/ul&gt;
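&lt;p&gt;To see what the decorator does conceptually, here is a simplified plain-Python stand-in; this is not the actual &lt;code&gt;BedrockAgentCoreApp&lt;/code&gt; implementation, only an illustration of the registration pattern:&lt;/p&gt;

```python
# Simplified stand-in for the entrypoint pattern: the decorator records
# the handler so the runtime can locate and invoke it on each request.
class MiniApp:
    def __init__(self):
        self._handler = None

    def entrypoint(self, fn):
        self._handler = fn  # register the handler with the app
        return fn           # leave the decorated function unchanged

app = MiniApp()

@app.entrypoint
def invoke(payload, context=None):
    return payload.get("prompt", "no prompt")

# The runtime would call the registered handler with the request payload.
print(app._handler({"prompt": "Hello"}))
```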

&lt;h3&gt;
  
  
  Processing Requests
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Extract parameters from the request
&lt;/span&gt;&lt;span class="n"&gt;user_input&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hello! What can you help me with?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;actor_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;actor_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;DEFAULT_ACTOR_ID&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;session_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;session_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;DEFAULT_SESSION_ID&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Processing request for actor_id: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;, session_id: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The session_id from the RequestContext is particularly important: AgentCore Runtime automatically manages session isolation at the infrastructure level, giving each user an isolated conversation space that persists across invocations.&lt;/p&gt;
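&lt;p&gt;Conceptually, session isolation means every session_id maps to its own conversation state. A toy illustration of the semantics (AgentCore implements this at the infrastructure level; the dictionary below is only a mental model):&lt;/p&gt;

```python
# Toy illustration of session isolation: conversation state is keyed by
# session_id, so concurrent users never see each other's turns.
from collections import defaultdict

sessions = defaultdict(list)  # session_id -> list of (role, text) turns

def record_turn(session_id: str, role: str, text: str) -> None:
    sessions[session_id].append((role, text))

record_turn("session-a", "USER", "Hello")
record_turn("session-b", "USER", "Hi there")

# Each session only ever sees its own history.
print(sessions["session-a"])
```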

&lt;h3&gt;
  
  
  Memory Enhancement Before Processing
&lt;/h3&gt;

&lt;p&gt;Before the agent processes the request, I retrieve relevant memories and add them to the prompt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Enhance prompt with AgentCore memory context
&lt;/span&gt;&lt;span class="n"&gt;memory_context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;memory_manager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_memory_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Update agent's system prompt if enhanced
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;memory_context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;enhanced_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;get_system_prompt&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;Relevant context from previous interactions:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;memory_context&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;agent_instance&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update_prompts&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system_prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;enhanced_prompt&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;get_memory_context()&lt;/code&gt; method performs two operations:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Retrieves conversation history using the &lt;a href="https://aws.github.io/bedrock-agentcore-starter-toolkit/user-guide/memory/quickstart.html" rel="noopener noreferrer"&gt;get_last_k_turns&lt;/a&gt; API&lt;/li&gt;
&lt;li&gt;Searches for relevant memories using the &lt;a href="https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/long-term-memory.html" rel="noopener noreferrer"&gt;RetrieveMemories&lt;/a&gt; operation&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This enriched context becomes part of the agent's system prompt, providing historical context for the response generation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Executing the Agent and Returning the Response
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Run the agent asynchronously
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;agent_instance&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;response_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Store conversation after generating response
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;memory_manager&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;memory_manager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;store_conversation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;response_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Return the text response
&lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response_text&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The function returns a string containing the agent's response. AgentCore Runtime takes this string and automatically:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Wraps it in the appropriate HTTP response structure&lt;/li&gt;
&lt;li&gt;Handles response formatting for API Gateway&lt;/li&gt;
&lt;li&gt;Manages error responses if exceptions occur&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Notice the &lt;code&gt;async&lt;/code&gt; keyword—the FunctionAgent in LlamaIndex supports asynchronous execution natively. This allows the agent to make multiple tool calls or process large documents without blocking other requests. The &lt;a href="https://aws.github.io/bedrock-agentcore-starter-toolkit/user-guide/runtime/quickstart.html" rel="noopener noreferrer"&gt;BedrockAgentCoreApp&lt;/a&gt; handles async functions transparently, whether running locally or deployed.&lt;/p&gt;
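&lt;p&gt;The benefit of async execution can be sketched with plain asyncio: independent, I/O-bound tool calls can be awaited concurrently instead of one after another. The tool names and delays below are hypothetical:&lt;/p&gt;

```python
# Sketch of why async matters: independent, I/O-bound tool calls can be
# awaited concurrently with asyncio.gather instead of running serially.
import asyncio

async def call_tool(name: str, delay: float) -> str:
    await asyncio.sleep(delay)  # stand-in for I/O-bound tool work
    return f"{name}: done"

async def run_agent() -> str:
    # Both tool calls are in flight at the same time; gather preserves order.
    results = await asyncio.gather(
        call_tool("search", 0.01),
        call_tool("calculator", 0.01),
    )
    return "; ".join(results)

print(asyncio.run(run_agent()))
```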

&lt;h2&gt;
  
  
  AgentCore Memory Integration
&lt;/h2&gt;

&lt;p&gt;The memory system in this implementation uses &lt;a href="https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/memory.html" rel="noopener noreferrer"&gt;AgentCore Memory&lt;/a&gt; to provide persistent context across sessions. Here's how it works:&lt;/p&gt;

&lt;h3&gt;
  
  
  Memory Retrieval and Prompt Enrichment
&lt;/h3&gt;

&lt;p&gt;The memory integration happens in two phases. First, when the agent starts processing a request, it retrieves relevant memories to enrich the input:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Get memory context using the shared memory manager
&lt;/span&gt;&lt;span class="n"&gt;memory_context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;memory_manager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_memory_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;get_memory_context()&lt;/code&gt; method performs these operations:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Load Conversation History&lt;/strong&gt;: On the first invocation of a session, it calls the &lt;a href="https://aws.github.io/bedrock-agentcore-starter-toolkit/user-guide/memory/quickstart.html" rel="noopener noreferrer"&gt;get_last_k_turns&lt;/a&gt; API to retrieve up to 100 previous conversation turns:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;conversations&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;memory_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_last_k_turns&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;memory_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;memory_config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;memory_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;  &lt;span class="c1"&gt;# Maximum conversation turns to retrieve
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Retrieve Relevant Memories&lt;/strong&gt;: It performs semantic search using the &lt;a href="https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/long-term-memory.html" rel="noopener noreferrer"&gt;RetrieveMemories&lt;/a&gt; operation to find relevant facts, preferences, and summaries:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;memories&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;retrieve_memories_for_actor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;memory_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;memory_manager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;memory_config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;memory_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;search_query&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;memory_client&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;memory_manager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;memory_client&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The namespace structure &lt;code&gt;/actor/{actor_id}/&lt;/code&gt; provides complete isolation between users—each actor has their own memory space that other users cannot access.&lt;/p&gt;
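&lt;p&gt;The namespace convention can be captured in a small helper. This is a hypothetical sketch of the visibility rule; the actual paths and enforcement are managed by AgentCore Memory:&lt;/p&gt;

```python
# Hypothetical helper mirroring the per-actor namespace convention:
# every actor's memories live under their own path prefix, so a query
# scoped to one actor can never match another actor's records.
def memory_namespace(actor_id: str) -> str:
    """Build the namespace path for a given actor."""
    return f"/actor/{actor_id}/"

def is_visible(record_namespace: str, actor_id: str) -> bool:
    """A record is visible only inside the requesting actor's namespace."""
    return record_namespace.startswith(memory_namespace(actor_id))

print(is_visible("/actor/alice/facts/1", "alice"))  # True
print(is_visible("/actor/alice/facts/1", "bob"))    # False
```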

&lt;h3&gt;
  
  
  Storing Conversations After Response
&lt;/h3&gt;

&lt;p&gt;After the agent generates a response, the conversation is stored in AgentCore Memory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Store conversation in AgentCore Memory
&lt;/span&gt;&lt;span class="n"&gt;memory_manager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;store_conversation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;response_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;store_conversation()&lt;/code&gt; method calls the &lt;a href="https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/memory-getting-started.html" rel="noopener noreferrer"&gt;create_event&lt;/a&gt; API:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;messages_to_store&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;USER&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ASSISTANT&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;memory_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_event&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;memory_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;memory_config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;memory_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;messages_to_store&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When &lt;code&gt;create_event&lt;/code&gt; is called, AgentCore Memory automatically:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Stores the raw conversation&lt;/li&gt;
&lt;li&gt;Extracts user preferences using the UserPreferences strategy&lt;/li&gt;
&lt;li&gt;Identifies semantic facts using the SemanticFacts strategy
&lt;/li&gt;
&lt;li&gt;Generates session summaries using the SessionSummaries strategy&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These extracted insights become available for future retrievals, building the agent's long-term knowledge.&lt;/p&gt;
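&lt;p&gt;As a mental model, each strategy is a transformation applied to the stored conversation, producing a different kind of long-term record. A toy sketch of that idea (the real extraction is LLM-driven and fully managed by AgentCore Memory):&lt;/p&gt;

```python
# Toy model of memory extraction strategies: each strategy turns the raw
# conversation into a different kind of long-term record.
def user_preferences(turns):
    """Naive stand-in: keep user messages that state a preference."""
    return [text for role, text in turns if role == "USER" and "prefer" in text.lower()]

def session_summary(turns):
    """Naive stand-in: summarize the session by message count."""
    return f"{len(turns)} messages exchanged"

conversation = [
    ("USER", "I prefer short answers."),
    ("ASSISTANT", "Noted, I will keep replies brief."),
]

records = {
    "UserPreferences": user_preferences(conversation),
    "SessionSummaries": session_summary(conversation),
}
print(records)
```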

&lt;h3&gt;
  
  
  Memory as an Agentic Tool
&lt;/h3&gt;

&lt;p&gt;I've also exposed memory retrieval as a tool that the agent can use strategically:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;retrieve_memories&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Retrieve relevant memories based on a query.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;memories&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;retrieve_memories_for_actor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;memory_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;memory_manager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;memory_config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;memory_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;memory_manager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;default_actor_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;search_query&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;memory_client&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;memory_manager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;memory_client&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;memories&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;No relevant memories found for query: &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'"&lt;/span&gt;

        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Retrieved &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;memories&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; memories for &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;memory&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;memories&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;No content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;score&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;N/A&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;. (Score: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;) &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Exposing memory retrieval as a tool gives the agent agency over its memory. Rather than automatically retrieving memories for every query, the agent can decide when memory context would be helpful. For instance, when asked about preferences, the agent might search for "user preferences" explicitly, or when solving a problem, it might search for similar problems from past conversations.&lt;/p&gt;
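&lt;p&gt;The shape of such a tool can be sketched outside any framework. The stub client below is a hypothetical stand-in for the real memory client (its &lt;code&gt;retrieve_memories&lt;/code&gt; signature is an assumption for illustration), but the formatting mirrors the function above:&lt;/p&gt;

```python
# Minimal, framework-agnostic sketch of memory retrieval as a tool.
# StubMemoryClient is a placeholder, not the real AgentCore SDK.

class StubMemoryClient:
    """Hypothetical stand-in that returns canned memories."""
    def retrieve_memories(self, query: str) -> list[dict]:
        data = [
            {"content": "User prefers apples over oranges", "score": 0.92},
            {"content": "User is allergic to peanuts", "score": 0.71},
        ]
        words = query.lower().split()
        return [m for m in data if any(w in m["content"].lower() for w in words)]

def retrieve_memory_tool(query: str, client=StubMemoryClient()) -> str:
    """Tool the agent can call on demand instead of on every turn."""
    memories = client.retrieve_memories(query)
    if not memories:
        return f"No memories found for '{query}'"
    lines = [
        f"{i}. (Score: {m.get('score', 'N/A')}) {m.get('content', 'No content')}"
        for i, m in enumerate(memories, 1)
    ]
    return f"Retrieved {len(memories)} memories for '{query}':\n\n" + "\n".join(lines)
```

&lt;p&gt;Because the docstring travels with the tool definition, the model sees a description of when retrieval is useful and can choose to call it only when the conversation warrants it.&lt;/p&gt;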

&lt;h3&gt;
  
  
  Context Window Management
&lt;/h3&gt;

&lt;p&gt;LlamaIndex's context management is important when dealing with large documents or extensive conversation histories. The memory manager handles this by implementing a sliding window approach:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;conversations&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;memory_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_last_k_turns&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;memory_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;memory_config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;memory_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;  &lt;span class="c1"&gt;# Retrieve up to 100 previous conversation turns
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;a href="https://aws.github.io/bedrock-agentcore-starter-toolkit/user-guide/memory/quickstart.html" rel="noopener noreferrer"&gt;get_last_k_turns&lt;/a&gt; API efficiently retrieves recent conversation history without overwhelming the context window. This matters when using LlamaIndex with document-heavy workflows where context space is at a premium.&lt;/p&gt;
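&lt;p&gt;The sliding-window idea itself is easy to reproduce in plain Python, independent of the AgentCore client. A minimal sketch:&lt;/p&gt;

```python
from collections import deque

def sliding_window(turns, k=100):
    """Keep only the last k conversation turns for the context window."""
    window = deque(maxlen=k)  # older turns fall off the left automatically
    for turn in turns:
        window.append(turn)
    return list(window)
```

&lt;p&gt;With &lt;code&gt;k=100&lt;/code&gt;, a 250-turn history is trimmed to the most recent 100 turns, which is exactly the bound &lt;code&gt;get_last_k_turns&lt;/code&gt; enforces server-side.&lt;/p&gt;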

&lt;h2&gt;
  
  
  Deploying the Agent
&lt;/h2&gt;

&lt;p&gt;The deployment process leverages the AgentCore Starter Toolkit's streamlined workflow. First, I configure the agent:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;agentcore configure &lt;span class="nt"&gt;-n&lt;/span&gt; llamaindexagent &lt;span class="nt"&gt;-e&lt;/span&gt; main.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When prompted, I accept the default values. This prepares the deployment configuration the agent needs, including the IAM execution role and the Amazon ECR repository settings for the container image.&lt;/p&gt;

&lt;h3&gt;
  
  
  Local Testing
&lt;/h3&gt;

&lt;p&gt;Before deploying to the cloud, I always test locally to verify everything works correctly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;agentcore launch &lt;span class="nt"&gt;--local&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This starts a containerized version of the agent running on my local machine. The local environment exactly mirrors the production environment—same container, same runtime, same memory access.&lt;/p&gt;

&lt;p&gt;Testing the local deployment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;agentcore invoke &lt;span class="nt"&gt;--local&lt;/span&gt; &lt;span class="s1"&gt;'{"prompt": "What did I say about fruit?"}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The agent retrieves the sample memory we added during setup and responds appropriately. I can then test more complex scenarios:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;agentcore invoke &lt;span class="nt"&gt;--local&lt;/span&gt; &lt;span class="s1"&gt;'{"prompt": "Search Wikipedia for information about apples and tell me if I would enjoy apple-based dishes"}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This tests both the Wikipedia tool integration and memory retrieval, demonstrating how LlamaIndex agents can combine external data sources with personal context.&lt;/p&gt;

&lt;h3&gt;
  
  
  Production Deployment
&lt;/h3&gt;

&lt;p&gt;Once satisfied with local testing, deploying to AWS is straightforward:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;agentcore launch
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;AgentCore handles the deployment tasks automatically. It builds the container image with AWS CodeBuild, pushes it to Amazon ECR, creates the AgentCore Runtime agent and its endpoint, configures IAM permissions with least-privilege access, and enables CloudWatch logging.&lt;/p&gt;

&lt;p&gt;After deployment completes, I check the status:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;agentcore status
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This provides the endpoint ARN, CloudWatch log group, and other deployment details I need for monitoring and debugging.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cleaning Up Resources
&lt;/h2&gt;

&lt;p&gt;To delete the resources created by &lt;code&gt;agentcore launch&lt;/code&gt;, I use the &lt;code&gt;agentcore&lt;/code&gt; command in the Python virtual environment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;agentcore destroy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This command deletes the AgentCore agent, the ECR images, the CodeBuild project, and the IAM roles used by the agent and by CodeBuild.&lt;/p&gt;

&lt;p&gt;To delete the memory, including all stored events, the strategies, and the memories extracted from the events, I look up the memory ID in the &lt;code&gt;../config/memory-config.json&lt;/code&gt; file and use the AWS CLI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws bedrock-agentcore-control delete-memory &lt;span class="nt"&gt;--memory-id&lt;/span&gt; &amp;lt;MEMORY_ID&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
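&lt;p&gt;Rather than copying the ID by hand, it can be read programmatically. A small sketch, assuming the file stores the ID under a &lt;code&gt;memory_id&lt;/code&gt; key (check your own &lt;code&gt;memory-config.json&lt;/code&gt; for the actual layout):&lt;/p&gt;

```python
import json
import tempfile
from pathlib import Path

def load_memory_id(config_path) -> str:
    """Read the memory ID from the shared config file.

    The 'memory_id' key is an assumption about the file layout;
    verify it against your ../config/memory-config.json.
    """
    with open(config_path) as f:
        return json.load(f)["memory_id"]

# Demo with a throwaway file standing in for ../config/memory-config.json
with tempfile.TemporaryDirectory() as tmp:
    cfg = Path(tmp) / "memory-config.json"
    cfg.write_text(json.dumps({"memory_id": "mem-0123456789"}))
    print(load_memory_id(cfg))  # mem-0123456789
```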



&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;This LlamaIndex implementation demonstrates how to build data-centric agents with RAG capabilities while maintaining production-grade memory persistence. The framework's modular architecture and extensive tool ecosystem make it ideal for agents that need to process diverse data sources while maintaining conversational context.&lt;/p&gt;

&lt;p&gt;In the next article, I'll explore LangGraph, showing how to build stateful, graph-based agent workflows using the same AgentCore infrastructure. You'll see how LangGraph's unique approach to agent orchestration enables complex, multi-step reasoning while leveraging our shared memory architecture.&lt;/p&gt;

&lt;p&gt;The complete code is available on &lt;a href="https://github.com/danilop/agentcore-multi-framework-examples" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;. I encourage you to experiment with the implementation and explore how LlamaIndex's document processing capabilities can enhance your agent's knowledge base beyond just conversation memory.&lt;/p&gt;

&lt;p&gt;Ready to build your own data-powered AI agent? Clone the repository and start exploring the possibilities!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>aws</category>
      <category>bedrock</category>
      <category>agentcore</category>
    </item>
    <item>
      <title>Building Production-Ready AI Agents with Pydantic AI and Amazon Bedrock AgentCore</title>
      <dc:creator>Danilo Poccia</dc:creator>
      <pubDate>Mon, 15 Sep 2025 13:07:17 +0000</pubDate>
      <link>https://dev.to/aws/building-production-ready-ai-agents-with-pydantic-ai-and-amazon-bedrock-agentcore-738</link>
      <guid>https://dev.to/aws/building-production-ready-ai-agents-with-pydantic-ai-and-amazon-bedrock-agentcore-738</guid>
      <description>&lt;h1&gt;
  
  
  Building Production-Ready AI Agents with Pydantic AI and Amazon Bedrock AgentCore
&lt;/h1&gt;

&lt;p&gt;In this third deep dive of our &lt;a href="https://dev.to/aws/building-production-ready-ai-agents-a-multi-framework-journey-with-amazon-bedrock-agentcore-p32"&gt;multi-framework series&lt;/a&gt;, I'll show you how to build type-safe AI agents using &lt;a href="https://ai.pydantic.dev/" rel="noopener noreferrer"&gt;Pydantic AI&lt;/a&gt; and deploy them with &lt;a href="https://aws.amazon.com/bedrock/agentcore/" rel="noopener noreferrer"&gt;Amazon Bedrock AgentCore&lt;/a&gt;. The complete code for this implementation, along with examples for other frameworks, is available on GitHub at &lt;a href="https://github.com/danilop/agentcore-multi-framework-examples" rel="noopener noreferrer"&gt;agentcore-multi-framework-examples&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Pydantic AI brings the power of Python's most popular validation library to AI agent development. If you've ever struggled with unpredictable LLM outputs or spent hours debugging type mismatches in your agent code, Pydantic AI offers a refreshing solution. It enforces structure where AI tends to be chaotic, providing type safety, automatic validation, and clear data contracts.&lt;/p&gt;

&lt;p&gt;Pydantic AI has a minimalist approach that simplifies implementations and works well for production deployments. Unlike the multi-agent orchestration of CrewAI or the model-based extensibility of Strands, Pydantic AI focuses on doing one thing exceptionally well: ensuring your agent's inputs and outputs are exactly what you expect them to be. This predictability is invaluable when building systems that other services depend on.&lt;/p&gt;

&lt;p&gt;While Pydantic AI excels at structured outputs through &lt;a href="https://ai.pydantic.dev/results/#structured-results" rel="noopener noreferrer"&gt;Pydantic models&lt;/a&gt;, automatic validation of responses, and built-in error handling for type mismatches, production deployments also need to follow security and scalability best practices. Let me show you how the integration with AgentCore enables persistent memory and scalable deployments without adding complexity to the agent code.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setting Up the Development Environment
&lt;/h2&gt;

&lt;p&gt;Let's start by setting up the Pydantic AI project. If you haven't already cloned the repository:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/danilop/agentcore-multi-framework-examples.git
&lt;span class="nb"&gt;cd &lt;/span&gt;agentcore-multi-framework-examples
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now let's set up the Pydantic AI project. I'm using &lt;a href="https://github.com/astral-sh/uv" rel="noopener noreferrer"&gt;uv&lt;/a&gt;, a fast Python package installer, to manage dependencies:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;agentcore-pydantic-ai
uv &lt;span class="nb"&gt;sync
source&lt;/span&gt; .venv/bin/activate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The project dependencies are remarkably lean:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;pydantic-ai-slim[bedrock]&lt;/code&gt;: The core framework with Amazon Bedrock support&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;bedrock-agentcore&lt;/code&gt;: The SDK for integrating with AgentCore services&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;bedrock-agentcore-starter-toolkit&lt;/code&gt;: CLI tools for deployment&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Notice I'm using &lt;code&gt;pydantic-ai-slim&lt;/code&gt; rather than the full package. This gives me just the agent functionality without additional dependencies I don't need for this implementation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Creating and Configuring AgentCore Memory
&lt;/h2&gt;

&lt;p&gt;Before building our agent, let's set up AgentCore Memory. If you've already done this for previous examples, you can skip the creation steps and just copy the configuration file.&lt;/p&gt;

&lt;h3&gt;
  
  
  Quick Memory Setup
&lt;/h3&gt;

&lt;p&gt;For those who haven't set up memory yet:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd&lt;/span&gt; ../scripts
uv &lt;span class="nb"&gt;sync
&lt;/span&gt;uv run create-memory
uv run add-sample-memory
&lt;span class="nb"&gt;cd&lt;/span&gt; ../agentcore-pydantic-ai
&lt;span class="nb"&gt;cp&lt;/span&gt; ../config/memory-config.json &lt;span class="nb"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The memory system provides three strategies that automatically extract insights from conversations: User Preferences (behavioral patterns), Semantic Facts (domain knowledge), and Session Summaries (conversation overviews). These work together to give our agents persistent memory across sessions.&lt;/p&gt;
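&lt;p&gt;To make the three strategies concrete, here is a hypothetical sketch of the configuration the create-memory script might pass. The key names and namespace templates are assumptions modeled on the AgentCore memory API shape and should be verified against the SDK documentation:&lt;/p&gt;

```python
# Hypothetical strategy configuration; key names and namespace
# templates are assumptions, not copied from the actual script.
MEMORY_STRATEGIES = [
    {"userPreferenceMemoryStrategy": {
        "name": "UserPreferences",
        "namespaces": ["/actor/{actorId}/preferences"]}},
    {"semanticMemoryStrategy": {
        "name": "SemanticFacts",
        "namespaces": ["/actor/{actorId}/facts"]}},
    {"summaryMemoryStrategy": {
        "name": "SessionSummaries",
        "namespaces": ["/actor/{actorId}/session/{sessionId}"]}},
]
```

&lt;p&gt;Each strategy writes into its own namespace, which is why retrieval later scopes queries with a namespace prefix such as &lt;code&gt;/actor/{actor_id}/&lt;/code&gt;.&lt;/p&gt;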

&lt;h2&gt;
  
  
  Building the Type-Safe Agent
&lt;/h2&gt;

&lt;p&gt;Pydantic AI's approach is refreshingly straightforward. Instead of complex configurations or multiple files, everything centers around the &lt;a href="https://ai.pydantic.dev/agents/" rel="noopener noreferrer"&gt;Agent class&lt;/a&gt; with its clean, functional API.&lt;/p&gt;

&lt;h3&gt;
  
  
  Understanding Message History
&lt;/h3&gt;

&lt;p&gt;One of Pydantic AI's key features is its built-in &lt;a href="https://ai.pydantic.dev/message-history/" rel="noopener noreferrer"&gt;message history&lt;/a&gt; management. Unlike other frameworks where you might manage conversation context manually, Pydantic AI provides a structured way to maintain conversation continuity:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic_ai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;

&lt;span class="c1"&gt;# Session state tracking (minimal global state)  
&lt;/span&gt;&lt;span class="n"&gt;session_message_history&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;  &lt;span class="c1"&gt;# Dict[session_id, List[ModelMessage]]
&lt;/span&gt;
&lt;span class="nd"&gt;@app.entrypoint&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;RequestContext&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Main entrypoint with refactored memory functionality using AgentMemoryManager.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;prompt&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;actor_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;actor_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;DEFAULT_ACTOR_ID&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Get session_id from context (AgentCore automatically provides this)
&lt;/span&gt;    &lt;span class="n"&gt;session_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;session_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;DEFAULT_SESSION_ID&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Get or initialize message history for this session
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;session_id&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;session_message_history&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;session_message_history&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="n"&gt;current_message_history&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;session_message_history&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The message history is a list of &lt;a href="https://ai.pydantic.dev/api/messages/" rel="noopener noreferrer"&gt;ModelMessage&lt;/a&gt; objects that Pydantic AI uses internally. By maintaining this history and passing it to the agent, we let the conversation flow naturally across multiple invocations.&lt;/p&gt;
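&lt;p&gt;The threading pattern can be sketched with a stub agent. &lt;code&gt;EchoAgent&lt;/code&gt; below is a hypothetical stand-in for the real Pydantic AI Agent; with the actual library the analogous calls are &lt;code&gt;agent.run_sync(prompt, message_history=...)&lt;/code&gt; and &lt;code&gt;result.new_messages()&lt;/code&gt;:&lt;/p&gt;

```python
# Stub showing how per-session history threads through each call.
# EchoAgent is a stand-in for the real Pydantic AI Agent class.

class EchoAgent:
    """Hypothetical agent that echoes the prompt, returning both messages."""
    def run_sync(self, prompt: str, message_history: list) -> list:
        reply = f"echo: {prompt}"
        # messages produced by this turn (user + model)
        return [("user", prompt), ("model", reply)]

session_message_history: dict[str, list] = {}

def invoke(agent: EchoAgent, session_id: str, prompt: str) -> list:
    history = session_message_history.setdefault(session_id, [])
    new_messages = agent.run_sync(prompt, message_history=history)
    history.extend(new_messages)  # carry context into the next call
    return history
```

&lt;p&gt;Each call appends the turn's new messages to the per-session list, so the second invocation sees everything the first one said.&lt;/p&gt;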

&lt;h3&gt;
  
  
  AgentCore Memory Integration
&lt;/h3&gt;

&lt;p&gt;Combining Pydantic AI's message history with AgentCore's long-term memory provides several complementary levels of memory. Let me show you how the integration can be implemented:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Get memory context to add to user prompt
&lt;/span&gt;&lt;span class="n"&gt;memory_context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;memory_manager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_memory_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Create enhanced user prompt with memory context
&lt;/span&gt;&lt;span class="n"&gt;enhanced_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hello&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;memory_context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;enhanced_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;memory_context&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;User: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;enhanced_prompt&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Added memory context to user prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;MemoryManager&lt;/code&gt; (the same class we used in &lt;a href="https://dev.to/aws/building-production-ready-ai-agents-with-strands-agents-and-amazon-bedrock-agentcore-3dg0"&gt;Strands Agents&lt;/a&gt; and CrewAI) retrieves relevant memories from past sessions. Here's how it works internally:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_memory_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Get memory context as a string to be added to user input.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;session_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;context_parts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="c1"&gt;# Load conversation history on first invocation
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_initialized_sessions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;session_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;conversations&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;memory_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_last_k_turns&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;memory_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;memory_config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;memory_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;max_conversation_turns&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;conversations&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# Format as conversation history
&lt;/span&gt;            &lt;span class="n"&gt;context_messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;turn&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;reversed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;conversations&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;turn&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="n"&gt;context_messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;context_parts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Recent conversation:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context_messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_initialized_sessions&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;session_key&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;

    &lt;span class="c1"&gt;# Retrieve semantically relevant memories
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;memories&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;memory_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;retrieve_memories&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;memory_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;memory_config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;memory_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;namespace&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/actor/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;user_input&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;memories&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;memory_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_format_memories&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;memories&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;context_parts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Relevant long-term memory:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;memory_text&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context_parts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This integration provides three levels of memory:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Session Message History&lt;/strong&gt; (managed by Pydantic AI): Maintains conversation flow within the current session&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Conversation History&lt;/strong&gt; (via AgentCore &lt;a href="https://aws.github.io/bedrock-agentcore-starter-toolkit/user-guide/memory/quickstart.html" rel="noopener noreferrer"&gt;get_last_k_turns&lt;/a&gt;): Loads previous conversations when a session starts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Semantic Memory&lt;/strong&gt; (via AgentCore &lt;a href="https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/long-term-memory.html" rel="noopener noreferrer"&gt;RetrieveMemories&lt;/a&gt;): Searches across all memories for relevant context&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;By prepending this context to the user's prompt, the agent has access to historical information even in a new session, enabling truly persistent conversations.&lt;/p&gt;
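&lt;p&gt;The &lt;code&gt;_format_memories&lt;/code&gt; helper used above isn't shown in this post. A minimal sketch of what it could look like, assuming each retrieved memory is a dict with a nested &lt;code&gt;content.text&lt;/code&gt; field (check the actual retrieval response shape in your SDK version before relying on this):&lt;/p&gt;

```python
from typing import Any, Dict, List

def format_memories(memories: List[Dict[str, Any]]) -> str:
    """Render retrieved memories as a bulleted block for the prompt.

    Assumes each memory looks like {"content": {"text": "..."}}; adjust
    the field access to match the real retrieval response.
    """
    lines = []
    for memory in memories:
        text = memory.get("content", {}).get("text", "").strip()
        if text:
            lines.append(f"- {text}")
    return "\n".join(lines)

# Example: two retrieved memories become a two-line context block
block = format_memories([
    {"content": {"text": "User prefers concise answers"}},
    {"content": {"text": "User is building a serverless agent"}},
])
```

&lt;p&gt;The resulting block is what gets prepended under the "Relevant long-term memory:" label in the prompt.&lt;/p&gt;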

&lt;h3&gt;
  
  
  Running the Agent
&lt;/h3&gt;

&lt;p&gt;With memory context prepared, I create and run the agent:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Create agent with base instructions
&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;MODEL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Be concise, reply with one sentence.&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Run the agent with enhanced prompt and message history
&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run_sync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;enhanced_prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;message_history&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;current_message_history&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# The result object contains:
# - result.output: The agent's text response (what we return to the user)
# - result.all_messages(): Complete message history including this interaction
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;a href="https://ai.pydantic.dev/agents/#running-the-agent" rel="noopener noreferrer"&gt;run_sync&lt;/a&gt; method executes the agent synchronously, which is perfect for our serverless deployment model. The &lt;code&gt;message_history&lt;/code&gt; parameter provides the agent with context from earlier in the conversation. The method returns a result object where &lt;code&gt;result.output&lt;/code&gt; contains the agent's text response that we'll return to the user.&lt;/p&gt;

&lt;h3&gt;
  
  
  Storing New Messages
&lt;/h3&gt;

&lt;p&gt;After the agent responds, I need to detect and store new messages in AgentCore Memory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Get all messages including the new interaction
&lt;/span&gt;&lt;span class="n"&gt;all_messages_after&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;all_messages&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Detect new messages by comparing counts
&lt;/span&gt;&lt;span class="n"&gt;previous_message_count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;current_message_history&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;new_messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;all_messages_after&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;previous_message_count&lt;/span&gt;&lt;span class="p"&gt;:]&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;new_messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Storing &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;new_messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; new messages in memory&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;store_pydantic_messages_in_memory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;new_messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;memory_manager&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Update session message history (keep last NUM_MESSAGES)
&lt;/span&gt;&lt;span class="n"&gt;session_message_history&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;all_messages_after&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;NUM_MESSAGES&lt;/span&gt;&lt;span class="p"&gt;:]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This approach efficiently detects which messages are new by comparing message counts before and after the agent run. The &lt;code&gt;store_pydantic_messages_in_memory&lt;/code&gt; function handles the Pydantic AI message format and uses the &lt;code&gt;store_conversation&lt;/code&gt; method to store the new messages in AgentCore Memory.&lt;/p&gt;

&lt;p&gt;When &lt;code&gt;store_conversation&lt;/code&gt; calls the &lt;a href="https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/memory-getting-started.html" rel="noopener noreferrer"&gt;create_event&lt;/a&gt; API internally, it not only stores the raw conversation but also triggers the memory strategies to automatically extract user preferences, semantic facts, and generate session summaries. This is how our agent builds long-term knowledge from every interaction.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;NUM_MESSAGES&lt;/code&gt; limit (set to 30) prevents the message history from growing unbounded. This is important for managing token limits and ensuring consistent performance.&lt;/p&gt;
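&lt;p&gt;The trimming itself is just a tail slice. A tiny standalone illustration, using a hypothetical limit of 3 instead of 30 to keep the example short:&lt;/p&gt;

```python
NUM_MESSAGES = 3  # the post uses 30; a small value keeps the example readable

history = ["m1", "m2", "m3", "m4", "m5"]
bounded = history[-NUM_MESSAGES:]  # keeps only the most recent messages
# bounded == ["m3", "m4", "m5"]

# When the history is shorter than the limit, the slice keeps everything
short = ["only"]
assert short[-NUM_MESSAGES:] == ["only"]
```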

&lt;h2&gt;
  
  
  Integrating with AgentCore Runtime
&lt;/h2&gt;

&lt;p&gt;The integration with AgentCore Runtime follows the same pattern as our other frameworks, but Pydantic AI's simplicity makes it particularly elegant:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;bedrock_agentcore&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BedrockAgentCoreApp&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;bedrock_agentcore.runtime.context&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RequestContext&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic_ai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BedrockAgentCoreApp&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="nd"&gt;@app.entrypoint&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;RequestContext&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Main entrypoint with refactored memory functionality using AgentMemoryManager.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;prompt&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;actor_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;actor_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;DEFAULT_ACTOR_ID&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;session_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;session_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;DEFAULT_SESSION_ID&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Get or initialize message history for this session
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;session_id&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;session_message_history&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;session_message_history&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="n"&gt;current_message_history&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;session_message_history&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="c1"&gt;# Get memory context to enhance the prompt
&lt;/span&gt;    &lt;span class="n"&gt;memory_context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;memory_manager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_memory_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Create enhanced prompt with memory context
&lt;/span&gt;    &lt;span class="n"&gt;enhanced_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hello&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;memory_context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;enhanced_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;memory_context&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;User: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;enhanced_prompt&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="c1"&gt;# Create and run the agent
&lt;/span&gt;    &lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;MODEL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Be concise, reply with one sentence.&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run_sync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;enhanced_prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;message_history&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;current_message_history&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Store new messages in memory
&lt;/span&gt;    &lt;span class="n"&gt;new_messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;all_messages&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;current_message_history&lt;/span&gt;&lt;span class="p"&gt;):]&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;new_messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;store_pydantic_messages_in_memory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;new_messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;memory_manager&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Update session message history
&lt;/span&gt;    &lt;span class="n"&gt;session_message_history&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;all_messages&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;NUM_MESSAGES&lt;/span&gt;&lt;span class="p"&gt;:]&lt;/span&gt;

    &lt;span class="c1"&gt;# Return the agent's string output
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;a href="https://aws.github.io/bedrock-agentcore-starter-toolkit/user-guide/runtime/quickstart.html" rel="noopener noreferrer"&gt;BedrockAgentCoreApp&lt;/a&gt; provides all the infrastructure scaffolding, while the &lt;a href="https://aws.github.io/bedrock-agentcore-starter-toolkit/user-guide/runtime/quickstart.html#creating-the-entrypoint" rel="noopener noreferrer"&gt;@entrypoint decorator&lt;/a&gt; marks our handler function.&lt;/p&gt;

&lt;p&gt;The key line is &lt;code&gt;result = agent.run_sync(enhanced_prompt, message_history=current_message_history)&lt;/code&gt; where the Pydantic AI agent processes the request. The &lt;code&gt;run_sync()&lt;/code&gt; method returns a result object, and we return &lt;code&gt;result.output&lt;/code&gt;, which contains the agent's text response as a string. AgentCore Runtime automatically wraps this string in the appropriate HTTP response structure, so you don't need to worry about response formatting: just return the text content you want to send back to the user.&lt;/p&gt;

&lt;p&gt;This implementation is simple: no complex class hierarchy, no multiple configuration files, just a straightforward function that processes requests and returns responses. This aligns perfectly with Pydantic AI's approach of minimalism and clarity.&lt;/p&gt;
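&lt;p&gt;To make the fallback chain for &lt;code&gt;session_id&lt;/code&gt; concrete, here is the same resolution logic as a small standalone function. &lt;code&gt;DEFAULT_SESSION_ID&lt;/code&gt; is a placeholder value here; in the real handler it comes from the application's configuration:&lt;/p&gt;

```python
from typing import Any, Dict, Optional

DEFAULT_SESSION_ID = "default-session"  # placeholder; use your own default

def resolve_session_id(payload: Dict[str, Any], context_session_id: Optional[str]) -> str:
    """Prefer the runtime-provided session id, then the payload, then a default."""
    if context_session_id:
        return context_session_id
    return payload.get("session_id", DEFAULT_SESSION_ID)

# The AgentCore-provided context wins over anything in the payload
assert resolve_session_id({"session_id": "from-payload"}, "from-context") == "from-context"
```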

&lt;p&gt;The &lt;a href="https://aws.github.io/bedrock-agentcore-starter-toolkit/user-guide/runtime/context.html" rel="noopener noreferrer"&gt;RequestContext&lt;/a&gt; automatically provides session management:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Get session_id from context (AgentCore automatically provides this)
&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;session_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;DEFAULT_SESSION_ID&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Using session_id from context: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Session isolation is implemented at the infrastructure level: each user gets their own session that can persist across invocations while remaining completely isolated from other users.&lt;/p&gt;

&lt;p&gt;When deployed, AgentCore Runtime handles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Request routing&lt;/strong&gt;: Incoming requests are routed to your entrypoint function&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Session management&lt;/strong&gt;: Automatic session tracking and isolation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scaling&lt;/strong&gt;: The runtime automatically scales based on request volume&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Error handling&lt;/strong&gt;: Built-in retry logic and error reporting&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Logging&lt;/strong&gt;: CloudWatch integration for monitoring and debugging&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Your Pydantic AI agent doesn't need to know about any of this infrastructure—it just focuses on processing messages and returning responses.&lt;/p&gt;

&lt;h2&gt;
  
  
  Memory Manager Architecture
&lt;/h2&gt;

&lt;p&gt;The Pydantic AI implementation uses the &lt;code&gt;store_conversation()&lt;/code&gt; method from the shared memory module, with framework-specific message conversion logic in the main application file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;convert_pydantic_messages_for_storage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;tuple&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Convert Pydantic AI message objects to memory storage format.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;messages_to_store&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Handle Pydantic AI ModelMessage objects
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;hasattr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;parts&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;part&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;hasattr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;part&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                    &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;

                    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;hasattr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;kind&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;kind&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;request&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                            &lt;span class="n"&gt;role&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;USER&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;part_kind&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;user-prompt&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;SYSTEM&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
                        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;kind&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;response&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                            &lt;span class="n"&gt;role&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ASSISTANT&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
                        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                            &lt;span class="n"&gt;role&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ASSISTANT&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;

                    &lt;span class="n"&gt;message_tuple&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="n"&gt;messages_to_store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message_tuple&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;messages_to_store&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;store_pydantic_messages_in_memory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;new_messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;memory_manager&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Store Pydantic AI messages using the unified store_conversation method.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;messages_to_store&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;convert_pydantic_messages_for_storage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;new_messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Store each message pair using store_conversation
&lt;/span&gt;    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages_to_store&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages_to_store&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;user_content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;messages_to_store&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="n"&gt;assistant_content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;messages_to_store&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

            &lt;span class="n"&gt;memory_manager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;store_conversation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;user_content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;assistant_content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This approach keeps the shared memory module clean and unified while handling Pydantic AI message format in the application-specific code.&lt;/p&gt;
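&lt;p&gt;To see the conversion logic in isolation, here is a self-contained sketch that uses simple stand-in classes instead of Pydantic AI's real message types (the &lt;code&gt;kind&lt;/code&gt;, &lt;code&gt;parts&lt;/code&gt;, and &lt;code&gt;part_kind&lt;/code&gt; attribute names mirror the code above):&lt;/p&gt;

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Part:
    content: str
    part_kind: str = "text"

@dataclass
class Message:
    kind: str                              # 'request' or 'response'
    parts: List[Part] = field(default_factory=list)

def convert_for_storage(messages: List[Message]) -> List[Tuple[str, str]]:
    """Flatten messages into (content, role) tuples, as in the converter above."""
    out = []
    for msg in messages:
        for part in msg.parts:
            if msg.kind == "request":
                role = "USER" if part.part_kind == "user-prompt" else "SYSTEM"
            else:
                role = "ASSISTANT"
            out.append((part.content, role))
    return out

pairs = convert_for_storage([
    Message("request", [Part("Hi", "user-prompt")]),
    Message("response", [Part("Hello!")]),
])
# pairs == [("Hi", "USER"), ("Hello!", "ASSISTANT")]
```

&lt;p&gt;Each (user, assistant) pair from this flat list is then handed to &lt;code&gt;store_conversation&lt;/code&gt;.&lt;/p&gt;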

&lt;h2&gt;
  
  
  Shared Memory Architecture
&lt;/h2&gt;

&lt;p&gt;Like the other frameworks in this series, Pydantic AI uses the same unified memory management module. This architectural decision allows complete portability of memories across frameworks.&lt;/p&gt;

&lt;p&gt;The shared &lt;code&gt;memory.py&lt;/code&gt; module provides:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Core Components:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;MemoryConfig&lt;/code&gt; class for centralized configuration management&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;MemoryManager&lt;/code&gt; class with framework-agnostic memory operations&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;retrieve_memories_for_actor()&lt;/code&gt; function for semantic search&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;format_memory_context()&lt;/code&gt; function for consistent formatting&lt;/li&gt;
&lt;/ul&gt;
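
&lt;p&gt;To make this concrete, here is a minimal, framework-free sketch of what this interface could look like. The class and function names come from the list above, but the fields, signatures, and bodies are illustrative assumptions, not the actual module (which talks to AgentCore Memory and reads its configuration from &lt;code&gt;memory-config.json&lt;/code&gt;):&lt;/p&gt;

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class MemoryConfig:
    """Centralized configuration (illustrative fields only)."""
    memory_id: str = "MEMORY_ID_PLACEHOLDER"
    region: str = "us-east-1"

class MemoryManager:
    """Framework-agnostic memory operations (sketch, stores in-process)."""
    def __init__(self, config: MemoryConfig):
        self.config = config
        self._events: List[Tuple[str, str]] = []  # (content, role) pairs

    def store_conversation(self, user_input: str, response: str,
                           actor_id: str, session_id: str) -> None:
        # The real module sends these turns to AgentCore Memory;
        # here we only record them locally to show the contract.
        self._events.append((user_input, "USER"))
        self._events.append((response, "ASSISTANT"))

def format_memory_context(memories: List[str]) -> str:
    """Render retrieved memories as a text block for the prompt."""
    if not memories:
        return ""
    return "Relevant memories:\n" + "\n".join(f"- {m}" for m in memories)

manager = MemoryManager(MemoryConfig())
manager.store_conversation("I like apples", "Noted!", "user-1", "session-1")
print(format_memory_context(["User likes apples"]))
```

&lt;p&gt;The real &lt;code&gt;memory.py&lt;/code&gt; in the repository is the reference implementation; this sketch only shows the shape of the contract each framework codes against.&lt;/p&gt;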

&lt;p&gt;&lt;strong&gt;Unified Interface:&lt;/strong&gt;&lt;br&gt;
All frameworks use the same &lt;code&gt;store_conversation()&lt;/code&gt; method, with framework-specific message conversion handled in the application code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_convert_messages_for_storage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Tuple&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]]:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Convert Pydantic AI messages to AgentCore format.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;converted&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;hasattr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="nf"&gt;hasattr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;role&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;USER&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;role&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ASSISTANT&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
            &lt;span class="n"&gt;converted&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;converted&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This demonstrates how the shared architecture adapts to each framework's specific needs while maintaining the same core functionality. Memories created by a Pydantic AI agent can be accessed by agents built with &lt;a href="https://dev.to/aws/building-production-ready-ai-agents-with-strands-agents-and-amazon-bedrock-agentcore-3dg0"&gt;Strands&lt;/a&gt;, CrewAI, or any other framework in the series.&lt;/p&gt;

&lt;h2&gt;
  
  
  Testing Locally
&lt;/h2&gt;

&lt;p&gt;Before deploying to production, let's test locally. Configure the agent for AgentCore:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;agentcore configure &lt;span class="nt"&gt;-n&lt;/span&gt; pydanticaiagent &lt;span class="nt"&gt;-e&lt;/span&gt; main.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Press Enter to accept all defaults. This generates the configuration used to launch the agent in the next steps.&lt;/p&gt;

&lt;p&gt;Launch the agent locally:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;agentcore launch &lt;span class="nt"&gt;--local&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Test it with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;agentcore invoke &lt;span class="nt"&gt;--local&lt;/span&gt; &lt;span class="s1"&gt;'{ "prompt": "What did I say about fruit?" }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The agent should retrieve the sample memory about fruit preferences and respond concisely. Notice how the response is indeed one sentence, following the instruction we provided.&lt;/p&gt;

&lt;p&gt;Test conversation continuity:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;agentcore invoke &lt;span class="nt"&gt;--local&lt;/span&gt; &lt;span class="s1"&gt;'{ "prompt": "Tell me more about my preferences" }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The agent maintains context from the previous message and can elaborate on your preferences while staying concise.&lt;/p&gt;

&lt;h2&gt;
  
  
  Deploying to Production
&lt;/h2&gt;

&lt;p&gt;Once local testing is complete, deploy to AWS:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;agentcore launch
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;AgentCore Runtime handles all the deployment complexity:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Building container images with AWS CodeBuild and pushing them to Amazon ECR&lt;/li&gt;
&lt;li&gt;Creating the AgentCore Runtime agent and its endpoint&lt;/li&gt;
&lt;li&gt;Configuring IAM permissions&lt;/li&gt;
&lt;li&gt;Enabling CloudWatch logging&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Check your deployment status:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;agentcore status
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Test the production deployment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;agentcore invoke &lt;span class="s1"&gt;'{ "prompt": "What did I say about fruit?" }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Monitor logs in real-time:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws logs &lt;span class="nb"&gt;tail&lt;/span&gt; /aws/bedrock-agentcore/runtimes/&amp;lt;AGENT_ID-ENDPOINT_ID&amp;gt; &lt;span class="nt"&gt;--follow&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Cleaning Up Resources
&lt;/h2&gt;

&lt;p&gt;To delete the resources created by &lt;code&gt;agentcore launch&lt;/code&gt;, I use the &lt;code&gt;agentcore&lt;/code&gt; command in the Python virtual environment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;agentcore destroy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This command deletes the AgentCore agent, the ECR images, the CodeBuild project, and the IAM roles used by the agent and by CodeBuild.&lt;/p&gt;

&lt;p&gt;To delete the memory, including all stored events, the strategies, and the memories extracted from the events, I look up the memory ID in the &lt;code&gt;../config/memory-config.json&lt;/code&gt; file and use the AWS CLI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws bedrock-agentcore-control delete-memory &lt;span class="nt"&gt;--memory-id&lt;/span&gt; &amp;lt;MEMORY_ID&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;This Pydantic AI implementation demonstrates how type safety and simplicity can coexist in production AI systems. The minimal API surface area makes the code easy to understand and maintain, while the integration with AgentCore Memory provides the persistence needed for meaningful conversations.&lt;/p&gt;

&lt;p&gt;What strikes me most about Pydantic AI is how it manages to be both simple and powerful. There's no magic, no hidden complexity—just clean Python code with predictable behavior. This predictability becomes invaluable as your system grows and other services start depending on your agent's outputs.&lt;/p&gt;

&lt;p&gt;In the next article, I'll explore LlamaIndex, showing how to build agents with advanced retrieval capabilities and knowledge management. You'll see how the same memory architecture and deployment patterns adapt to a framework designed for working with large document collections.&lt;/p&gt;

&lt;p&gt;The complete code is available on &lt;a href="https://github.com/danilop/agentcore-multi-framework-examples" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;. I encourage you to experiment with adding Pydantic models for structured outputs, implementing custom validators, or building agents that return complex data types.&lt;/p&gt;

&lt;p&gt;Ready to build your own type-safe AI agent? Clone the repo and start experimenting!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>aws</category>
      <category>bedrock</category>
      <category>agentcore</category>
    </item>
    <item>
      <title>Building Production-Ready AI Agents with CrewAI and Amazon Bedrock AgentCore</title>
      <dc:creator>Danilo Poccia</dc:creator>
      <pubDate>Mon, 15 Sep 2025 12:44:09 +0000</pubDate>
      <link>https://dev.to/aws/building-production-ready-ai-agents-with-crewai-and-amazon-bedrock-agentcore-2g36</link>
      <guid>https://dev.to/aws/building-production-ready-ai-agents-with-crewai-and-amazon-bedrock-agentcore-2g36</guid>
      <description>&lt;p&gt;In this second deep dive of our &lt;a href="https://dev.to/aws/building-production-ready-ai-agents-a-multi-framework-journey-with-amazon-bedrock-agentcore-p32"&gt;multi-framework series&lt;/a&gt;, I'll show you how to build collaborative multi-agent systems using &lt;a href="https://www.crewai.com/" rel="noopener noreferrer"&gt;CrewAI&lt;/a&gt; and deploy them with &lt;a href="https://aws.amazon.com/bedrock/agentcore/" rel="noopener noreferrer"&gt;Amazon Bedrock AgentCore&lt;/a&gt;. The complete code for this implementation, along with examples for other frameworks, is available on GitHub at &lt;a href="https://github.com/danilop/agentcore-multi-framework-examples" rel="noopener noreferrer"&gt;agentcore-multi-framework-examples&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;CrewAI takes a fundamentally different approach from the single-agent model we explored with Strands Agents. Instead of one agent handling everything, CrewAI orchestrates multiple specialized agents working together like a well-coordinated team. Each agent has a specific role, goal, and backstory that shapes its behavior. This mirrors how human teams operate—you have researchers who gather information, analysts who process it, and managers who coordinate the workflow. Note that Strands Agents supports &lt;a href="https://strandsagents.com/latest/documentation/docs/user-guide/concepts/multi-agent/agent-to-agent/" rel="noopener noreferrer"&gt;different approaches and protocols for multi-agent architectures&lt;/a&gt;, but the sample code uses only one agent for simplicity.&lt;/p&gt;

&lt;p&gt;CrewAI maps to real-world workflows in a straightforward way. When I need to research a topic and write a report, I'm essentially performing multiple roles: first as a researcher gathering information, then as an analyst synthesizing findings. CrewAI makes this explicit, allowing me to define these roles as separate agents that collaborate through structured tasks.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setting Up the Development Environment
&lt;/h2&gt;

&lt;p&gt;Let's start by setting up the CrewAI project. If you haven't already cloned the repository:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/danilop/agentcore-multi-framework-examples.git
&lt;span class="nb"&gt;cd &lt;/span&gt;agentcore-multi-framework-examples
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now let's set up the CrewAI project. I'm using &lt;a href="https://github.com/astral-sh/uv" rel="noopener noreferrer"&gt;uv&lt;/a&gt;, a fast Python package installer, to manage dependencies:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;agentcore-crew-ai
uv &lt;span class="nb"&gt;sync
source&lt;/span&gt; .venv/bin/activate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The project dependencies include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;crewai[tools]&lt;/code&gt;: The core framework with built-in tools support&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;bedrock-agentcore&lt;/code&gt;: The SDK for integrating with AgentCore services&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;bedrock-agentcore-starter-toolkit&lt;/code&gt;: CLI tools for deployment&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;boto3&lt;/code&gt;: AWS SDK for Python, used by AgentCore&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I also install the CrewAI CLI tool which helps with project scaffolding:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uv tool &lt;span class="nb"&gt;install &lt;/span&gt;crewai
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I previously used the CrewAI CLI to create the initial project structure with &lt;code&gt;crewai create crew agentcore-crew-ai&lt;/code&gt;. When prompted, I selected one of the Amazon Bedrock models, though this isn't a requirement for using AgentCore Runtime—any LLM provider supported by CrewAI will work.&lt;/p&gt;

&lt;h2&gt;
  
  
  Creating and Configuring AgentCore Memory
&lt;/h2&gt;

&lt;p&gt;Before we dive into building our crew, let's set up AgentCore Memory. If you've already done this for the Strands Agents example, you can skip the creation steps and just copy the configuration file to have long-term memory "shared" between the agents.&lt;/p&gt;

&lt;h3&gt;
  
  
  Quick Memory Setup
&lt;/h3&gt;

&lt;p&gt;For those who haven't set up memory yet, here's the quick version:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd&lt;/span&gt; ../scripts
uv &lt;span class="nb"&gt;sync
&lt;/span&gt;uv run create-memory
uv run add-sample-memory
&lt;span class="nb"&gt;cd&lt;/span&gt; ../agentcore-crew-ai
&lt;span class="nb"&gt;cp&lt;/span&gt; ../config/memory-config.json &lt;span class="nb"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The memory system provides three strategies that automatically extract insights from conversations: User Preferences (behavioral patterns), Semantic Facts (domain knowledge), and Session Summaries (conversation overviews). These work together to give our agents persistent memory across sessions.&lt;/p&gt;
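
&lt;p&gt;As a rough illustration of how three such strategies might be declared, here is a hypothetical configuration structure. The strategy key names and namespace templates are assumptions made for illustration; the &lt;code&gt;create-memory&lt;/code&gt; script in the repository is the authoritative definition:&lt;/p&gt;

```python
# Hypothetical shape of a three-strategy memory configuration.
# Key names and namespace patterns are illustrative assumptions,
# not the exact API payload used by the create-memory script.
memory_strategies = [
    {"userPreferenceMemoryStrategy": {
        "name": "UserPreferences",              # behavioral patterns
        "namespaces": ["/preferences/{actorId}"],
    }},
    {"semanticMemoryStrategy": {
        "name": "SemanticFacts",                # domain knowledge
        "namespaces": ["/facts/{actorId}"],
    }},
    {"summaryMemoryStrategy": {
        "name": "SessionSummaries",             # conversation overviews
        "namespaces": ["/summaries/{actorId}/{sessionId}"],
    }},
]

print(len(memory_strategies))  # 3
```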

&lt;h2&gt;
  
  
  Building the Crew
&lt;/h2&gt;

&lt;p&gt;At runtime, CrewAI orchestrates multiple agents. Let me show you how I've structured the crew for our research and reporting workflow. This is based on the sample crews created by the &lt;code&gt;crewai&lt;/code&gt; CLI.&lt;/p&gt;

&lt;h3&gt;
  
  
  Defining Agents Through Configuration
&lt;/h3&gt;

&lt;p&gt;CrewAI supports &lt;a href="https://docs.crewai.com/concepts/agents#yaml-configuration-recommended" rel="noopener noreferrer"&gt;YAML configuration&lt;/a&gt; for defining agents, which I find cleaner than hardcoding everything. In &lt;code&gt;config/agents.yaml&lt;/code&gt;, I define two specialized agents:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;researcher&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="s"&gt;{topic} Senior Data Researcher&lt;/span&gt;
  &lt;span class="na"&gt;goal&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="s"&gt;Uncover cutting-edge developments in {topic}&lt;/span&gt;
  &lt;span class="na"&gt;backstory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="s"&gt;You're a seasoned researcher with a knack for uncovering the latest&lt;/span&gt;
    &lt;span class="s"&gt;developments in {topic}. Known for your ability to find the most relevant&lt;/span&gt;
    &lt;span class="s"&gt;information and present it in a clear and concise manner.&lt;/span&gt;

&lt;span class="na"&gt;reporting_analyst&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="s"&gt;{topic} Reporting Analyst&lt;/span&gt;
  &lt;span class="na"&gt;goal&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="s"&gt;Create detailed reports based on {topic} data analysis and research findings&lt;/span&gt;
  &lt;span class="na"&gt;backstory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="s"&gt;You're a meticulous analyst with a keen eye for detail. You're known for&lt;/span&gt;
    &lt;span class="s"&gt;your ability to turn complex data into clear and concise reports, making&lt;/span&gt;
    &lt;span class="s"&gt;it easy for others to understand and act on the information you provide.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice the &lt;code&gt;{topic}&lt;/code&gt; placeholders—these get replaced at runtime with the actual topic the user wants to research. The &lt;a href="https://docs.crewai.com/concepts/agents#agent-attributes" rel="noopener noreferrer"&gt;backstory&lt;/a&gt; is particularly important as it shapes how the agent approaches its tasks, influencing the tone and style of its outputs.&lt;/p&gt;
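
&lt;p&gt;The placeholder substitution works like Python string formatting: the values you pass as inputs when kicking off the crew (for example &lt;code&gt;crew.kickoff(inputs={"topic": ...})&lt;/code&gt;) are interpolated into each agent's role, goal, and backstory. A standalone illustration of the mechanics:&lt;/p&gt;

```python
# Standalone illustration of {topic} interpolation; in a real project
# CrewAI performs this substitution when the crew is kicked off with
# inputs, e.g. crew.kickoff(inputs={"topic": "AI Agents"}).
role_template = "{topic} Senior Data Researcher"
goal_template = "Uncover cutting-edge developments in {topic}"

inputs = {"topic": "AI Agents"}
role = role_template.format(**inputs)
goal = goal_template.format(**inputs)

print(role)  # AI Agents Senior Data Researcher
```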

&lt;h3&gt;
  
  
  Defining Tasks
&lt;/h3&gt;

&lt;p&gt;Similarly, tasks are defined in &lt;code&gt;config/tasks.yaml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;research_task&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="s"&gt;Conduct a thorough research about {topic}&lt;/span&gt;
    &lt;span class="s"&gt;Make sure you find any interesting and relevant information given&lt;/span&gt;
    &lt;span class="s"&gt;the current year is {current_year}.&lt;/span&gt;
  &lt;span class="na"&gt;expected_output&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="s"&gt;A list with 10 bullet points of the most relevant information about {topic}&lt;/span&gt;
  &lt;span class="na"&gt;agent&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;researcher&lt;/span&gt;

&lt;span class="na"&gt;reporting_task&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="s"&gt;Review the context you got and expand each topic into a full section for a report.&lt;/span&gt;
    &lt;span class="s"&gt;Make sure the report is detailed and contains any and all relevant information.&lt;/span&gt;
  &lt;span class="na"&gt;expected_output&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="s"&gt;A fully fledged report with the main topics, each with a full section of information.&lt;/span&gt;
    &lt;span class="s"&gt;. . .&lt;/span&gt;
  &lt;span class="na"&gt;agent&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;reporting_analyst&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each task specifies which agent should handle it, creating clear ownership and responsibility. The &lt;a href="https://docs.crewai.com/concepts/tasks#task-attributes" rel="noopener noreferrer"&gt;expected_output&lt;/a&gt; field helps the agent understand what format the result should take.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Crew Class
&lt;/h3&gt;

&lt;p&gt;The crew itself is defined using CrewAI's &lt;a href="https://docs.crewai.com/concepts/crews#creating-a-crew" rel="noopener noreferrer"&gt;@CrewBase decorator&lt;/a&gt; pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;crewai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Crew&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Process&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Task&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;crewai.project&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;CrewBase&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;crew&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;

&lt;span class="nd"&gt;@CrewBase&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;AgentcoreCrewAi&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;AgentcoreCrewAi crew&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="nd"&gt;@agent&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;researcher&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;agents_config&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;researcher&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="n"&gt;verbose&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nd"&gt;@agent&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;reporting_analyst&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;agents_config&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;reporting_analyst&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="n"&gt;verbose&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nd"&gt;@task&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;research_task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tasks_config&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;research_task&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nd"&gt;@task&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;reporting_task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tasks_config&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;reporting_task&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nd"&gt;@crew&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;crew&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Crew&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Creates the AgentcoreCrewAi crew&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;Crew&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;agents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;agents&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;tasks&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tasks&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;process&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sequential&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;verbose&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The decorators handle all the wiring—&lt;code&gt;@agent&lt;/code&gt; and &lt;code&gt;@task&lt;/code&gt; automatically collect the defined agents and tasks, making them available to the crew. I'm using &lt;a href="https://docs.crewai.com/concepts/crews#process" rel="noopener noreferrer"&gt;Process.sequential&lt;/a&gt; here, which means tasks execute one after another. CrewAI also supports hierarchical processes for more complex workflows where a manager agent coordinates subordinates.&lt;/p&gt;
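
&lt;p&gt;Conceptually, a sequential process means each task's output becomes part of the context for the next task. Here is a framework-free sketch of that flow (the task functions are stand-ins, not CrewAI APIs):&lt;/p&gt;

```python
from typing import Callable, List

def run_sequential(tasks: List[Callable[[str], str]], initial: str = "") -> str:
    """Run tasks in order, feeding each one the previous task's output."""
    context = initial
    for task in tasks:
        context = task(context)
    return context

def research(context: str) -> str:
    # Stand-in for the researcher's task output.
    return context + "10 bullet points about the topic. "

def report(context: str) -> str:
    # Stand-in for the reporting analyst, who sees the research output.
    return "Report based on: " + context

result = run_sequential([research, report])
print(result)
```

&lt;p&gt;A hierarchical process replaces this fixed ordering with a manager agent that decides which task runs next.&lt;/p&gt;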

&lt;h2&gt;
  
  
  Integrating with AgentCore Runtime
&lt;/h2&gt;

&lt;p&gt;Now comes the crucial part—integrating our crew with AgentCore Runtime for production deployment. The integration happens in &lt;code&gt;main.py&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;bedrock_agentcore&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BedrockAgentCoreApp&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;bedrock_agentcore.runtime.context&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RequestContext&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BedrockAgentCoreApp&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="nd"&gt;@app.entrypoint&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;RequestContext&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Invoke the crew with enhanced memory functionality.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CrewAI invocation started&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;user_input&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;warning&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;No prompt provided in payload&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Error: No prompt provided&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="c1"&gt;# Get actor_id and session_id from payload or context
&lt;/span&gt;    &lt;span class="n"&gt;actor_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;actor_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;DEFAULT_ACTOR_ID&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;session_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;session_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;DEFAULT_SESSION_ID&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Enhance input with memory context
&lt;/span&gt;    &lt;span class="n"&gt;memory_context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;memory_manager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_memory_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;enhanced_input&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;memory_context&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;memory_context&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="n"&gt;user_input&lt;/span&gt;

    &lt;span class="c1"&gt;# Prepare inputs and execute the crew
&lt;/span&gt;    &lt;span class="n"&gt;inputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;topic&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;enhanced_input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;current_year&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;year&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;# Execute the crew and get the result
&lt;/span&gt;    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AgentcoreCrewAi&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;crew&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;kickoff&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Store the conversation in memory
&lt;/span&gt;    &lt;span class="n"&gt;memory_manager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;store_conversation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Return the crew's raw text output
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;raw&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;a href="https://aws.github.io/bedrock-agentcore-starter-toolkit/user-guide/runtime/quickstart.html" rel="noopener noreferrer"&gt;BedrockAgentCoreApp&lt;/a&gt; handles all the infrastructure concerns—setting up the HTTP server, managing request routing, and handling errors. The &lt;a href="https://aws.github.io/bedrock-agentcore-starter-toolkit/user-guide/runtime/quickstart.html#creating-the-entrypoint" rel="noopener noreferrer"&gt;@entrypoint decorator&lt;/a&gt; marks the function that AgentCore Runtime will invoke when your agent receives a request.&lt;/p&gt;

&lt;p&gt;The key line is &lt;code&gt;result = AgentcoreCrewAi().crew().kickoff(inputs=inputs)&lt;/code&gt;, where the crew executes all its tasks. The &lt;code&gt;kickoff()&lt;/code&gt; method returns a result object containing the crew's output. We then return &lt;code&gt;result.raw&lt;/code&gt;, the raw text produced by the crew's execution. AgentCore Runtime takes this string and formats the HTTP response automatically.&lt;/p&gt;
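&lt;p&gt;To make the payload contract concrete, here is a minimal, stdlib-only sketch of how the entrypoint resolves its inputs. The field names (&lt;code&gt;prompt&lt;/code&gt;, &lt;code&gt;actor_id&lt;/code&gt;, &lt;code&gt;session_id&lt;/code&gt;) match the code above, while &lt;code&gt;resolve_inputs&lt;/code&gt; and the default values are illustrative placeholders, not part of the actual module:&lt;/p&gt;

```python
from typing import Any, Dict, Optional


def resolve_inputs(payload: Dict[str, Any],
                   session_from_context: Optional[str] = None) -> Dict[str, str]:
    """Mirror the entrypoint's input resolution: the prompt is required,
    actor_id comes from the payload, and session_id prefers the value
    provided by the runtime context over the payload."""
    prompt = payload.get("prompt")
    if not prompt:
        raise ValueError("No prompt provided")
    return {
        "prompt": prompt,
        "actor_id": payload.get("actor_id", "default-user"),
        "session_id": session_from_context
        or payload.get("session_id", "default-session"),
    }


# The runtime-managed session ID wins over any payload value
resolved = resolve_inputs({"prompt": "Latest on AgentCore?"},
                          session_from_context="sess-123")
```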

&lt;p&gt;The &lt;a href="https://aws.github.io/bedrock-agentcore-starter-toolkit/user-guide/runtime/context.html" rel="noopener noreferrer"&gt;RequestContext&lt;/a&gt; provides session information that's automatically managed by AgentCore. This is crucial for maintaining conversation continuity—each user gets their own isolated session that persists across invocations.&lt;/p&gt;

&lt;h2&gt;
  
  
  AgentCore Memory Integration
&lt;/h2&gt;

&lt;p&gt;One of the interesting aspects of this implementation is how seamlessly memory integrates with the crew workflow. I use the same memory module we developed for &lt;a href="https://dev.to/aws/building-production-ready-ai-agents-with-strands-agents-and-amazon-bedrock-agentcore-3dg0"&gt;Strands Agents&lt;/a&gt;, demonstrating the portability of our architecture.&lt;/p&gt;

&lt;h3&gt;
  
  
  Enhancing Input with Memory Context
&lt;/h3&gt;

&lt;p&gt;Before the crew starts working, I retrieve relevant memories and add them to the input:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Enhance input with relevant memories using memory manager
&lt;/span&gt;&lt;span class="n"&gt;memory_context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;memory_manager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_memory_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;enhanced_input&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;memory_context&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;memory_context&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="n"&gt;user_input&lt;/span&gt;

&lt;span class="c1"&gt;# Prepare inputs for crew
&lt;/span&gt;&lt;span class="n"&gt;inputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;topic&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;enhanced_input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;current_year&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;year&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;MemoryManager&lt;/code&gt; class encapsulates all memory operations. Let me show you exactly how &lt;code&gt;get_memory_context()&lt;/code&gt; works internally:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_memory_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;load_conversation_context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;retrieve_relevant_memories&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Get memory context as a string to be added to user input.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="n"&gt;actor_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;actor_id&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;default_actor_id&lt;/span&gt;
    &lt;span class="n"&gt;session_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;session_id&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;default_session_id&lt;/span&gt;
    &lt;span class="n"&gt;session_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="n"&gt;context_parts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="c1"&gt;# Load conversation history on first invocation
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;load_conversation_context&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_initialized_sessions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;session_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Loading conversation context for session: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;session_key&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;conversation_context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_load_conversation_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;conversation_context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;context_parts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Recent conversation:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;conversation_context&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_initialized_sessions&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;session_key&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;

    &lt;span class="c1"&gt;# Retrieve semantically relevant memories
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;retrieve_relevant_memories&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;relevant_memories&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;retrieve_memories_for_actor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;memory_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;memory_config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;memory_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;search_query&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;memory_client&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;memory_client&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;relevant_memories&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;memory_context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;format_memory_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;relevant_memories&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;context_parts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Relevant long-term memory context:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;memory_context&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context_parts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;context_parts&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This method performs two critical functions:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Loading Conversation History&lt;/strong&gt;: On the first invocation of a session, it retrieves previous conversation turns using the &lt;a href="https://aws.github.io/bedrock-agentcore-starter-toolkit/user-guide/memory/quickstart.html" rel="noopener noreferrer"&gt;get_last_k_turns&lt;/a&gt; API. The &lt;code&gt;_initialized_sessions&lt;/code&gt; dictionary tracks which sessions have been loaded, preventing redundant API calls.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Retrieving Relevant Memories&lt;/strong&gt;: The &lt;code&gt;retrieve_memories_for_actor&lt;/code&gt; function performs semantic search across all stored memories—preferences, facts, and summaries—using the &lt;a href="https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/long-term-memory.html" rel="noopener noreferrer"&gt;RetrieveMemories&lt;/a&gt; operation. Here's how it builds the namespace and performs the search:&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;retrieve_memories_for_actor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;memory_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;search_query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;memory_client&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;MemoryClient&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;]]:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Retrieve memories for a specific actor from the memory store.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="n"&gt;namespace&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/actor/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;memories&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;memory_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;retrieve_memories&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;memory_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;memory_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;namespace&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;namespace&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;search_query&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;memories&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Failed to retrieve memories: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The namespace structure &lt;code&gt;/actor/{actor_id}/&lt;/code&gt; provides isolation between users: each actor has its own memory space in which memories are stored and retrieved.&lt;/p&gt;
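&lt;p&gt;As a quick illustration of that isolation, the namespace is a pure function of the actor ID, so two actors can never share a retrieval scope (the helper name is mine; the pattern is the one used in &lt;code&gt;retrieve_memories_for_actor&lt;/code&gt; above):&lt;/p&gt;

```python
def actor_namespace(actor_id: str) -> str:
    # Same f-string pattern as in retrieve_memories_for_actor above
    return f"/actor/{actor_id}/"


ns_alice = actor_namespace("alice")
ns_bob = actor_namespace("bob")
```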

&lt;h3&gt;
  
  
  Running the Crew
&lt;/h3&gt;

&lt;p&gt;With memory context prepared, I execute the crew:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Execute the crew
&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AgentcoreCrewAi&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;crew&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;kickoff&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Store the conversation in memory
&lt;/span&gt;&lt;span class="n"&gt;memory_manager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;store_conversation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Return the result
&lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;raw&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;a href="https://docs.crewai.com/concepts/crews#crew-execution" rel="noopener noreferrer"&gt;kickoff&lt;/a&gt; method starts the crew execution. The agents work through their tasks sequentially—first the researcher gathers information, then the reporting analyst creates a comprehensive report based on those findings. The result object contains the crew's output, with &lt;code&gt;result.raw&lt;/code&gt; providing the raw text that we return to the user.&lt;/p&gt;

&lt;p&gt;After execution, I store the conversation using &lt;code&gt;store_conversation()&lt;/code&gt;. This triggers the &lt;a href="https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/memory-getting-started.html" rel="noopener noreferrer"&gt;create_event&lt;/a&gt; API, which not only stores the raw conversation but also triggers the memory strategies to extract preferences, facts, and generate summaries.&lt;/p&gt;
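&lt;p&gt;Conceptually, storing a turn means pairing the user input and the agent response as one event. A minimal sketch of that pairing is below; the &lt;code&gt;(text, role)&lt;/code&gt; tuple shape follows the AgentCore Memory quickstart examples, but treat the exact roles and helper name here as assumptions rather than the module's actual implementation:&lt;/p&gt;

```python
from typing import List, Tuple


def build_event_messages(user_input: str, response: str) -> List[Tuple[str, str]]:
    """Pair one conversation turn as (text, role) tuples, the shape the
    AgentCore Memory quickstart passes to create_event (sketch only)."""
    return [(user_input, "USER"), (response, "ASSISTANT")]


messages = build_event_messages(
    "What is MCP?",
    "MCP is the Model Context Protocol, a standard for tool access.",
)
```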

&lt;h3&gt;
  
  
  Memory Manager Architecture
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;MemoryManager&lt;/code&gt; class provides a high-level interface that abstracts the complexity of AgentCore Memory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;MemoryManager&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Unified memory manager for all AgentCore frameworks.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;default_actor_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;default-user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;default_session_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;default-session&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;max_conversation_turns&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Logger&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Load memory configuration
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;memory_config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MemoryConfig&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;memory_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MemoryClient&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="c1"&gt;# Session tracking for conversation context loading
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_initialized_sessions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The class tracks which sessions have been initialized to avoid reloading conversation history on every invocation. This optimization matters because conversation history only needs to be loaded once per session lifecycle; subsequent invocations skip the API call entirely.&lt;/p&gt;
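&lt;p&gt;The gating logic is easy to isolate. Here is a stdlib-only sketch of the &lt;code&gt;_initialized_sessions&lt;/code&gt; pattern, with a &lt;code&gt;loads&lt;/code&gt; counter added purely to demonstrate that the (stand-in for the) history fetch runs once per session key:&lt;/p&gt;

```python
class SessionGate:
    """Sketch of the _initialized_sessions optimization: load history
    at most once per actor/session pair."""

    def __init__(self):
        self._initialized = {}
        self.loads = 0  # counts how often history was actually fetched

    def maybe_load(self, actor_id: str, session_id: str) -> bool:
        key = f"{actor_id}:{session_id}"
        if self._initialized.get(key, False):
            return False  # history already loaded for this session
        self.loads += 1   # stand-in for the get_last_k_turns call
        self._initialized[key] = True
        return True


gate = SessionGate()
first = gate.maybe_load("alice", "s1")   # loads history
second = gate.maybe_load("alice", "s1")  # skipped: already initialized
other = gate.maybe_load("bob", "s1")     # different actor, loads again
```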

&lt;h2&gt;
  
  
  Shared Memory Architecture
&lt;/h2&gt;

&lt;p&gt;One key architectural decision I made was creating a unified memory management module that's identical across all framework implementations in this series. This &lt;code&gt;memory.py&lt;/code&gt; module contains two classes and two standalone functions:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Classes:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;MemoryConfig&lt;/code&gt; class: Manages centralized configuration&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;__init__()&lt;/code&gt; method: Loads the memory configuration from JSON file with caching&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;memory_id&lt;/code&gt; property: Returns the configured memory ID&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;code&gt;MemoryManager&lt;/code&gt; class: High-level interface for all memory operations&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;get_memory_context()&lt;/code&gt; method: Retrieves both conversation history and relevant memories&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;store_conversation()&lt;/code&gt; method: Saves user input and agent responses&lt;/li&gt;
&lt;li&gt;Additional helper methods for managing session state&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Standalone Functions:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;retrieve_memories_for_actor()&lt;/code&gt;: Performs semantic search across memory namespaces for a specific actor&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;format_memory_context()&lt;/code&gt;: Formats retrieved memories into consistent text for injection into prompts&lt;/li&gt;
&lt;/ul&gt;
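To make the second function's role concrete, here is a hedged sketch of what formatting retrieved memories into prompt text might look like. The record shape (`namespace` and `content` fields) and the exact layout are assumptions for illustration; the real `memory.py` may differ:

```python
from typing import Dict, List


def format_memory_context(memories: List[Dict[str, str]]) -> str:
    """Format retrieved memories into a consistent text block for prompt injection.

    Illustrative sketch only: the record shape ('namespace', 'content') is assumed.
    """
    if not memories:
        return ""
    lines = ["Relevant memories from previous conversations:"]
    for memory in memories:
        lines.append(f"- [{memory['namespace']}] {memory['content']}")
    return "\n".join(lines)


context = format_memory_context([
    {"namespace": "preferences", "content": "The user likes apples."},
])
print(context)
```

Returning an empty string when nothing is retrieved keeps the prompt unchanged for memory-less sessions.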

&lt;p&gt;By sharing this module across the CrewAI, &lt;a href="https://dev.to/aws/building-production-ready-ai-agents-with-strands-agents-and-amazon-bedrock-agentcore-3dg0"&gt;Strands Agents&lt;/a&gt;, Pydantic AI, LlamaIndex, and LangGraph implementations, I keep the approach consistent and portable: memory created by one framework can be used by another, and improvements benefit all implementations.&lt;/p&gt;

&lt;p&gt;The CrewAI implementation instantiates the &lt;code&gt;MemoryManager&lt;/code&gt; class directly in &lt;code&gt;main.py&lt;/code&gt;, using it to enhance inputs before crew execution and store results afterward. This demonstrates how the shared architecture adapts to different framework patterns while maintaining the same core functionality.&lt;/p&gt;
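The overall flow in <code>main.py</code> can be sketched roughly like this. The `MemoryManager` stub below stands in for the shared class, and all signatures, the actor ID, and the placeholder result are assumptions, not the actual code:

```python
class MemoryManager:
    """Stand-in for the shared memory.py class (sketch only; signatures assumed)."""

    def get_memory_context(self, actor_id: str, session_id: str, query: str) -> str:
        return ""  # real implementation retrieves history and relevant memories

    def store_conversation(self, actor_id: str, session_id: str,
                           user_input: str, response: str) -> None:
        pass  # real implementation persists the exchange to AgentCore Memory


def run_crew(user_input: str, session_id: str, actor_id: str = "user") -> str:
    manager = MemoryManager()
    # 1. Enhance the input with memory context before crew execution
    context = manager.get_memory_context(actor_id, session_id, user_input)
    enhanced_input = f"{context}\n\n{user_input}" if context else user_input
    # 2. Run the crew (crew.kickoff(...) in the real CrewAI code)
    result = f"report for: {enhanced_input}"  # placeholder for the crew's output
    # 3. Store the exchange so future sessions can recall it
    manager.store_conversation(actor_id, session_id, user_input, result)
    return result
```

The framework-specific part is confined to step 2; steps 1 and 3 are the same in every implementation in the series.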

&lt;h2&gt;
  
  
  Testing Locally
&lt;/h2&gt;

&lt;p&gt;Before deploying to production, I always test locally. First, configure the agent for AgentCore:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;agentcore configure &lt;span class="nt"&gt;-n&lt;/span&gt; crewaiagent &lt;span class="nt"&gt;-e&lt;/span&gt; src/agentcore_crew_ai/main.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Press Enter to accept all defaults. This creates the necessary AWS resources like IAM roles and ECR repositories.&lt;/p&gt;

&lt;h3&gt;
  
  
  Handling Python Version Compatibility
&lt;/h3&gt;

&lt;p&gt;CrewAI has a dependency (&lt;code&gt;chroma-hnswlib&lt;/code&gt;) that needs specific Python versions. To handle this, I update the base image in the generated &lt;code&gt;Dockerfile&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; ghcr.io/astral-sh/uv:python3.11-bookworm-slim&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I also explicitly set the model environment variable since the &lt;code&gt;.dockerignore&lt;/code&gt; file (correctly) excludes &lt;code&gt;.env&lt;/code&gt; files that are often used for credentials and API keys:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;ENV&lt;/span&gt;&lt;span class="s"&gt; MODEL=bedrock/us.amazon.nova-pro-v1:0&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can use any model supported by CrewAI here—AgentCore Runtime is model-agnostic.&lt;/p&gt;

&lt;h3&gt;
  
  
  Local Launch and Testing
&lt;/h3&gt;

&lt;p&gt;Now launch the agent locally:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;agentcore launch &lt;span class="nt"&gt;--local&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This builds a Docker container and starts it locally. Test it with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;agentcore invoke &lt;span class="nt"&gt;--local&lt;/span&gt; &lt;span class="s1"&gt;'{ "prompt": "AI multi-agent architectures - Also, what did I say about fruit?" }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The crew should research AI multi-agent architectures and create a detailed report. Additionally, it should retrieve the memory about fruit preferences we added earlier, demonstrating that memory persistence works across different framework implementations!&lt;/p&gt;

&lt;h2&gt;
  
  
  Deploying to Production
&lt;/h2&gt;

&lt;p&gt;Once local testing is complete, deploying to AWS is straightforward:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;agentcore launch
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;AgentCore Runtime handles all the complexity:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Building container images with AWS CodeBuild and pushing them to Amazon ECR&lt;/li&gt;
&lt;li&gt;Creating the AgentCore Runtime and its invocation endpoint&lt;/li&gt;
&lt;li&gt;Configuring IAM permissions&lt;/li&gt;
&lt;li&gt;Enabling CloudWatch logging&lt;/li&gt;
&lt;li&gt;Managing auto-scaling&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Check your deployment status:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;agentcore status
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This shows your endpoint ARN, CloudWatch logs location, and other deployment details. To monitor logs in real time, use the AWS CLI command shown in the status output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws logs &lt;span class="nb"&gt;tail&lt;/span&gt; /aws/bedrock-agentcore/runtimes/&amp;lt;AGENT_ID-ENDPOINT_ID&amp;gt; &lt;span class="nt"&gt;--follow&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Test the production deployment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;agentcore invoke &lt;span class="s1"&gt;'{ "prompt": "AI multi-agent architectures - Also, what did I say about fruit?" }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The production agent should perform exactly like the local version, but now it's running on scalable, managed infrastructure.&lt;/p&gt;
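Beyond the CLI, the deployed endpoint can also be invoked programmatically. Here's a hedged sketch using boto3 (recent boto3 versions expose a `bedrock-agentcore` client with an `invoke_agent_runtime` call; the ARN is a placeholder you'd take from `agentcore status`, and the response-parsing details may vary):

```python
import json

# Build the same JSON payload the CLI invocation above uses
payload = json.dumps({
    "prompt": "AI multi-agent architectures - Also, what did I say about fruit?"
}).encode("utf-8")


def invoke(agent_runtime_arn: str, session_id: str) -> str:
    """Invoke the deployed agent. Assumes AWS credentials are configured."""
    import boto3  # imported here so the payload above can be built without AWS access

    client = boto3.client("bedrock-agentcore")
    response = client.invoke_agent_runtime(
        agentRuntimeArn=agent_runtime_arn,
        # Session IDs should be unique per conversation; AgentCore enforces
        # a minimum length, so short IDs like "s1" will be rejected.
        runtimeSessionId=session_id,
        payload=payload,
    )
    return response["response"].read().decode("utf-8")
```

Reusing the same `runtimeSessionId` across calls keeps the conversation, and its memory, continuous.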

&lt;h2&gt;
  
  
  Cleaning Up Resources
&lt;/h2&gt;

&lt;p&gt;To delete the resources created by &lt;code&gt;agentcore launch&lt;/code&gt;, I use the &lt;code&gt;agentcore&lt;/code&gt; command from the Python virtual environment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;agentcore destroy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This command deletes the AgentCore agent, the ECR images, the CodeBuild project, and the IAM roles used by the agent and by CodeBuild.&lt;/p&gt;

&lt;p&gt;To delete the memory, including all stored events, the strategies, and the memories extracted from the events, I look up the memory ID in the &lt;code&gt;../config/memory-config.json&lt;/code&gt; file and use the AWS CLI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws bedrock-agentcore-control delete-memory &lt;span class="nt"&gt;--memory-id&lt;/span&gt; &amp;lt;MEMORY_ID&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
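The lookup step can be scripted too. A short sketch, assuming the config file stores the ID under a `memory_id` key (the actual key name in `memory-config.json` is an assumption):

```python
import json
from pathlib import Path


def read_memory_id(config_path: str = "../config/memory-config.json") -> str:
    """Read the memory ID from the shared config file (key name assumed)."""
    config = json.loads(Path(config_path).read_text())
    return config["memory_id"]

# The returned ID is what you pass to:
#   aws bedrock-agentcore-control delete-memory --memory-id <MEMORY_ID>
```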



&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;This CrewAI implementation shows how to build sophisticated multi-agent systems that are production-ready from day one. The separation between agent roles mirrors real-world team dynamics, making the system intuitive to design and maintain.&lt;/p&gt;

&lt;p&gt;In the next article, I'll explore Pydantic AI, showing how to build type-safe agents with automatic validation. You'll see how the same memory system and deployment patterns work with a completely different framework philosophy, demonstrating the flexibility of the AgentCore platform.&lt;/p&gt;

&lt;p&gt;The complete code is available on &lt;a href="https://github.com/danilop/agentcore-multi-framework-examples" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;. I encourage you to experiment with different crew configurations—try adding more agents, implementing parallel processes, or integrating custom tools. The combination of CrewAI flexibility and AgentCore infrastructure gives you endless possibilities.&lt;/p&gt;

&lt;p&gt;Ready to build your own AI crew? Clone the repo and start experimenting!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>aws</category>
      <category>agentcore</category>
      <category>bedrock</category>
    </item>
  </channel>
</rss>
