<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Marcelo Acosta Cavalero</title>
    <description>The latest articles on DEV Community by Marcelo Acosta Cavalero (@acostacavalero).</description>
    <link>https://dev.to/acostacavalero</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1328258%2F285f4067-fee6-4354-9d7f-a9f60b6e5861.jpg</url>
      <title>DEV Community: Marcelo Acosta Cavalero</title>
      <link>https://dev.to/acostacavalero</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/acostacavalero"/>
    <language>en</language>
    <item>
      <title>25 Internal Knowledge and Productivity Agent Patterns on AWS You Can Steal Right Now</title>
      <dc:creator>Marcelo Acosta Cavalero</dc:creator>
      <pubDate>Mon, 06 Apr 2026 13:09:09 +0000</pubDate>
      <link>https://dev.to/aws-builders/25-internal-knowledge-and-productivity-agent-patterns-on-aws-you-can-steal-right-now-34b4</link>
      <guid>https://dev.to/aws-builders/25-internal-knowledge-and-productivity-agent-patterns-on-aws-you-can-steal-right-now-34b4</guid>
      <description>&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://buildwithaws.substack.com" rel="noopener noreferrer"&gt;Build With AWS&lt;/a&gt;. Subscribe for weekly AWS builds.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;a href="https://substackcdn.com/image/fetch/$s_!F23a!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e0a17f9-b221-4059-b7ea-d3abad12001a_1129x944.jpeg" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhjx4xoii5yjw0oyhybtr.jpeg" width="800" height="669"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;An engineer spent 40 minutes last Thursday searching for the internal API rate-limiting policy. She checked Confluence, Notion, three Slack channels, and finally asked a colleague who pointed her to a Google Doc shared in a thread six months ago. The policy existed.&lt;/p&gt;

&lt;p&gt;Finding it was the problem.&lt;/p&gt;

&lt;p&gt;This is the second edition of a five-part series cataloging real AI architecture patterns running on AWS.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://buildwithaws.substack.com/p/stop-designing-ai-agents-from-scratch" rel="noopener noreferrer"&gt;Edition 1 covered 25 customer-facing agents.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This edition shifts the lens inward: 25 patterns for employee-facing agents that handle knowledge retrieval, internal support, operational productivity, and the daily friction that slows teams down.&lt;/p&gt;

&lt;p&gt;If you missed Edition 1, go back for the &lt;a href="https://buildwithaws.substack.com/i/192832743/agent-or-not-five-questions" rel="noopener noreferrer"&gt;“Agent or Not?”&lt;/a&gt; scoring framework and the AgentCore vs Quick breakdown.&lt;/p&gt;

&lt;p&gt;Those mental models apply here too, so this edition skips straight to the architectures and use cases.&lt;/p&gt;

&lt;p&gt;One platform update before the cards: Edition 1 split the world into AgentCore (custom agents) and Quick (analytics).&lt;/p&gt;

&lt;p&gt;Internal agents add a third lane. &lt;strong&gt;Amazon Q Business&lt;/strong&gt; is the AWS-native default for enterprise knowledge assistants, permissions-aware search, and SaaS-connected internal help desks.&lt;/p&gt;

&lt;p&gt;It ships with native connectors for Google Drive, Slack, Confluence, Jira, SharePoint, and dozens more, with document-level ACLs built in.&lt;/p&gt;

&lt;p&gt;Q Business can trigger actions through plugins, but AgentCore remains the better choice when workflows require deterministic orchestration, multi-step execution, or strict policy enforcement.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AgentCore&lt;/strong&gt; covers custom agent backends that need tool orchestration, memory, identity, and fine-grained control.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Amazon Quick&lt;/strong&gt; stays in its lane for analytics, dashboarding, research, and workflow automation around business data.&lt;/p&gt;

&lt;p&gt;Several patterns below use Q Business for retrieval and AgentCore for action, which turns out to be the natural split for internal workloads.&lt;/p&gt;

&lt;h2&gt;
  
  
  Reference Architectures for Internal Agents
&lt;/h2&gt;

&lt;p&gt;Internal agents integrate with different systems than customer-facing ones. Corporate identity providers, internal wikis, HR platforms, CI/CD pipelines, and financial systems replace the CRM and e-commerce APIs from Edition 1.&lt;/p&gt;

&lt;p&gt;The four reference architectures adapt accordingly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Reference Architecture D - Single Agent with Internal Tool Access
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://substackcdn.com/image/fetch/$s_!TrGQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78a4ef98-a172-40f7-b944-aafeaf8ecc5e_1376x768.jpeg" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd2175m7ptwoe5i5vksvc.jpeg" width="800" height="447"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Platform:&lt;/strong&gt; AgentCore&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When to use:&lt;/strong&gt; The agent reasons about which internal tools to query, in what order, based on the employee’s role and question. One agent handles the full interaction with 3-8 internal system integrations.&lt;/p&gt;

&lt;p&gt;Covers most IT support, HR advisory, and workflow-execution agents where the agent needs to take actions through APIs.&lt;/p&gt;

&lt;p&gt;For pure knowledge retrieval and Q&amp;amp;A, see Architecture G below.&lt;/p&gt;

&lt;p&gt;AgentCore Identity integrates with your corporate IdP (Okta, Azure AD) for SSO. AgentCore Policy enforces role-based access scoping - verify maturity for your target region before production rollout.&lt;/p&gt;

&lt;h2&gt;
  
  
  Reference Architecture E - Quick Workspace for Internal Intelligence
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://substackcdn.com/image/fetch/$s_!n5aA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F246f662b-186a-41c4-9d83-6249ec3741b7_1376x768.jpeg" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frklntlz2k6txd3d6zxsy.jpeg" width="800" height="447"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Platform:&lt;/strong&gt; Quick&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When to use:&lt;/strong&gt; Teams need AI-powered analysis of internal data, operational metrics, or workforce analytics without writing code.&lt;/p&gt;

&lt;p&gt;Covers engineering velocity dashboards, headcount planning analysis, budget tracking, and self-service reporting for managers and operations teams.&lt;/p&gt;

&lt;h2&gt;
  
  
  Reference Architecture F - Multi-Agent Internal Workflow
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://substackcdn.com/image/fetch/$s_!bLHC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2efb03b2-ccf5-4ce7-911e-22299a012da1_1376x768.jpeg" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqfgq5el5tyihoc2q4lse.jpeg" width="800" height="447"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Platform:&lt;/strong&gt; AgentCore (multi-agent)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When to use:&lt;/strong&gt; Employee requests span IT, HR, finance, and facilities.&lt;/p&gt;

&lt;p&gt;Each domain needs its own tools, knowledge bases, and policy constraints.&lt;/p&gt;

&lt;p&gt;A single agent trying to handle all internal functions becomes unreliable at 15+ tools. Specialized agents behind a router keep each context window focused.&lt;/p&gt;
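&lt;p&gt;As a rough sketch of that split, the router in front of the specialists can be as simple as scoring each domain’s keywords against the request. Everything below (specialist names, keywords, tool lists) is illustrative, not an AWS API:&lt;/p&gt;

```python
# Sketch of a router in front of domain specialists (Reference Architecture F).
# The point is the shape, not the classifier: each specialist sees only its
# own handful of tools instead of one agent juggling the full internal surface.

SPECIALISTS = {
    "it":         {"keywords": {"vpn", "laptop", "password", "license"},
                   "tools": ["idp_api", "itsm_api"]},
    "hr":         {"keywords": {"pto", "benefits", "payroll", "leave"},
                   "tools": ["hris_api"]},
    "facilities": {"keywords": {"badge", "desk", "parking"},
                   "tools": ["facilities_api"]},
}

def route(message: str) -> str:
    """Return the specialist whose keywords best match; default to IT."""
    words = set(message.lower().split())
    best, best_hits = "it", 0
    for name, spec in SPECIALISTS.items():
        hits = len(words.intersection(spec["keywords"]))
        if hits > best_hits:
            best, best_hits = name, hits
    return best
```

&lt;p&gt;In a real build the classifier would be an LLM routing call, but the shape holds: each specialist carries a short, focused tool list and its own context window.&lt;/p&gt;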

&lt;h2&gt;
  
  
  Reference Architecture G - Q Business for Enterprise Knowledge
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://substackcdn.com/image/fetch/$s_!dynp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0715c386-617b-442f-88e6-ad2424bd48f1_1376x768.jpeg" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwya59dkt90fon8muxcss.jpeg" width="800" height="447"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Platform:&lt;/strong&gt; Amazon Q Business&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When to use:&lt;/strong&gt; The primary need is permissions-aware search and Q&amp;amp;A across SaaS knowledge sources.&lt;/p&gt;

&lt;p&gt;Q Business ships with native connectors for dozens of data sources and enforces document-level ACLs automatically.&lt;/p&gt;

&lt;p&gt;No custom orchestration code required.&lt;/p&gt;

&lt;p&gt;Covers enterprise knowledge search, policy Q&amp;amp;A, and any pattern where the core job is “find the right document and synthesize an answer the employee is authorized to see.”&lt;/p&gt;

&lt;p&gt;When the same workflow also needs to take actions (create tickets, provision access, call APIs), pair Q Business for retrieval with AgentCore for execution.&lt;/p&gt;




&lt;h1&gt;
  
  
  The 25 Use Cases
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Knowledge Management and Search
&lt;/h2&gt;

&lt;h3&gt;
  
  
  #026 - Enterprise Knowledge Search Agent
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pattern:&lt;/strong&gt; Modernization from chatbot&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Platform:&lt;/strong&gt; Amazon Q Business (primary), AgentCore (optional action layer)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Complexity:&lt;/strong&gt; Quick Win&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reference Architecture:&lt;/strong&gt; G&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What the agent does:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Searches across internal knowledge sources - Confluence, SharePoint, Google Drive, Slack message history, Jira, and S3 - through a single conversational interface.&lt;/li&gt;
&lt;li&gt;Understands natural language questions (“What’s our policy on vendor security reviews?”), retrieves relevant documents from multiple sources, synthesizes a direct answer with citations, and identifies when conflicting information exists across sources.&lt;/li&gt;
&lt;li&gt;Respects document-level permissions so employees only see content they have access to. Amazon Q Business handles this natively: its built-in connectors index these sources and its ACL engine maps existing permissions without custom code.&lt;/li&gt;
&lt;li&gt;For sources Q Business does not cover natively, Bedrock Knowledge Bases with a custom data source connector fills the gap, though note that some Bedrock connectors (such as Confluence) are in preview and do not yet support multimodal content like tables and diagrams.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;AWS services:&lt;/strong&gt; Amazon Q Business (connectors + retriever + ACL engine), Bedrock Knowledge Bases (custom RAG for unsupported sources), S3 (document store)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You need this if:&lt;/strong&gt; Your employees regularly say “I know we documented this somewhere” and spend 20+ minutes searching across 3 or more knowledge platforms.&lt;/p&gt;




&lt;h3&gt;
  
  
  #027 - Policy and Compliance Q&amp;amp;A Agent
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pattern:&lt;/strong&gt; New build&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Platform:&lt;/strong&gt; Amazon Q Business (primary), AgentCore (for action routing)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Complexity:&lt;/strong&gt; Quick Win&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reference Architecture:&lt;/strong&gt; G&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What the agent does:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Answers employee questions about internal policies - travel expenses, PTO accrual, data classification, security requirements, procurement thresholds, acceptable use.&lt;/li&gt;
&lt;li&gt;Pulls from the authoritative policy documents (not outdated wiki copies) and provides specific answers with page references.&lt;/li&gt;
&lt;li&gt;Q Business indexes the policy corpus from S3 or SharePoint and enforces access controls so employees only see policies relevant to their role.&lt;/li&gt;
&lt;li&gt;When policies are ambiguous or the question falls outside documented rules, an AgentCore action layer identifies the policy owner and drafts an email for the employee to send.&lt;/li&gt;
&lt;li&gt;Tracks which policies generate the most questions, surfacing candidates for clarification.&lt;/li&gt;
&lt;/ul&gt;
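&lt;p&gt;The “tracks which policies generate the most questions” step needs nothing exotic: a counter over the policy each answer cited is enough to surface clarification candidates. A hypothetical sketch, not a Q Business feature (in production the signal would come from the query analytics in CloudWatch):&lt;/p&gt;

```python
# Surface the policies most often cited in answers: the more questions a
# policy generates, the stronger the case for rewriting it for clarity.
from collections import Counter

def top_clarification_candidates(cited_policies: list, n: int = 3) -> list:
    """Return the n policies cited most often across Q and A sessions."""
    return [policy for policy, _ in Counter(cited_policies).most_common(n)]
```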

&lt;p&gt;&lt;strong&gt;AWS services:&lt;/strong&gt; Amazon Q Business (retriever + ACL engine), S3 (policy document store), AgentCore Runtime (action routing), CloudWatch (query analytics)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You need this if:&lt;/strong&gt; Your HR, legal, or compliance team answers the same policy questions repeatedly, and employees default to asking coworkers instead of reading the docs.&lt;/p&gt;




&lt;h3&gt;
  
  
  #028 - Institutional Knowledge Capture Agent
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pattern:&lt;/strong&gt; New build&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Platform:&lt;/strong&gt; AgentCore&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Complexity:&lt;/strong&gt; Strategic Bet&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reference Architecture:&lt;/strong&gt; D&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What the agent does:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Runs structured knowledge extraction interviews with subject matter experts, particularly before role transitions, departures, or reorganizations.&lt;/li&gt;
&lt;li&gt;Asks targeted questions about undocumented processes, tribal knowledge, key relationships, and decision context.&lt;/li&gt;
&lt;li&gt;Transcribes and synthesizes responses into structured knowledge articles with proper metadata and cross-references.&lt;/li&gt;
&lt;li&gt;Identifies gaps where captured knowledge contradicts or supplements existing documentation.&lt;/li&gt;
&lt;li&gt;Generates a handoff document for successors.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;AWS services:&lt;/strong&gt; Bedrock (Claude), AgentCore Runtime, AgentCore Memory (interview state), Amazon Transcribe, S3 (knowledge archive), Bedrock Knowledge Bases&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You need this if:&lt;/strong&gt; Critical knowledge walks out the door when senior employees leave, and your team spends months reconstructing context that lived in someone’s head.&lt;/p&gt;




&lt;h3&gt;
  
  
  #029 - Technical Documentation Assistant
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pattern:&lt;/strong&gt; Modernization from chatbot&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Platform:&lt;/strong&gt; AgentCore&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Complexity:&lt;/strong&gt; Quick Win&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reference Architecture:&lt;/strong&gt; D&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What the agent does:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Helps engineers navigate internal API documentation, runbooks, architecture decision records, and system diagrams.&lt;/li&gt;
&lt;li&gt;Answers questions like “How does the payment service authenticate with the ledger?” by pulling from code comments, README files, ADRs, and internal docs.&lt;/li&gt;
&lt;li&gt;When documentation is stale or missing, it flags the gap and creates a draft based on the current codebase.&lt;/li&gt;
&lt;li&gt;Understands code context so it can explain what a service does, not just repeat what the docs say.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;AWS services:&lt;/strong&gt; Bedrock (Claude), AgentCore Runtime, Bedrock Knowledge Bases (documentation + custom-ingested code artifacts), Amazon Q Developer (native repository integration)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You need this if:&lt;/strong&gt; Your engineering team wastes hours reading outdated documentation and reverse-engineering service behavior because the docs do not match the code.&lt;/p&gt;




&lt;h3&gt;
  
  
  #030 - Cross-Team Decision Log Agent
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pattern:&lt;/strong&gt; New build&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Platform:&lt;/strong&gt; Both (AgentCore backend + Quick analytics)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Complexity:&lt;/strong&gt; Strategic Bet&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reference Architecture:&lt;/strong&gt; D + E&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What the agent does:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Captures architectural decisions, trade-off discussions, and design choices from Slack threads, meeting transcripts, and PR comments.&lt;/li&gt;
&lt;li&gt;Structures them into searchable decision records with context, alternatives considered, rationale, and stakeholders.&lt;/li&gt;
&lt;li&gt;When a team proposes something that contradicts or revisits a prior decision, the agent surfaces the original discussion and reasoning.&lt;/li&gt;
&lt;li&gt;Quick dashboards show decision frequency by domain, open questions, and areas where decisions are overdue.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;AWS services:&lt;/strong&gt; Bedrock (Claude), AgentCore Runtime, AgentCore Memory, Amazon Quick (QuickSight + Index), Amazon Transcribe, S3 (decision archive)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You need this if:&lt;/strong&gt; Your teams relitigate the same technical decisions every quarter because nobody remembers why the original choice was made.&lt;/p&gt;




&lt;h2&gt;
  
  
  IT Help Desk and Internal Support
&lt;/h2&gt;

&lt;h3&gt;
  
  
  #031 - IT Help Desk Agent
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pattern:&lt;/strong&gt; Modernization from chatbot&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Platform:&lt;/strong&gt; AgentCore&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Complexity:&lt;/strong&gt; Quick Win&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reference Architecture:&lt;/strong&gt; D&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What the agent does:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Handles common IT support requests through Slack or a web interface.&lt;/li&gt;
&lt;li&gt;Resets passwords via the IdP API and provisions software licenses through the asset management system.&lt;/li&gt;
&lt;li&gt;Troubleshoots VPN connectivity with diagnostic checks.&lt;/li&gt;
&lt;li&gt;Resolves printer issues with guided walkthroughs, and manages MFA token enrollment.&lt;/li&gt;
&lt;li&gt;For issues requiring hands-on support, it collects diagnostic information, determines priority based on impact and urgency, and creates a ticket with all relevant context pre-populated.&lt;/li&gt;
&lt;/ul&gt;
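&lt;p&gt;The “determines priority based on impact and urgency” step is typically a standard ITIL-style matrix, not a model judgment call. A minimal sketch with made-up values and a hypothetical &lt;code&gt;build_ticket&lt;/code&gt; helper standing in for the real ITSM call:&lt;/p&gt;

```python
# ITIL-style impact x urgency matrix for the escalation path. The P-levels
# and the ticket shape are illustrative, not a ServiceNow schema.

PRIORITY = {
    ("high", "high"): "P1",
    ("high", "medium"): "P2", ("medium", "high"): "P2",
    ("high", "low"): "P3", ("medium", "medium"): "P3", ("low", "high"): "P3",
    ("medium", "low"): "P4", ("low", "medium"): "P4", ("low", "low"): "P4",
}

def ticket_priority(impact: str, urgency: str) -> str:
    return PRIORITY[(impact, urgency)]

def build_ticket(summary: str, impact: str, urgency: str, diagnostics: dict) -> dict:
    """Pre-populate the ticket the agent hands to the ITSM system."""
    return {
        "summary": summary,
        "priority": ticket_priority(impact, urgency),
        "diagnostics": diagnostics,  # collected before escalation
    }
```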

&lt;p&gt;&lt;strong&gt;AWS services:&lt;/strong&gt; Bedrock (Nova), AgentCore Runtime, AgentCore Identity (IdP integration), AgentCore Gateway (ITSM APIs), ServiceNow API, Okta/Azure AD API&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You need this if:&lt;/strong&gt; More than 50% of your IT help desk tickets are password resets, access requests, and connectivity issues that follow standard resolution procedures.&lt;/p&gt;




&lt;h3&gt;
  
  
  #032 - Software Access Provisioning Agent
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pattern:&lt;/strong&gt; Migration from RPA&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Platform:&lt;/strong&gt; AgentCore&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Complexity:&lt;/strong&gt; Quick Win&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reference Architecture:&lt;/strong&gt; D&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What the agent does:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Processes software access requests end-to-end. Employee asks for access to a tool (GitHub org, AWS account, Datadog, Salesforce).&lt;/li&gt;
&lt;li&gt;The agent checks the employee’s role against the entitlement matrix, identifies whether manager approval is needed, routes the approval request, and upon approval, provisions access via the tool’s API or SCIM endpoint.&lt;/li&gt;
&lt;li&gt;Handles license availability checks and waitlisting.&lt;/li&gt;
&lt;li&gt;Automatically de-provisions access, driven by HRIS lifecycle events, when employees change roles or depart.&lt;/li&gt;
&lt;/ul&gt;
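&lt;p&gt;The heart of this pattern is the entitlement check that runs before any provisioning call. A minimal sketch with a hypothetical in-memory matrix; a real build would load the rules from AgentCore Policy or the IdP:&lt;/p&gt;

```python
# Entitlement matrix sketch: role -> tool -> decision. Values are invented.
ENTITLEMENTS = {
    "engineer": {"github": "auto", "aws-dev": "auto", "datadog": "manager"},
    "sales":    {"salesforce": "auto", "github": "deny"},
}

def access_decision(role: str, tool: str) -> str:
    """Return 'auto' (provision now), 'manager' (route approval), or 'deny'."""
    return ENTITLEMENTS.get(role, {}).get(tool, "manager")
```

&lt;p&gt;Defaulting unknown role-tool combinations to manager approval, rather than denial, keeps the agent safe without turning it into a blocker.&lt;/p&gt;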

&lt;p&gt;&lt;strong&gt;AWS services:&lt;/strong&gt; Bedrock (Nova), AgentCore Runtime, AgentCore Policy (entitlement rules), AgentCore Identity, SCIM APIs, HRIS API (Workday/BambooHR), EventBridge (lifecycle events)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You need this if:&lt;/strong&gt; Software access requests take 2+ business days to fulfill because they require manual approval chains and admin intervention across multiple systems.&lt;/p&gt;




&lt;h3&gt;
  
  
  #033 - Incident Communication Coordinator
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pattern:&lt;/strong&gt; New build&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Platform:&lt;/strong&gt; AgentCore&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Complexity:&lt;/strong&gt; Strategic Bet&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reference Architecture:&lt;/strong&gt; D&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What the agent does:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;During production incidents, drafts and distributes internal status updates based on real-time information from monitoring tools and the incident Slack channel.&lt;/li&gt;
&lt;li&gt;Pulls metrics from CloudWatch and Datadog, summarizes the current state of the incident, identifies affected services and customer impact, and posts updates to the status page and stakeholder channels at configured intervals.&lt;/li&gt;
&lt;li&gt;After resolution, compiles a timeline of events and generates a postmortem draft with contributing factors and action items pre-populated from the incident channel discussion.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;AWS services:&lt;/strong&gt; Bedrock (Claude), AgentCore Runtime, AgentCore Gateway (monitoring APIs), CloudWatch, EventBridge, SNS (notifications), S3 (postmortem archive)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You need this if:&lt;/strong&gt; Your incident commanders spend more time writing status updates than resolving the incident, and postmortems take a week to produce because nobody captured the timeline in real time.&lt;/p&gt;




&lt;h3&gt;
  
  
  #034 - Infrastructure Self-Service Agent
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pattern:&lt;/strong&gt; New build&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Platform:&lt;/strong&gt; AgentCore&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Complexity:&lt;/strong&gt; Strategic Bet&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reference Architecture:&lt;/strong&gt; D&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What the agent does:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Lets developers request and configure cloud infrastructure through conversation instead of filing tickets.&lt;/li&gt;
&lt;li&gt;Handles common requests: spin up a dev environment, create an S3 bucket with standard tagging, set up a new RDS instance within approved configurations, or request a temporary IAM role for cross-account access.&lt;/li&gt;
&lt;li&gt;Validates all requests against organizational policies and guardrails (naming conventions, cost limits, security baselines) before executing via IaC templates.&lt;/li&gt;
&lt;li&gt;Non-standard requests route to the platform team with a pre-filled request.&lt;/li&gt;
&lt;/ul&gt;
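&lt;p&gt;The “validates all requests against organizational policies” step can be sketched as a pre-flight check that runs before any CloudFormation/CDK execution. The naming convention and cost ceiling below are invented for illustration; in production they would live in AgentCore Policy and Service Catalog constraints:&lt;/p&gt;

```python
# Pre-flight guardrail check before self-service provisioning executes.
import re

# Convention sketch: team-env-purpose, e.g. "data-dev-etl-sandbox".
NAME_PATTERN = re.compile(r"^[a-z]{2,8}-(dev|test|prod)-[a-z0-9-]{3,40}$")
MAX_MONTHLY_USD = 500  # illustrative self-service ceiling

def validate_request(resource_name: str, est_monthly_usd: float) -> list:
    """Return a list of violations; an empty list means the request may proceed."""
    violations = []
    if not NAME_PATTERN.match(resource_name):
        violations.append("name does not match team-env-purpose convention")
    if est_monthly_usd > MAX_MONTHLY_USD:
        violations.append("estimated cost exceeds self-service ceiling")
    return violations
```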

&lt;p&gt;&lt;strong&gt;AWS services:&lt;/strong&gt; Bedrock (Claude), AgentCore Runtime, AgentCore Policy (guardrails), AWS Service Catalog, CloudFormation/CDK, IAM, AWS Organizations&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You need this if:&lt;/strong&gt; Your platform team processes 30+ infrastructure requests per week and developers wait 1-3 days for standard environments that could be provisioned in minutes.&lt;/p&gt;




&lt;h3&gt;
  
  
  #035 - Security Questionnaire Response Agent
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pattern:&lt;/strong&gt; Migration from RPA&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Platform:&lt;/strong&gt; AgentCore&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Complexity:&lt;/strong&gt; Quick Win&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reference Architecture:&lt;/strong&gt; D&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What the agent does:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Completes vendor security questionnaires and customer security assessments by matching questions against a maintained library of approved responses.&lt;/li&gt;
&lt;li&gt;Pulls from SOC 2 reports, penetration test summaries, architecture documentation, and previously approved answers.&lt;/li&gt;
&lt;li&gt;Drafts responses for each question with confidence scores.&lt;/li&gt;
&lt;li&gt;High-confidence answers (exact matches to prior approved responses) are auto-filled.&lt;/li&gt;
&lt;li&gt;Low-confidence answers are flagged for security team review.&lt;/li&gt;
&lt;li&gt;Tracks which questions appear most frequently to prioritize documentation improvements.&lt;/li&gt;
&lt;/ul&gt;
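&lt;p&gt;A minimal sketch of the confidence gate, using crude token overlap in place of the embedding similarity a real build would get from Bedrock Knowledge Bases; the threshold is illustrative:&lt;/p&gt;

```python
# Match an incoming question against prior approved answers and decide
# auto-fill vs security-team review based on a similarity score.

def overlap(a: str, b: str) -> float:
    """Jaccard similarity over lowercase tokens (stand-in for embeddings)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa or not wb:
        return 0.0
    return len(wa.intersection(wb)) / len(wa.union(wb))

def triage(question: str, library: dict, auto_fill_at: float = 0.8) -> tuple:
    """Return (best answer, 'auto') above the threshold, else (best answer, 'review')."""
    best_q = max(library, key=lambda q: overlap(question, q))
    if overlap(question, best_q) >= auto_fill_at:
        return library[best_q], "auto"
    return library[best_q], "review"
```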

&lt;p&gt;&lt;strong&gt;AWS services:&lt;/strong&gt; Bedrock (Claude), AgentCore Runtime, Bedrock Knowledge Bases (security response library, optionally backed by OpenSearch Serverless for advanced retrieval control), S3 (compliance documents)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You need this if:&lt;/strong&gt; Your security team spends 10+ hours per week completing repetitive security questionnaires, and the same questions appear across 80% of inbound assessments.&lt;/p&gt;




&lt;h2&gt;
  
  
  HR and People Operations
&lt;/h2&gt;

&lt;h3&gt;
  
  
  #036 - Employee Onboarding Navigator
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pattern:&lt;/strong&gt; Modernization from chatbot&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Platform:&lt;/strong&gt; AgentCore&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Complexity:&lt;/strong&gt; Quick Win&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reference Architecture:&lt;/strong&gt; D&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What the agent does:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Guides new hires through their first 90 days.&lt;/li&gt;
&lt;li&gt;Sends day-one setup instructions (laptop configuration, tool access, building entry).&lt;/li&gt;
&lt;li&gt;Answers questions about benefits enrollment deadlines, org structure, team norms, and internal processes.&lt;/li&gt;
&lt;li&gt;Adapts the onboarding checklist based on role, department, and location.&lt;/li&gt;
&lt;li&gt;Tracks completion of required training, compliance acknowledgments, and documentation reviews.&lt;/li&gt;
&lt;li&gt;Nudges managers when their new hire’s onboarding milestones are stalling.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;AWS services:&lt;/strong&gt; Bedrock (Claude), AgentCore Runtime, AgentCore Memory (onboarding state), HRIS API (Workday/BambooHR), LMS API, SES/SNS (notifications)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You need this if:&lt;/strong&gt; New hire ramp time exceeds 30 days, onboarding satisfaction scores are below 80%, and your HR team manually tracks checklist completion in spreadsheets.&lt;/p&gt;




&lt;h3&gt;
  
  
  #037 - Benefits and Leave Advisory Agent
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pattern:&lt;/strong&gt; Modernization from chatbot&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Platform:&lt;/strong&gt; AgentCore&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Complexity:&lt;/strong&gt; Quick Win&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reference Architecture:&lt;/strong&gt; D&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What the agent does:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Answers employee questions about health insurance plans, 401(k) matching, HSA/FSA eligibility, parental leave, PTO balance, and FMLA procedures.&lt;/li&gt;
&lt;li&gt;Pulls real-time data from the HRIS and benefits platforms to give personalized answers (“You have 8.5 PTO days remaining this year”).&lt;/li&gt;
&lt;li&gt;Walks employees through benefits enrollment during open enrollment with side-by-side plan comparisons based on their specific situation (family size, expected medical usage, contribution preferences).&lt;/li&gt;
&lt;li&gt;Routes complex cases to HR specialists with the question and relevant context pre-attached.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;AWS services:&lt;/strong&gt; Bedrock (Claude), AgentCore Runtime, AgentCore Identity (employee verification), HRIS API, benefits platform API, Bedrock Guardrails (PII handling)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You need this if:&lt;/strong&gt; Your HR inbox is dominated by benefits questions during open enrollment, and employees make suboptimal plan selections because they do not understand their options.&lt;/p&gt;




&lt;h3&gt;
  
  
  #038 - Internal Job Matching Agent
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pattern:&lt;/strong&gt; New build&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Platform:&lt;/strong&gt; AgentCore&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Complexity:&lt;/strong&gt; Strategic Bet&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reference Architecture:&lt;/strong&gt; D&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What the agent does:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Matches employees to internal open positions based on skills, career goals, project history, and performance data.&lt;/li&gt;
&lt;li&gt;Goes beyond keyword matching on job descriptions: analyzes the employee’s actual work (code contributions, project involvement, skills demonstrated in reviews) against what the hiring manager needs.&lt;/li&gt;
&lt;li&gt;Surfaces opportunities employees might not have found or considered.&lt;/li&gt;
&lt;li&gt;Provides a match explanation (“Your work on the data pipeline migration maps directly to this team’s real-time analytics build”).&lt;/li&gt;
&lt;li&gt;Respects confidentiality so managers are not notified unless the employee applies.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;AWS services:&lt;/strong&gt; Bedrock (Claude), AgentCore Runtime, AgentCore Policy (confidentiality rules), HRIS API, ATS API (Greenhouse/Lever), Bedrock Knowledge Bases (job postings + employee profiles)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You need this if:&lt;/strong&gt; Internal mobility is below 15%, employees leave for roles they could have found internally, and your job board gets low engagement because listings read like external postings.&lt;/p&gt;




&lt;h3&gt;
  
  
  #039 - Performance Review Preparation Agent
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pattern:&lt;/strong&gt; New build&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Platform:&lt;/strong&gt; AgentCore&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Complexity:&lt;/strong&gt; Strategic Bet&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reference Architecture:&lt;/strong&gt; D&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What the agent does:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Helps managers prepare for performance reviews by compiling an employee’s contributions over the review period.&lt;/li&gt;
&lt;li&gt;Pulls data from project management tools (Jira tickets completed, PRs merged, epics delivered), peer feedback, 1:1 notes, goal tracking systems, and prior review history.&lt;/li&gt;
&lt;li&gt;Generates a structured draft highlighting key accomplishments, growth areas, and evidence for each.&lt;/li&gt;
&lt;li&gt;Does not write the evaluation - it assembles the evidence so the manager spends time on assessment quality instead of data gathering.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;AWS services:&lt;/strong&gt; Bedrock (Claude), AgentCore Runtime, AgentCore Policy (data access controls), Jira API, GitHub API, HRIS API, 15Five/Lattice API&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You need this if:&lt;/strong&gt; Your managers spend 3+ hours per direct report gathering data for reviews, and review quality suffers because managers rely on recency bias instead of full-period evidence.&lt;/p&gt;




&lt;h3&gt;
  
  
  #040 - Compensation Benchmarking Agent
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pattern:&lt;/strong&gt; New build&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Platform:&lt;/strong&gt; Both (AgentCore backend + Quick analytics)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Complexity:&lt;/strong&gt; Foundation Build&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reference Architecture:&lt;/strong&gt; D + E&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What the agent does:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Helps HR and hiring managers make compensation decisions by pulling from internal pay bands, market survey data, and peer comparisons.&lt;/li&gt;
&lt;li&gt;Takes a role, level, location, and candidate profile, then generates a recommended offer range with supporting data.&lt;/li&gt;
&lt;li&gt;Flags when a proposed offer falls outside band or creates internal equity concerns.&lt;/li&gt;
&lt;li&gt;Quick dashboards show compensation distribution by team, gender pay gap analysis, and market competitiveness by role family.&lt;/li&gt;
&lt;li&gt;All outputs route through HR approval before reaching the hiring manager.&lt;/li&gt;
&lt;/ul&gt;
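
&lt;p&gt;The out-of-band and internal-equity checks are plain comparisons once the band and peer data are loaded. A minimal sketch in Python - the band values, peer salaries, and the check_offer helper are all invented for illustration; the real inputs come from the HRIS and survey APIs:&lt;/p&gt;

```python
# Illustrative only: band values and names are made up; the real agent
# would pull bands and peer data from the HRIS.
def check_offer(offer, band_min, band_max, peer_salaries):
    flags = []
    if offer > band_max:
        flags.append("above band max")
    if band_min > offer:
        flags.append("below band min")
    # Internal equity: flag if the offer exceeds every current peer at level.
    if peer_salaries and offer > max(peer_salaries):
        flags.append("exceeds all current peers at this level")
    return flags

flags = check_offer(185_000, band_min=150_000, band_max=180_000,
                    peer_salaries=[155_000, 162_000, 170_000])
# flags -> ["above band max", "exceeds all current peers at this level"]
```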

&lt;p&gt;&lt;strong&gt;AWS services:&lt;/strong&gt; Bedrock (Claude), AgentCore Runtime, AgentCore Policy (data access restrictions), Amazon Quick (QuickSight + Research), HRIS API, compensation survey APIs, Redshift&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You need this if:&lt;/strong&gt; Compensation decisions take a week because they require HR to manually pull market data, check internal equity, and build a justification for every offer.&lt;/p&gt;




&lt;h2&gt;
  
  
  Engineering and Development
&lt;/h2&gt;

&lt;h3&gt;
  
  
  #041 - Code Review Context Agent
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pattern:&lt;/strong&gt; New build&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Platform:&lt;/strong&gt; AgentCore&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Complexity:&lt;/strong&gt; Quick Win&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reference Architecture:&lt;/strong&gt; D&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What the agent does:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Enriches pull requests with context that speeds up code review.&lt;/li&gt;
&lt;li&gt;When a PR is opened, it analyzes the changes and adds a summary: which services are affected, what architectural patterns changed, whether the change touches a critical path, and links to related PRs and design docs.&lt;/li&gt;
&lt;li&gt;Flags potential issues: breaking API changes, missing test coverage for modified paths, configuration changes that affect other teams, and dependency updates with known vulnerabilities.&lt;/li&gt;
&lt;li&gt;Does not approve or block - it surfaces what a reviewer should pay attention to.&lt;/li&gt;
&lt;/ul&gt;
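
&lt;p&gt;Two of these checks need no model at all. A rough sketch, assuming a hypothetical SERVICE_MAP, pr_context helper, and file layout - in practice the agent reads the diff from the GitHub/GitLab API:&lt;/p&gt;

```python
# Hypothetical service map; the real agent derives this from repo metadata.
SERVICE_MAP = {"billing/": "billing-service", "auth/": "auth-service"}

def pr_context(changed_files):
    # Which services does this PR touch?
    services = sorted({svc for path in changed_files
                       for prefix, svc in SERVICE_MAP.items()
                       if path.startswith(prefix)})
    # Source changed but no test files changed -> flag for the reviewer.
    touched_src = any(p.endswith(".py") and "tests/" not in p for p in changed_files)
    touched_tests = any("tests/" in p for p in changed_files)
    return {"services": services,
            "missing_tests": touched_src and not touched_tests}

ctx = pr_context(["billing/invoice.py", "auth/session.py"])
# ctx -> {"services": ["auth-service", "billing-service"], "missing_tests": True}
```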

&lt;p&gt;&lt;strong&gt;AWS services:&lt;/strong&gt; Bedrock (Claude), AgentCore Runtime, GitHub/GitLab API, Bedrock Knowledge Bases (architecture docs + ADRs), Amazon Q Developer (code review context)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You need this if:&lt;/strong&gt; Code reviews take 2+ days because reviewers spend most of their time understanding context rather than evaluating the actual change.&lt;/p&gt;




&lt;h3&gt;
  
  
  #042 - Incident Postmortem Generator
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pattern:&lt;/strong&gt; New build&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Platform:&lt;/strong&gt; AgentCore&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Complexity:&lt;/strong&gt; Quick Win&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reference Architecture:&lt;/strong&gt; D&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What the agent does:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Produces structured postmortem documents from incident data.&lt;/li&gt;
&lt;li&gt;Pulls the timeline from PagerDuty or Opsgenie, reconstructs the sequence of events from the incident Slack channel, correlates with deployment logs and monitoring data, and generates a draft postmortem following your team’s template.&lt;/li&gt;
&lt;li&gt;Identifies contributing factors by analyzing what changed before the incident (deploys, config changes, traffic spikes).&lt;/li&gt;
&lt;li&gt;Pre-populates action items based on patterns from previous incidents.&lt;/li&gt;
&lt;li&gt;The on-call engineer reviews and refines instead of writing from scratch.&lt;/li&gt;
&lt;/ul&gt;
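
&lt;p&gt;The contributing-factor step can start as a simple time-window correlation before any model gets involved. An illustrative sketch - the events and the candidate_factors helper are invented:&lt;/p&gt;

```python
from datetime import datetime, timedelta

def candidate_factors(incident_start, changes, window_minutes=60):
    """Changes landing in the window before the first alert are candidates."""
    window = timedelta(minutes=window_minutes)
    return [c["name"] for c in changes
            if incident_start > c["at"] > incident_start - window]

start = datetime(2026, 4, 6, 14, 30)
changes = [
    {"name": "deploy payments v2.4.1", "at": datetime(2026, 4, 6, 14, 5)},
    {"name": "config: raise cache TTL", "at": datetime(2026, 4, 6, 9, 0)},
]
# candidate_factors(start, changes) -> ["deploy payments v2.4.1"]
```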

&lt;p&gt;&lt;strong&gt;AWS services:&lt;/strong&gt; Bedrock (Claude), AgentCore Runtime, AgentCore Gateway (PagerDuty/Opsgenie API, Slack API), CloudWatch Logs, S3 (postmortem archive)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You need this if:&lt;/strong&gt; Postmortems take a week to produce, half of incidents never get a written postmortem, and your team keeps encountering the same failure modes.&lt;/p&gt;




&lt;h3&gt;
  
  
  #043 - Dependency Risk Assessment Agent
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pattern:&lt;/strong&gt; New build&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Platform:&lt;/strong&gt; AgentCore&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Complexity:&lt;/strong&gt; Strategic Bet&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reference Architecture:&lt;/strong&gt; D&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What the agent does:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Continuously monitors your codebase’s dependency tree for risk signals beyond CVEs.&lt;/li&gt;
&lt;li&gt;Analyzes maintainer activity (abandoned projects, single-maintainer risk), license compatibility, breaking change frequency in upstream releases, and supply chain indicators (typosquatting packages, unexpected maintainer changes).&lt;/li&gt;
&lt;li&gt;When a dependency update is available, provides a risk assessment: what changed, what might break, and whether similar codebases have reported issues.&lt;/li&gt;
&lt;li&gt;Prioritizes updates based on actual exposure, not just severity scores.&lt;/li&gt;
&lt;/ul&gt;
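
&lt;p&gt;One way to turn those signals into a priority is a weighted score. A toy example - the weights, thresholds, and dependency_risk helper are all invented, and the real inputs come from the registry and GitHub APIs through the Gateway:&lt;/p&gt;

```python
def dependency_risk(days_since_last_commit, maintainer_count,
                    breaking_releases_last_year, has_open_cve):
    score = 0
    if days_since_last_commit > 365:
        score += 3          # effectively abandoned
    if maintainer_count == 1:
        score += 2          # bus-factor risk
    score += min(breaking_releases_last_year, 3)  # churny upstream
    if has_open_cve:
        score += 4
    return score

# Single-maintainer package, quiet for two years, one known CVE:
risk = dependency_risk(730, 1, 0, True)
# risk -> 9
```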

&lt;p&gt;&lt;strong&gt;AWS services:&lt;/strong&gt; Bedrock (Claude), AgentCore Runtime, AgentCore Gateway (GitHub API, package registry APIs), Amazon Inspector (vulnerability scanning + SCA), Amazon Q Developer (code-level risk context), EventBridge (scheduled scans)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You need this if:&lt;/strong&gt; Your dependency updates are either ignored for months (creating security debt) or applied blindly (causing unexpected breakages), and Dependabot alerts alone do not give you enough context to prioritize.&lt;/p&gt;




&lt;h3&gt;
  
  
  #044 - On-Call Handoff Agent
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pattern:&lt;/strong&gt; New build&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Platform:&lt;/strong&gt; AgentCore&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Complexity:&lt;/strong&gt; Quick Win&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reference Architecture:&lt;/strong&gt; D&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What the agent does:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Generates end-of-rotation handoff briefs for on-call engineers.&lt;/li&gt;
&lt;li&gt;Compiles all incidents from the rotation (alerts fired, pages received, resolutions applied), ongoing issues that need monitoring, recent deployments that might cause problems, and upcoming changes the next on-call should watch.&lt;/li&gt;
&lt;li&gt;Pulls from PagerDuty, Slack incident channels, deployment logs, and the change calendar.&lt;/li&gt;
&lt;li&gt;The outgoing on-call reviews and annotates the brief before it goes to the incoming engineer.&lt;/li&gt;
&lt;/ul&gt;
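
&lt;p&gt;The incident roll-up portion of the brief is straightforward aggregation. A sketch with a made-up data shape and a hypothetical handoff_brief helper - the real agent pulls pages from the PagerDuty API:&lt;/p&gt;

```python
from collections import Counter

def handoff_brief(pages):
    # Group the rotation's pages by service, keep unresolved ones on top.
    by_service = Counter(p["service"] for p in pages)
    open_issues = [p["summary"] for p in pages if not p["resolved"]]
    return {"pages_by_service": dict(by_service), "still_open": open_issues}

brief = handoff_brief([
    {"service": "api", "summary": "5xx spike", "resolved": True},
    {"service": "api", "summary": "latency alert", "resolved": False},
    {"service": "db", "summary": "replica lag", "resolved": True},
])
# brief -> {"pages_by_service": {"api": 2, "db": 1},
#           "still_open": ["latency alert"]}
```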

&lt;p&gt;&lt;strong&gt;AWS services:&lt;/strong&gt; Bedrock (Claude), AgentCore Runtime, PagerDuty API, Slack API, deployment pipeline API, SES (handoff delivery)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You need this if:&lt;/strong&gt; On-call handoffs happen verbally (or not at all), incoming engineers start blind, and the first hour of every rotation is spent asking “what happened this week?”&lt;/p&gt;




&lt;h3&gt;
  
  
  #045 - Architecture Decision Record Agent
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pattern:&lt;/strong&gt; New build&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Platform:&lt;/strong&gt; AgentCore&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Complexity:&lt;/strong&gt; Strategic Bet&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reference Architecture:&lt;/strong&gt; D&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What the agent does:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Facilitates the creation of Architecture Decision Records from design discussions.&lt;/li&gt;
&lt;li&gt;Monitors designated Slack channels and meeting transcripts for architectural debates.&lt;/li&gt;
&lt;li&gt;When it detects a decision being made, it drafts an ADR: context, decision, alternatives considered, consequences, and status.&lt;/li&gt;
&lt;li&gt;Tags the relevant teams and stakeholders for review.&lt;/li&gt;
&lt;li&gt;Maintains a searchable index of all ADRs linked to the services they affect.&lt;/li&gt;
&lt;li&gt;When someone proposes a change that conflicts with an existing ADR, the agent surfaces the relevant record and asks whether this is an intentional reversal.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;AWS services:&lt;/strong&gt; Bedrock (Claude), AgentCore Runtime, AgentCore Memory, Slack API, Amazon Transcribe, Bedrock Knowledge Bases (ADR corpus), S3&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You need this if:&lt;/strong&gt; Your team makes architectural decisions in Slack threads that nobody can find three months later, and new engineers re-propose approaches that were already evaluated and rejected.&lt;/p&gt;




&lt;h2&gt;
  
  
  Finance and Procurement
&lt;/h2&gt;

&lt;h3&gt;
  
  
  #046 - Expense Report Processing Agent
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pattern:&lt;/strong&gt; Migration from RPA&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Platform:&lt;/strong&gt; AgentCore&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Complexity:&lt;/strong&gt; Quick Win&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reference Architecture:&lt;/strong&gt; D&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What the agent does:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Processes expense reports by extracting data from uploaded receipts using Amazon Textract, matching expenses against the company’s travel and expense policy, flagging out-of-policy items with specific policy references, and routing compliant reports for manager approval.&lt;/li&gt;
&lt;li&gt;Handles currency conversion for international expenses, per diem calculations by city, and mileage reimbursement.&lt;/li&gt;
&lt;li&gt;Auto-categorizes expenses for GL coding.&lt;/li&gt;
&lt;li&gt;Reports with flagged items go to the submitter for correction before reaching the approval queue.&lt;/li&gt;
&lt;/ul&gt;
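
&lt;p&gt;After Textract extracts the line items, the policy match is a lookup and a comparison. A minimal sketch with hypothetical category limits and a made-up flag_line_items helper - the production rules would live in AgentCore Policy, and each flag would cite the specific clause:&lt;/p&gt;

```python
# Illustrative per-item limits; real limits come from the T&amp;E policy.
POLICY_LIMITS = {"meals": 75.00, "hotel": 300.00, "ground_transport": 60.00}

def flag_line_items(line_items):
    flags = []
    for item in line_items:
        limit = POLICY_LIMITS.get(item["category"])
        if limit is not None and item["amount"] > limit:
            flags.append(f'{item["category"]}: {item["amount"]:.2f} exceeds {limit:.2f} limit')
    return flags

flags = flag_line_items([
    {"category": "meals", "amount": 92.40},
    {"category": "hotel", "amount": 289.00},
])
# flags -> ["meals: 92.40 exceeds 75.00 limit"]
```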

&lt;p&gt;&lt;strong&gt;AWS services:&lt;/strong&gt; Bedrock (Nova), AgentCore Runtime, Amazon Textract, AgentCore Policy (expense rules), expense management API (Concur/Expensify), DynamoDB&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You need this if:&lt;/strong&gt; Your finance team manually reviews expense reports for policy compliance, processing takes 5+ business days, and 30% of submissions require back-and-forth corrections.&lt;/p&gt;




&lt;h3&gt;
  
  
  #047 - Procurement Request Agent
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pattern:&lt;/strong&gt; Migration from RPA&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Platform:&lt;/strong&gt; AgentCore&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Complexity:&lt;/strong&gt; Strategic Bet&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reference Architecture:&lt;/strong&gt; D&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What the agent does:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Guides employees through procurement requests conversationally.&lt;/li&gt;
&lt;li&gt;Collects requirements (what they need, why, budget, timeline), checks whether an existing contract covers the request, identifies the correct approval chain based on amount and category, and generates a purchase requisition.&lt;/li&gt;
&lt;li&gt;For software purchases, checks the approved vendor list and existing license inventory to avoid redundant buying.&lt;/li&gt;
&lt;li&gt;Handles the approval workflow: routes to the right approvers, sends reminders, escalates stalled approvals, and notifies the requester at each stage.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;AWS services:&lt;/strong&gt; Bedrock (Claude), AgentCore Runtime, AgentCore Policy (approval rules + spend limits), ERP API (SAP/Oracle/NetSuite), contract management API, SES (notifications)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You need this if:&lt;/strong&gt; Employees avoid the procurement process because it requires filling out forms they do not understand, and your procurement team spends hours routing requests to the right approvers.&lt;/p&gt;




&lt;h3&gt;
  
  
  #048 - Budget Tracking and Forecast Agent
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pattern:&lt;/strong&gt; New build&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Platform:&lt;/strong&gt; Both (AgentCore backend + Quick dashboards)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Complexity:&lt;/strong&gt; Strategic Bet&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reference Architecture:&lt;/strong&gt; D + E&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What the agent does:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Monitors department budgets against actuals in real time.&lt;/li&gt;
&lt;li&gt;Pulls spend data from the ERP, cloud billing (AWS Cost Explorer), and SaaS management platforms. Alerts budget owners when spending trends suggest they will exceed budget before quarter end.&lt;/li&gt;
&lt;li&gt;Generates variance explanations by analyzing which line items are over or under plan.&lt;/li&gt;
&lt;li&gt;Quick dashboards let managers drill into spend by category, vendor, and project.&lt;/li&gt;
&lt;li&gt;Produces monthly budget summaries and forecast adjustments automatically.&lt;/li&gt;
&lt;/ul&gt;
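
&lt;p&gt;The overspend alert can start as a linear burn-rate projection. An illustrative sketch with invented numbers and a hypothetical projected_quarter_spend helper:&lt;/p&gt;

```python
def projected_quarter_spend(actuals_to_date, days_elapsed, days_in_quarter=91):
    # Naive linear projection: assume the current daily burn rate holds.
    daily_burn = actuals_to_date / days_elapsed
    return daily_burn * days_in_quarter

budget = 500_000
projection = projected_quarter_spend(actuals_to_date=260_000, days_elapsed=40)
# projection -> 591_500.0, which exceeds the 500_000 budget, so the agent
# alerts the budget owner now instead of at month-end close.
over = projection > budget
```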

&lt;p&gt;&lt;strong&gt;AWS services:&lt;/strong&gt; Bedrock (Claude), AgentCore Runtime, Amazon Quick (QuickSight + Flows), AWS Cost Explorer API, ERP API, Redshift, EventBridge (alerting triggers), SNS&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You need this if:&lt;/strong&gt; Budget reviews happen monthly from stale spreadsheets, overspend is discovered after the fact, and finance produces variance reports manually.&lt;/p&gt;




&lt;h2&gt;
  
  
  Meetings and Communication
&lt;/h2&gt;

&lt;h3&gt;
  
  
  #049 - Meeting Summarization and Action Tracker
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pattern:&lt;/strong&gt; New build&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Platform:&lt;/strong&gt; AgentCore&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Complexity:&lt;/strong&gt; Quick Win&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reference Architecture:&lt;/strong&gt; D&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What the agent does:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Joins meetings (via Amazon Chime SDK or calendar integration), transcribes the discussion, and produces a structured summary within minutes of the meeting ending.&lt;/li&gt;
&lt;li&gt;Identifies decisions made, action items with owners and due dates, open questions, and topics deferred.&lt;/li&gt;
&lt;li&gt;Posts the summary to the relevant Slack channel or project management tool.&lt;/li&gt;
&lt;li&gt;Tracks action items across meetings and flags overdue items in the next meeting’s pre-brief.&lt;/li&gt;
&lt;li&gt;Distinguishes between informational discussion and actionable outcomes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;AWS services:&lt;/strong&gt; Bedrock (Claude), AgentCore Runtime, Amazon Transcribe, Amazon Chime SDK (the SDK remains supported independently of the Chime service), Slack API, Jira API (action item creation), S3 (transcript archive)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You need this if:&lt;/strong&gt; Action items from meetings disappear into notes nobody reads, decisions get relitigated because they were not recorded, and your team spends 5+ hours per week in meetings without clear outcomes.&lt;/p&gt;




&lt;h3&gt;
  
  
  #050 - Status Report Generator
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pattern:&lt;/strong&gt; New build&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Platform:&lt;/strong&gt; AgentCore&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Complexity:&lt;/strong&gt; Quick Win&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reference Architecture:&lt;/strong&gt; D&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What the agent does:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Compiles weekly or biweekly status reports by pulling from the systems where work actually happens.&lt;/li&gt;
&lt;li&gt;Aggregates Jira ticket progress, GitHub PR activity, deployment history, incident reports, and OKR tracking data.&lt;/li&gt;
&lt;li&gt;Produces a structured update for each team: what shipped, what is in progress, what is blocked, and key metrics.&lt;/li&gt;
&lt;li&gt;Managers review and edit instead of writing from scratch.&lt;/li&gt;
&lt;li&gt;Adapts format and detail level based on the audience (team standup vs executive briefing vs cross-functional update).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;AWS services:&lt;/strong&gt; Bedrock (Claude), AgentCore Runtime, AgentCore Gateway (Jira, GitHub, OKR platform APIs), S3 (report archive), SES (distribution)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You need this if:&lt;/strong&gt; Your managers spend 2+ hours per week writing status reports by manually checking Jira, GitHub, and Slack, and the reports are outdated by the time they are sent.&lt;/p&gt;




&lt;h2&gt;
  
  
  What These 25 Patterns Reveal
&lt;/h2&gt;

&lt;p&gt;Different dynamics emerge when agents face inward instead of outward.&lt;/p&gt;

&lt;p&gt;Knowledge retrieval dominates the Quick Win category.&lt;/p&gt;

&lt;p&gt;Most of them involve finding, synthesizing, or delivering information that already exists somewhere in the organization.&lt;/p&gt;

&lt;p&gt;The hardest part of internal AI agents is not the reasoning - it is the integration with fragmented knowledge sources behind SSO, document-level permissions, and inconsistent APIs.&lt;/p&gt;

&lt;p&gt;Amazon Q Business absorbs a significant chunk of this complexity out of the box with native connectors and built-in ACLs, which is why it appears as the default for pure retrieval patterns.&lt;/p&gt;

&lt;p&gt;Bedrock Knowledge Bases fills in when you need a custom RAG pipeline or when Q Business lacks a connector for your source.&lt;/p&gt;

&lt;p&gt;Permission models are the real engineering challenge.&lt;/p&gt;

&lt;p&gt;Customer-facing agents from Edition 1 mostly deal with one customer’s data at a time.&lt;/p&gt;

&lt;p&gt;Internal agents cross organizational boundaries constantly.&lt;/p&gt;

&lt;p&gt;An HR agent that can see compensation data, a finance agent that reads budget forecasts, an engineering agent that accesses production logs - each needs fine-grained access controls scoped to the requester’s role.&lt;/p&gt;

&lt;p&gt;AgentCore Identity handles IdP integration for SSO. AgentCore Policy adds rule-based access scoping - verify maturity for your target region before production rollout.&lt;/p&gt;

&lt;p&gt;For retrieval-only patterns, Q Business’s ACL engine is the more battle-tested option today.&lt;/p&gt;

&lt;p&gt;RPA migrations have the clearest ROI.&lt;/p&gt;

&lt;p&gt;Expense processing, access provisioning, procurement workflows - these agents replace brittle RPA scripts that break when a UI changes.&lt;/p&gt;

&lt;p&gt;The agentic version handles exceptions, asks clarifying questions, and adapts to edge cases instead of failing silently.&lt;/p&gt;

&lt;p&gt;Multi-agent architectures appear less often internally (notice that reference architecture F never comes up in this edition).&lt;/p&gt;

&lt;p&gt;Internal users tolerate slightly longer response times and are better at framing specific questions, which means a single well-tooled agent handles most internal scenarios effectively.&lt;/p&gt;

&lt;p&gt;Quick fills the analytics gap. Some patterns use Quick for dashboarding and self-service analysis.&lt;/p&gt;

&lt;p&gt;Internal teams need visibility into operational data more than they need conversational agents.&lt;/p&gt;

&lt;p&gt;QuickSight and Quick Research provide that without custom development.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where the Leverage Actually Is
&lt;/h2&gt;

&lt;p&gt;Most of the patterns in this edition run on a single agent with tool access. That’s not a limitation of the framework; it reflects how internal work actually breaks down. Employees ask specific questions, need specific actions, and want specific answers. The architectural complexity lives in the permission model and the integration layer, not in multi-agent orchestration.&lt;/p&gt;

&lt;p&gt;The engineer from the opening spent 40 minutes finding a rate-limiting policy. Pattern #026 solves that with Q Business, native connectors, and document-level ACLs she never has to think about.&lt;/p&gt;

&lt;p&gt;No custom orchestration.&lt;/p&gt;

&lt;p&gt;No agent memory.&lt;/p&gt;

&lt;p&gt;No specialist routing.&lt;/p&gt;

&lt;p&gt;The right document, surfaced to someone authorized to see it, in seconds. Start there.&lt;/p&gt;

&lt;p&gt;Add AgentCore when the workflow needs to take action, not just answer questions. Add Quick when teams need dashboards, not conversations.&lt;/p&gt;

&lt;p&gt;Every pattern in this edition follows that same decision sequence: retrieval first, action second, analytics where the data justifies it.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Comes Next
&lt;/h2&gt;

&lt;p&gt;Three more editions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Edition 3&lt;/strong&gt; - Workflow automation and process agents (internal operations, no direct user interaction)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Edition 4&lt;/strong&gt; - Data and analytics agents (self-service BI)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Edition 5&lt;/strong&gt; - Compliance, security, and governance agents (high-stakes environments)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you are building internal productivity agents, start with #026 (Enterprise Knowledge Search) or #031 (IT Help Desk).&lt;/p&gt;

&lt;p&gt;Enterprise Knowledge Search deploys fast on Q Business with minimal custom code.&lt;/p&gt;

&lt;p&gt;IT Help Desk needs AgentCore for the action layer but has the clearest success metrics.&lt;/p&gt;

&lt;p&gt;Both solve a pain point every employee recognizes on day one.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I publish every week at &lt;a href="https://buildwithaws.substack.com" rel="noopener noreferrer"&gt;buildwithaws.substack.com&lt;/a&gt;. Subscribe. It's free.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>ai</category>
      <category>productivity</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Stop Designing AI Agents From Scratch. Steal These 25 Patterns Instead.</title>
      <dc:creator>Marcelo Acosta Cavalero</dc:creator>
      <pubDate>Thu, 02 Apr 2026 13:53:55 +0000</pubDate>
      <link>https://dev.to/aws-builders/stop-designing-ai-agents-from-scratch-steal-these-25-patterns-instead-3h5p</link>
      <guid>https://dev.to/aws-builders/stop-designing-ai-agents-from-scratch-steal-these-25-patterns-instead-3h5p</guid>
      <description>&lt;p&gt;&lt;a href="https://substackcdn.com/image/fetch/$s_!9nwP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9ae6ea9-3b67-4c27-9127-36a54f43b127_1408x768.jpeg" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8u82b9xxjpmgmto3ijwt.jpeg" width="800" height="436"&gt;&lt;/a&gt;&lt;br&gt;
Originally published on &lt;a href="https://buildwithaws.substack.com/p/stop-designing-ai-agents-from-scratch" rel="noopener noreferrer"&gt;Build With AWS&lt;/a&gt;. Subscribe for weekly AWS builds.&lt;/p&gt;

&lt;p&gt;A customer support team deployed a Bedrock-powered chatbot last quarter.&lt;/p&gt;

&lt;p&gt;It answered questions from a knowledge base, handled basic FAQs, and saved about 15 hours per week.&lt;/p&gt;


&lt;p&gt;Solid win.&lt;/p&gt;

&lt;p&gt;Then someone asked:&lt;/p&gt;

&lt;p&gt;“Can it also check order status, issue refunds, and escalate to the right team based on sentiment?”&lt;/p&gt;

&lt;p&gt;That question marks the exact boundary between a GenAI feature and an AI agent.&lt;/p&gt;

&lt;p&gt;This is the first edition of a five-part series cataloging real AI architecture patterns running on AWS right now.&lt;/p&gt;

&lt;p&gt;Each edition covers 20-25 use cases with enough detail to evaluate whether they fit your organization: what the agent does, which services power it, and a reference architecture you can adapt.&lt;/p&gt;

&lt;p&gt;Patterns you can take to your next architecture review, not slides about the future of AI.&lt;/p&gt;

&lt;p&gt;But first, two quick mental models so the cards land with full context.&lt;/p&gt;

&lt;h2&gt;
  
  
  Agent or Not? Five Questions
&lt;/h2&gt;

&lt;p&gt;Every AI project starts with someone saying “we should build an agent for that.”&lt;/p&gt;

&lt;p&gt;Most of the time, a well-configured prompt with RAG handles the job.&lt;/p&gt;

&lt;p&gt;The distinction matters because agents cost more to build, run, and debug.&lt;/p&gt;

&lt;h3&gt;
  
  
  How predictable is the workflow?
&lt;/h3&gt;

&lt;p&gt;Same steps, same order, every time?&lt;/p&gt;

&lt;p&gt;A Lambda function with a Bedrock call handles it.&lt;/p&gt;

&lt;p&gt;Agents earn their keep when each request requires different steps based on context.&lt;/p&gt;

&lt;p&gt;A refund request that needs to check inventory, verify purchase history, calculate partial credit, and decide whether to escalate - all conditionally - is agent territory.&lt;/p&gt;

&lt;h3&gt;
  
  
  Does it require multi-step reasoning?
&lt;/h3&gt;

&lt;p&gt;Single-turn Q&amp;amp;A works fine as a RAG pipeline.&lt;/p&gt;

&lt;p&gt;When the system needs to analyze options, weigh trade-offs, decide, and then act on that decision across multiple systems, you need agentic reasoning.&lt;/p&gt;

&lt;h3&gt;
  
  
  Does it need tool access?
&lt;/h3&gt;

&lt;p&gt;Reading from a knowledge base and generating text is retrieval-augmented generation.&lt;/p&gt;

&lt;p&gt;Calling APIs, writing to databases, triggering workflows, or interacting with external systems requires an agent’s orchestration layer.&lt;/p&gt;

&lt;h3&gt;
  
  
  Does it interact conversationally?
&lt;/h3&gt;

&lt;p&gt;Multi-turn dialogue with context retention, clarifying questions, and adaptive responses points toward agentic design.&lt;/p&gt;

&lt;p&gt;Form-style inputs do not.&lt;/p&gt;

&lt;h3&gt;
  
  
  Does it need to improve over time?
&lt;/h3&gt;

&lt;p&gt;Static systems return the same quality output indefinitely.&lt;/p&gt;

&lt;p&gt;Agents that learn from feedback and adapt to new scenarios justify the additional infrastructure.&lt;/p&gt;

&lt;p&gt;Score each 1-5.&lt;/p&gt;

&lt;p&gt;Below 10? Standard GenAI.&lt;/p&gt;

&lt;p&gt;Between 10 and 18? Evaluate whether basic GenAI plus automation gets you 80% of the value at 30% of the cost.&lt;/p&gt;

&lt;p&gt;Above 18? Build the agent.&lt;/p&gt;
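
&lt;p&gt;If you want to standardize the rubric across teams, it encodes in a few lines. Only the thresholds come from the text above; the helper name and example scores are mine:&lt;/p&gt;

```python
def build_recommendation(scores):
    """scores: five integers, 1-5, one per question above."""
    total = sum(scores)
    if total > 18:
        return total, "Build the agent"
    if total >= 10:
        return total, "Evaluate GenAI + automation first"
    return total, "Standard GenAI"

# Example: predictable workflow (2), some reasoning (3), needs tools (4),
# conversational (3), should improve over time (2):
# build_recommendation([2, 3, 4, 3, 2]) -> (14, "Evaluate GenAI + automation first")
```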

&lt;h2&gt;
  
  
  AgentCore vs Quick in 30 Seconds
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;AgentCore&lt;/strong&gt; is the developer platform. Modular services (Runtime, Gateway, Memory, Identity, Policy, Observability) you compose into custom architectures.&lt;/p&gt;

&lt;p&gt;You write code, pick your framework (LangGraph, CrewAI, Strands), and control everything.&lt;/p&gt;

&lt;p&gt;Best for custom agent logic, multi-agent orchestration, VPC-internal integrations, and fine-grained security scoping.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Amazon Quick&lt;/strong&gt; is the business user platform.&lt;/p&gt;

&lt;p&gt;Five pre-built products:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Quick Sight (visualization)&lt;/li&gt;
&lt;li&gt;Quick Flows (workflow automation)&lt;/li&gt;
&lt;li&gt;Quick Automate (process automation)&lt;/li&gt;
&lt;li&gt;Quick Index (enterprise search)&lt;/li&gt;
&lt;li&gt;Quick Research (deep analysis)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Best for data analysis, report generation, document search, and SaaS integrations where speed to deployment beats architectural control.&lt;/p&gt;

&lt;p&gt;Some patterns in this series use both.&lt;/p&gt;

&lt;p&gt;An AgentCore agent handles backend orchestration while Quick provides the analytics layer.&lt;/p&gt;

&lt;p&gt;They complement each other.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Read the Cards
&lt;/h2&gt;

&lt;p&gt;Every use case follows this structure:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pattern&lt;/strong&gt; - Where this agent comes from in your org: net-new capability, upgrade from an existing chatbot, or replacement for an RPA workflow. This describes the migration path, not the audience - all 25 patterns in this edition are customer-facing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Platform&lt;/strong&gt; - AgentCore, Quick, or both&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Complexity&lt;/strong&gt; - Quick Win (weeks, high confidence), Strategic Bet (months, higher value), or Foundation Build (prerequisites needed first)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reference Architecture&lt;/strong&gt; - Points to one of the three diagrams below&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;What the agent does&lt;/strong&gt; - The actual workflow, triggers, systems, decisions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AWS services&lt;/strong&gt; - Specific services involved&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You need this if&lt;/strong&gt; - One signal that this use case applies to you&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Reference Architecture A - Single Agent with Tool Access
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://substackcdn.com/image/fetch/$s_!fhTl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d68da91-db93-450a-b62a-c3e6aba68ea8_1408x768.jpeg" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fawtfduiw4bvn3z8iv6zt.jpeg" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Platform:&lt;/strong&gt; AgentCore&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When to use:&lt;/strong&gt; The agent reasons about which tools to call, in what order, based on customer context. One agent handles the full conversation with 3-8 tools.&lt;/p&gt;

&lt;p&gt;Covers most customer service, sales, and account management agents.&lt;/p&gt;

&lt;p&gt;The Gateway handles tool discovery, authentication, and rate limiting.&lt;/p&gt;

&lt;p&gt;The Runtime manages session state.&lt;/p&gt;

&lt;p&gt;Bedrock provides the foundation models used for reasoning and generation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Reference Architecture B - Quick Workspace for Customer Intelligence
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://substackcdn.com/image/fetch/$s_!PG-u!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50bae0c2-8590-4d56-a29c-6aafd598c0cd_1408x768.jpeg" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fstuxd4goq0r279pwbi3y.jpeg" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Platform:&lt;/strong&gt; Quick&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When to use:&lt;/strong&gt; Internal teams need AI-powered analysis of customer data, behavior patterns, or support metrics without writing code.&lt;/p&gt;

&lt;p&gt;Covers customer analytics, churn prediction dashboards, support quality monitoring, and self-service reporting for customer success teams.&lt;/p&gt;

&lt;h2&gt;
  
  
  Reference Architecture C - Multi-Agent Customer Workflow
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://substackcdn.com/image/fetch/$s_!Xe0Q!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8fa04fd-4bf0-475c-9f3d-467d9993acaf_1408x768.jpeg" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F26zuqy5dwm5ghwmj2a94.jpeg" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Platform:&lt;/strong&gt; AgentCore (multi-agent)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When to use:&lt;/strong&gt; Different customer intents need different tools, knowledge bases, and reasoning patterns.&lt;/p&gt;

&lt;p&gt;A single agent with 20+ tools becomes unreliable.&lt;/p&gt;

&lt;p&gt;Specialized agents with a router perform better.&lt;/p&gt;




&lt;h1&gt;
  
  
  The 25 Use Cases
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Customer Support and Service
&lt;/h2&gt;

&lt;h3&gt;
  
  
  #001 - Intelligent Ticket Resolution Agent
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pattern:&lt;/strong&gt; Modernization from chatbot&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Platform:&lt;/strong&gt; AgentCore&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Complexity:&lt;/strong&gt; Quick Win&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reference Architecture:&lt;/strong&gt; A&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What the agent does:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Receives incoming support tickets via API or chat widget.&lt;/li&gt;
&lt;li&gt;Pulls customer history from CRM, checks recent orders and transactions, searches the knowledge base for relevant solutions, and either resolves the ticket directly or drafts a response for human review.&lt;/li&gt;
&lt;li&gt;Handles password resets, order status checks, return initiations, and FAQ-level questions autonomously.&lt;/li&gt;
&lt;li&gt;Escalates to human agents when confidence drops below a configured threshold.&lt;/li&gt;
&lt;/ul&gt;
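&lt;p&gt;The resolve-or-escalate decision above can be sketched as a plain threshold check (the intent names and the 0.8 cutoff are illustrative placeholders, not AgentCore configuration):&lt;/p&gt;

```python
# Hypothetical escalation gate: resolve autonomously only for known-safe
# intents above a confidence threshold; everything else goes to a human.
CONFIDENCE_THRESHOLD = 0.8
AUTONOMOUS_INTENTS = {"password_reset", "order_status", "return_initiation", "faq"}

def route_ticket(intent: str, confidence: float) -> str:
    if intent in AUTONOMOUS_INTENTS and confidence >= CONFIDENCE_THRESHOLD:
        return "resolve"
    return "escalate_to_human"
```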

&lt;p&gt;&lt;strong&gt;AWS services:&lt;/strong&gt; Bedrock (Claude), AgentCore Runtime, AgentCore Gateway, OpenSearch Serverless (knowledge base), Bedrock Guardrails&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You need this if:&lt;/strong&gt; Your support team spends more than 40% of their time on repetitive tickets that follow predictable resolution patterns.&lt;/p&gt;




&lt;h3&gt;
  
  
  #002 - Multi-Channel Support Orchestrator
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pattern:&lt;/strong&gt; New build&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Platform:&lt;/strong&gt; AgentCore (multi-agent)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Complexity:&lt;/strong&gt; Strategic Bet&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reference Architecture:&lt;/strong&gt; C&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What the agent does:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A router agent receives customer messages from chat, email, voice (transcribed via Amazon Transcribe), and social channels.&lt;/li&gt;
&lt;li&gt;It classifies intent, detects sentiment, pulls conversation history from AgentCore Memory, and routes to a specialist agent.&lt;/li&gt;
&lt;li&gt;The billing agent handles payment disputes and invoice questions.&lt;/li&gt;
&lt;li&gt;The technical agent troubleshoots product issues with access to diagnostic APIs.&lt;/li&gt;
&lt;li&gt;The account agent manages subscription changes.&lt;/li&gt;
&lt;li&gt;Each specialist has its own tool set and knowledge base, keeping context windows focused and tool selection reliable.&lt;/li&gt;
&lt;/ul&gt;
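&lt;p&gt;A minimal sketch of the routing step, with keyword matching standing in for the LLM intent classifier (agent names and keywords are hypothetical):&lt;/p&gt;

```python
# Toy router: count keyword overlaps per specialist and pick the best match.
# In the real pattern, the router agent classifies intent with the model.
SPECIALISTS = {
    "billing": {"payment", "invoice", "charge", "refund"},
    "technical": {"error", "crash", "bug", "install"},
    "account": {"subscription", "upgrade", "cancel", "plan"},
}

def route_message(text: str) -> str:
    words = set(text.lower().split())
    best, best_hits = "general", 0
    for agent, keywords in SPECIALISTS.items():
        hits = len(words & keywords)
        if hits > best_hits:
            best, best_hits = agent, hits
    return best
```

A message that matches no specialist falls through to a general agent rather than guessing.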

&lt;p&gt;&lt;strong&gt;AWS services:&lt;/strong&gt; Bedrock (Claude), AgentCore Runtime, AgentCore Memory, AgentCore Gateway, Amazon Transcribe, Amazon Connect, EventBridge&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You need this if:&lt;/strong&gt; You support customers across 3+ channels and your agents need different tools for billing, technical, and account questions.&lt;/p&gt;




&lt;h3&gt;
  
  
  #003 - Proactive Customer Health Monitor
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pattern:&lt;/strong&gt; New build&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Platform:&lt;/strong&gt; Both (AgentCore backend + Quick analytics)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Complexity:&lt;/strong&gt; Strategic Bet&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reference Architecture:&lt;/strong&gt; A + B&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What the agent does:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Runs on a schedule (daily or triggered by events).&lt;/li&gt;
&lt;li&gt;Analyzes customer usage patterns, support ticket frequency, NPS scores, and billing data.&lt;/li&gt;
&lt;li&gt;Identifies accounts showing early churn signals: declining usage, increasing ticket volume, missed payments, or negative sentiment trends.&lt;/li&gt;
&lt;li&gt;Generates a risk score and recommended intervention for each flagged account. Customer success managers review the output through Quick Sight dashboards and receive alerts via SNS.&lt;/li&gt;
&lt;/ul&gt;
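&lt;p&gt;The risk-scoring step can be sketched as a weighted sum over the churn signals listed above (the weights are illustrative, not tuned on real data):&lt;/p&gt;

```python
def churn_risk_score(usage_trend: float, ticket_trend: float,
                     missed_payments: int, nps: int) -> float:
    """Return a risk score in [0, 1]; weights are placeholder values."""
    score = 0.0
    if usage_trend < 0:                           # declining usage
        score += 0.4 * min(1.0, -usage_trend)
    if ticket_trend > 0:                          # rising ticket volume
        score += 0.2 * min(1.0, ticket_trend)
    score += 0.25 * min(1.0, missed_payments / 2)
    if nps <= 6:                                  # NPS detractor
        score += 0.15
    return round(min(score, 1.0), 2)
```

In the full pattern the model generates the recommended intervention; the numeric score just decides which accounts get flagged at all.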

&lt;p&gt;&lt;strong&gt;AWS services:&lt;/strong&gt; Bedrock (Claude), AgentCore Runtime, Amazon Quick Sight, EventBridge (scheduler), Redshift (customer data warehouse), SNS&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You need this if:&lt;/strong&gt; Your customer success team manages 100+ accounts and discovers churn risk reactively, after the customer complains or leaves.&lt;/p&gt;




&lt;h3&gt;
  
  
  #004 - Returns and Refund Processing Agent
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pattern:&lt;/strong&gt; Modernization from chatbot&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Platform:&lt;/strong&gt; AgentCore&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Complexity:&lt;/strong&gt; Quick Win&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reference Architecture:&lt;/strong&gt; A&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What the agent does:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Handles return requests end-to-end.&lt;/li&gt;
&lt;li&gt;Verifies purchase eligibility against return policy rules, checks inventory status for exchanges, calculates refund amounts (including partial refunds, restocking fees, and promotional adjustments), initiates the refund through the payment gateway API, generates return shipping labels, and sends confirmation to the customer.&lt;/li&gt;
&lt;li&gt;For edge cases outside policy parameters, it drafts a recommendation and routes to a human supervisor.&lt;/li&gt;
&lt;/ul&gt;
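&lt;p&gt;The refund calculation deserves care: money math should use &lt;code&gt;Decimal&lt;/code&gt;, not floats. A sketch with hypothetical inputs (how restocking fees and promotional adjustments interact varies by policy):&lt;/p&gt;

```python
from decimal import Decimal

def refund_amount(unit_price: str, qty: int,
                  restocking_rate: str, promo_discount: str) -> Decimal:
    """Refund what the customer actually paid, minus the restocking fee."""
    paid = Decimal(unit_price) * qty - Decimal(promo_discount)
    fee = paid * Decimal(restocking_rate)
    return (paid - fee).quantize(Decimal("0.01"))
```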

&lt;p&gt;&lt;strong&gt;AWS services:&lt;/strong&gt; Bedrock (Nova), AgentCore Runtime, AgentCore Gateway, AgentCore Policy (action restrictions), payment gateway API, shipping API&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You need this if:&lt;/strong&gt; Returns processing involves manual lookups across 3+ systems and takes your team more than 10 minutes per request on average.&lt;/p&gt;




&lt;h3&gt;
  
  
  #005 - Warranty Claims Adjudication Agent
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pattern:&lt;/strong&gt; Migration from RPA&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Platform:&lt;/strong&gt; AgentCore&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Complexity:&lt;/strong&gt; Foundation Build&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reference Architecture:&lt;/strong&gt; A&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What the agent does:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Receives warranty claims with product photos, purchase receipts, and damage descriptions.&lt;/li&gt;
&lt;li&gt;Uses multimodal Bedrock models to analyze product images and assess damage.&lt;/li&gt;
&lt;li&gt;Cross-references the serial number against the warranty database for coverage verification.&lt;/li&gt;
&lt;li&gt;Applies claim rules (coverage period, damage type, prior claims history) and either approves, denies, or flags for manual review.&lt;/li&gt;
&lt;li&gt;Approved claims trigger replacement shipment or repair scheduling through the fulfillment system.&lt;/li&gt;
&lt;/ul&gt;
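&lt;p&gt;The claim rules can be sketched as a deterministic pass that runs before any model judgment; the covered damage types and thresholds here are hypothetical:&lt;/p&gt;

```python
from datetime import date

COVERED_DAMAGE = {"manufacturing_defect", "electrical_failure"}

def adjudicate(purchased: date, claimed: date, warranty_days: int,
               damage_type: str, prior_claims: int) -> str:
    """Approve, deny, or flag a claim from the deterministic rules alone."""
    if (claimed - purchased).days > warranty_days:
        return "deny"            # outside the coverage period
    if damage_type not in COVERED_DAMAGE:
        return "deny"
    if prior_claims >= 2:
        return "manual_review"   # repeat claimant gets a human look
    return "approve"
```

Keeping the hard rules outside the model makes the approve/deny boundary auditable; the multimodal analysis only feeds the damage classification.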

&lt;p&gt;&lt;strong&gt;AWS services:&lt;/strong&gt; Bedrock (Claude with vision), AgentCore Runtime, Amazon S3 (document/image storage), DynamoDB (warranty database), Step Functions (fulfillment orchestration)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You need this if:&lt;/strong&gt; Your warranty claims process involves manual image review, policy lookup across multiple systems, and takes 24-48 hours for straightforward claims.&lt;/p&gt;




&lt;h2&gt;
  
  
  Sales and Revenue
&lt;/h2&gt;

&lt;h3&gt;
  
  
  #006 - Lead Qualification and Routing Agent
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pattern:&lt;/strong&gt; New build&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Platform:&lt;/strong&gt; AgentCore&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Complexity:&lt;/strong&gt; Quick Win&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reference Architecture:&lt;/strong&gt; A&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What the agent does:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Engages inbound leads from web forms, chat widgets, or landing pages.&lt;/li&gt;
&lt;li&gt;Asks qualifying questions conversationally (budget range, timeline, company size, use case).&lt;/li&gt;
&lt;li&gt;Enriches lead data by calling company information APIs.&lt;/li&gt;
&lt;li&gt;Scores the lead against ICP criteria and routes qualified leads to the appropriate sales rep based on territory, deal size, and product interest.&lt;/li&gt;
&lt;li&gt;Unqualified leads receive automated nurture sequences.&lt;/li&gt;
&lt;li&gt;All interactions sync back to the CRM.&lt;/li&gt;
&lt;/ul&gt;
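&lt;p&gt;The score-and-route step can be sketched as below; the ICP thresholds and the 70-point cutoff are placeholders for your real criteria:&lt;/p&gt;

```python
def score_lead(budget: int, timeline_days: int, employees: int) -> int:
    """ICP fit score, 0-100, from three hypothetical criteria."""
    score = 40 if budget >= 10_000 else 10
    score += 30 if timeline_days <= 90 else 5
    score += 30 if 50 <= employees <= 5_000 else 10
    return score

def route_lead(score: int, threshold: int = 70) -> str:
    return "sales_rep" if score >= threshold else "nurture_sequence"
```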

&lt;p&gt;&lt;strong&gt;AWS services:&lt;/strong&gt; Bedrock (Claude), AgentCore Runtime, AgentCore Gateway, CRM API (Salesforce/HubSpot), company enrichment API (Clearbit/ZoomInfo)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You need this if:&lt;/strong&gt; Your SDR team spends more than half their day qualifying leads that turn out to be poor fits, and qualified leads wait hours for first response.&lt;/p&gt;




&lt;h3&gt;
  
  
  #007 - Personalized Product Recommendation Agent
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pattern:&lt;/strong&gt; Modernization from chatbot&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Platform:&lt;/strong&gt; AgentCore&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Complexity:&lt;/strong&gt; Quick Win&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reference Architecture:&lt;/strong&gt; A&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What the agent does:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Interacts with customers browsing your product catalog.&lt;/li&gt;
&lt;li&gt;Asks about preferences, use cases, and constraints through natural conversation.&lt;/li&gt;
&lt;li&gt;Queries the product database with semantic search, filters by availability and pricing, and recommends products with specific reasons tied to what the customer described.&lt;/li&gt;
&lt;li&gt;Handles comparison requests across multiple products.&lt;/li&gt;
&lt;li&gt;Tracks which recommendations led to add-to-cart events and feeds that data into downstream analytics or recommendation systems.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;AWS services:&lt;/strong&gt; Bedrock (Claude), AgentCore Runtime, OpenSearch Serverless (product catalog with vector search), Amazon Personalize, DynamoDB (interaction history)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You need this if:&lt;/strong&gt; Your product catalog has 500+ SKUs and customers abandon the site because they cannot find what matches their specific needs.&lt;/p&gt;




&lt;h3&gt;
  
  
  #008 - Quote Generation and Pricing Agent
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pattern:&lt;/strong&gt; Migration from RPA&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Platform:&lt;/strong&gt; AgentCore&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Complexity:&lt;/strong&gt; Strategic Bet&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reference Architecture:&lt;/strong&gt; A&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What the agent does:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Takes customer requirements (quantity, specifications, timeline, delivery location) and generates formal price quotes.&lt;/li&gt;
&lt;li&gt;Pulls current pricing from the ERP system, applies volume discounts, checks contract-specific pricing for existing customers, calculates shipping costs based on logistics APIs, and factors in promotional offers.&lt;/li&gt;
&lt;li&gt;Generates a formatted PDF quote and sends it to the customer.&lt;/li&gt;
&lt;li&gt;Non-standard requests outside pricing rules route to the sales manager with a recommended price and margin analysis.&lt;/li&gt;
&lt;/ul&gt;
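&lt;p&gt;The discount logic can be sketched as a tier lookup where contract-specific pricing always wins; the tier breakpoints are invented for illustration:&lt;/p&gt;

```python
# (minimum quantity, discount) tiers, checked from largest to smallest.
VOLUME_TIERS = [(500, 0.15), (100, 0.10), (25, 0.05)]

def quoted_unit_price(list_price, qty, contract_price=None):
    """Contract price wins; otherwise apply the best matching volume tier."""
    if contract_price is not None:
        return round(contract_price, 2)
    discount = next((d for min_qty, d in VOLUME_TIERS if qty >= min_qty), 0.0)
    return round(list_price * (1 - discount), 2)
```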

&lt;p&gt;&lt;strong&gt;AWS services:&lt;/strong&gt; Bedrock (Claude), AgentCore Runtime, AgentCore Gateway, ERP API (SAP/Oracle), Lambda (PDF generation), SES (email delivery)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You need this if:&lt;/strong&gt; Generating a custom quote takes your sales team 2+ hours and involves pulling data from 3 or more systems manually.&lt;/p&gt;




&lt;h3&gt;
  
  
  #009 - Contract Renewal Intelligence Agent
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pattern:&lt;/strong&gt; New build&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Platform:&lt;/strong&gt; Both&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Complexity:&lt;/strong&gt; Strategic Bet&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reference Architecture:&lt;/strong&gt; A + B&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What the agent does:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Monitors contract expiration dates across the customer base.&lt;/li&gt;
&lt;li&gt;Sixty days before renewal, it compiles a customer health profile: usage trends, support ticket history, feature adoption, billing history, and NPS scores.&lt;/li&gt;
&lt;li&gt;Generates a renewal risk assessment and recommended pricing strategy (upsell opportunity, standard renewal, at-risk discount needed).&lt;/li&gt;
&lt;li&gt;Sales reps review renewal briefs through Quick dashboards.&lt;/li&gt;
&lt;li&gt;The agent drafts personalized renewal communications based on each account’s specific usage patterns and value received.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;AWS services:&lt;/strong&gt; Bedrock (Claude), AgentCore Runtime, Amazon Quick (Sight + Research), Redshift, EventBridge (scheduler), SES&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You need this if:&lt;/strong&gt; Your renewal process starts too late, relies on generic outreach, and your team lacks a consolidated view of customer health at renewal time.&lt;/p&gt;




&lt;h3&gt;
  
  
  #010 - Real-Time Sales Coaching Agent
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pattern:&lt;/strong&gt; New build&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Platform:&lt;/strong&gt; AgentCore&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Complexity:&lt;/strong&gt; Foundation Build&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reference Architecture:&lt;/strong&gt; A&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What the agent does:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Listens to live sales calls via Amazon Connect integration and Amazon Transcribe streaming.&lt;/li&gt;
&lt;li&gt;Analyzes the conversation in real-time and provides the sales rep with contextual prompts: competitor objection responses, relevant case studies, pricing flexibility guidelines, and technical specifications.&lt;/li&gt;
&lt;li&gt;After the call, generates a summary, identifies follow-up actions, updates the CRM, and scores the call against your sales methodology framework.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;AWS services:&lt;/strong&gt; Bedrock (Claude), AgentCore Runtime, Amazon Connect, Amazon Transcribe (streaming), Bedrock Knowledge Bases, CRM API&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You need this if:&lt;/strong&gt; Your sales team handles complex technical sales where having the right information during the call directly impacts close rates.&lt;/p&gt;




&lt;h2&gt;
  
  
  Onboarding and Activation
&lt;/h2&gt;

&lt;h3&gt;
  
  
  #011 - Customer Onboarding Orchestrator
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pattern:&lt;/strong&gt; Modernization from chatbot&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Platform:&lt;/strong&gt; AgentCore&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Complexity:&lt;/strong&gt; Quick Win&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reference Architecture:&lt;/strong&gt; A&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What the agent does:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Guides new customers through product setup step by step.&lt;/li&gt;
&lt;li&gt;Adapts the onboarding flow based on the customer’s plan tier, industry, and stated goals.&lt;/li&gt;
&lt;li&gt;Configures initial settings, imports data from previous tools via API, creates sample content or workflows, and schedules check-in milestones.&lt;/li&gt;
&lt;li&gt;Tracks completion of onboarding tasks and sends reminders for incomplete steps.&lt;/li&gt;
&lt;li&gt;Escalates to a customer success manager when the customer gets stuck or expresses frustration.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;AWS services:&lt;/strong&gt; Bedrock (Claude), AgentCore Runtime, AgentCore Memory (onboarding state), product API, SES/SNS (notifications)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You need this if:&lt;/strong&gt; Your time-to-value exceeds 14 days for new customers and onboarding completion rate sits below 70%.&lt;/p&gt;




&lt;h3&gt;
  
  
  #012 - Document Collection and Verification Agent
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pattern:&lt;/strong&gt; Migration from RPA&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Platform:&lt;/strong&gt; AgentCore&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Complexity:&lt;/strong&gt; Strategic Bet&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reference Architecture:&lt;/strong&gt; A&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What the agent does:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Manages the document collection process for customer applications (financial services, insurance, healthcare enrollment).&lt;/li&gt;
&lt;li&gt;Sends document requests, receives uploads, uses Amazon Textract to extract information, validates extracted data against application requirements, flags discrepancies, and requests corrections or additional documents.&lt;/li&gt;
&lt;li&gt;Maintains a real-time status dashboard showing which documents are complete, pending, or require resubmission.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;AWS services:&lt;/strong&gt; Bedrock (Claude), AgentCore Runtime, Amazon Textract, Amazon S3, DynamoDB (application state), SES (communications)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You need this if:&lt;/strong&gt; Your customer application process requires 5+ documents, average collection time exceeds 2 weeks, and your team spends hours chasing missing or incorrect paperwork.&lt;/p&gt;




&lt;h3&gt;
  
  
  #013 - KYC and Identity Verification Agent
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pattern:&lt;/strong&gt; Migration from RPA&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Platform:&lt;/strong&gt; AgentCore&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Complexity:&lt;/strong&gt; Foundation Build&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reference Architecture:&lt;/strong&gt; A&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What the agent does:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Conducts Know Your Customer verification for financial services onboarding.&lt;/li&gt;
&lt;li&gt;Collects identity documents, extracts data with Textract, verifies against government databases and sanctions lists via API, performs facial comparison between ID photos and selfies using Amazon Rekognition, runs PEP (Politically Exposed Persons) screening, and generates a risk assessment score.&lt;/li&gt;
&lt;li&gt;Clean cases auto-approve.&lt;/li&gt;
&lt;li&gt;Flagged cases route to compliance analysts with a detailed findings summary.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;AWS services:&lt;/strong&gt; Bedrock (Claude), AgentCore Runtime, AgentCore Policy (compliance rules), Amazon Textract, Amazon Rekognition, third-party verification APIs, DynamoDB&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You need this if:&lt;/strong&gt; Your KYC process takes 3+ business days for standard applications and your compliance team manually reviews documents that could be auto-verified.&lt;/p&gt;




&lt;h3&gt;
  
  
  #014 - Insurance Quoting and Binding Agent
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pattern:&lt;/strong&gt; New build&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Platform:&lt;/strong&gt; AgentCore&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Complexity:&lt;/strong&gt; Strategic Bet&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reference Architecture:&lt;/strong&gt; A&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What the agent does:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Walks prospective customers through an insurance application conversationally.&lt;/li&gt;
&lt;li&gt;Collects required information (property details, driving history, health information depending on product line) through natural dialogue instead of static forms.&lt;/li&gt;
&lt;li&gt;Calls underwriting APIs to generate real-time premium quotes.&lt;/li&gt;
&lt;li&gt;Explains coverage options, deductible trade-offs, and exclusions in plain language.&lt;/li&gt;
&lt;li&gt;Standard-risk profiles complete binding and get policy documents.&lt;/li&gt;
&lt;li&gt;Non-standard risks route to an underwriter with the completed application and preliminary risk assessment.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;AWS services:&lt;/strong&gt; Bedrock (Claude), AgentCore Runtime, Bedrock Guardrails (PII handling), underwriting API, document generation API, payment gateway&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You need this if:&lt;/strong&gt; Your online quote-to-bind conversion rate is below 15% and customers abandon applications because the process is too long or confusing.&lt;/p&gt;




&lt;h3&gt;
  
  
  #015 - Patient Intake and Pre-Visit Agent
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pattern:&lt;/strong&gt; New build&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Platform:&lt;/strong&gt; AgentCore&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Complexity:&lt;/strong&gt; Foundation Build&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reference Architecture:&lt;/strong&gt; A&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What the agent does:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Contacts patients before scheduled appointments to collect intake information.&lt;/li&gt;
&lt;li&gt;Gathers medical history updates, current medications, symptoms, and insurance details through a conversational interface.&lt;/li&gt;
&lt;li&gt;Verifies insurance eligibility in real-time via payer APIs.&lt;/li&gt;
&lt;li&gt;Pre-populates the EHR with collected information so the provider has context before the visit.&lt;/li&gt;
&lt;li&gt;Sends appointment reminders and preparation instructions (fasting requirements, documents to bring).&lt;/li&gt;
&lt;li&gt;Handles rescheduling requests by checking provider availability.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;AWS services:&lt;/strong&gt; Bedrock (Claude), AgentCore Runtime, AgentCore Policy (HIPAA compliance), Bedrock Guardrails (PHI protection), EHR API (Epic/Cerner), insurance verification API&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You need this if:&lt;/strong&gt; Your front desk staff spends 15+ minutes per patient on intake paperwork and your no-show rate exceeds 10%.&lt;/p&gt;




&lt;h2&gt;
  
  
  Self-Service and Account Management
&lt;/h2&gt;

&lt;h3&gt;
  
  
  #016 - Account Configuration Agent
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pattern:&lt;/strong&gt; Modernization from chatbot&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Platform:&lt;/strong&gt; AgentCore&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Complexity:&lt;/strong&gt; Quick Win&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reference Architecture:&lt;/strong&gt; A&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What the agent does:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Handles account management requests that currently require support tickets or phone calls.&lt;/li&gt;
&lt;li&gt;Changes billing information, updates contact details, modifies subscription plans, adds or removes users, adjusts notification preferences, and manages API keys.&lt;/li&gt;
&lt;li&gt;Validates changes against account policies before executing.&lt;/li&gt;
&lt;li&gt;Requires additional verification (MFA challenge) for sensitive changes like payment method updates or admin role assignments.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;AWS services:&lt;/strong&gt; Bedrock (Nova), AgentCore Runtime, AgentCore Identity (user verification), AgentCore Policy (change authorization), account management API&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You need this if:&lt;/strong&gt; More than 30% of your support tickets are account modification requests that follow standard procedures and require no human judgment.&lt;/p&gt;




&lt;h3&gt;
  
  
  #017 - Billing Dispute Resolution Agent
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pattern:&lt;/strong&gt; Modernization from chatbot&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Platform:&lt;/strong&gt; AgentCore&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Complexity:&lt;/strong&gt; Quick Win&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reference Architecture:&lt;/strong&gt; A&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What the agent does:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Investigates billing disputes by pulling invoice details, payment history, usage records, and contract terms.&lt;/li&gt;
&lt;li&gt;Identifies the root cause: duplicate charge, incorrect rate, usage miscalculation, or failed payment.&lt;/li&gt;
&lt;li&gt;For clear-cut errors, applies the credit automatically and confirms with the customer.&lt;/li&gt;
&lt;li&gt;For ambiguous disputes, presents findings with supporting data and offers resolution options.&lt;/li&gt;
&lt;li&gt;Complex disputes involving contract interpretation route to the billing team with a complete investigation summary.&lt;/li&gt;
&lt;/ul&gt;
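&lt;p&gt;One of those root causes, the duplicate charge, is cheap to detect deterministically before the model reasons about anything (the record shape here is hypothetical):&lt;/p&gt;

```python
from collections import Counter

def find_duplicate_charges(charges):
    """charges: (date, amount_cents, description) tuples; an exact repeat
    of the same triple is a duplicate-charge suspect."""
    counts = Counter(charges)
    return [charge for charge, n in counts.items() if n > 1]
```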

&lt;p&gt;&lt;strong&gt;AWS services:&lt;/strong&gt; Bedrock (Claude), AgentCore Runtime, billing system API, payment processor API, DynamoDB (dispute tracking)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You need this if:&lt;/strong&gt; Billing disputes take your team an average of 45+ minutes to investigate because the relevant data lives in 4 or more systems.&lt;/p&gt;




&lt;h3&gt;
  
  
  #018 - Subscription Optimization Advisor
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pattern:&lt;/strong&gt; New build&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Platform:&lt;/strong&gt; AgentCore&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Complexity:&lt;/strong&gt; Quick Win&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reference Architecture:&lt;/strong&gt; A&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What the agent does:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Analyzes a customer’s actual usage patterns against their current subscription tier.&lt;/li&gt;
&lt;li&gt;Identifies underused features the customer is paying for, features they need but lack access to, and usage trends suggesting a plan change would save money or deliver better value.&lt;/li&gt;
&lt;li&gt;Proactively reaches out (or responds when asked) with a specific recommendation backed by the customer’s own data.&lt;/li&gt;
&lt;li&gt;Handles plan changes directly when the customer agrees.&lt;/li&gt;
&lt;/ul&gt;
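&lt;p&gt;The plan-fit check can be sketched as picking the cheapest tier that covers observed usage plus headroom; the plan names and limits below are made up:&lt;/p&gt;

```python
PLANS = {"starter": 1_000, "growth": 10_000, "scale": 100_000}  # monthly API calls

def recommend_plan(avg_monthly_calls: int, headroom: float = 1.2) -> str:
    """Cheapest plan whose limit covers usage plus 20% headroom."""
    needed = avg_monthly_calls * headroom
    for name, limit in sorted(PLANS.items(), key=lambda kv: kv[1]):
        if limit >= needed:
            return name
    return max(PLANS, key=PLANS.get)   # usage already exceeds every tier
```

The same comparison works in both directions: it surfaces downgrades for overpaying customers and upgrades for customers approaching their limits.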

&lt;p&gt;&lt;strong&gt;AWS services:&lt;/strong&gt; Bedrock (Claude), AgentCore Runtime, usage analytics API, billing API, SES (proactive outreach)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You need this if:&lt;/strong&gt; Customers churn because they feel they are overpaying, or they hit plan limits and leave instead of upgrading because no one showed them the value of the next tier.&lt;/p&gt;




&lt;h2&gt;
  
  
  Scheduling and Coordination
&lt;/h2&gt;

&lt;h3&gt;
  
  
  #019 - Appointment Scheduling Agent
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pattern:&lt;/strong&gt; Modernization from chatbot&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Platform:&lt;/strong&gt; AgentCore&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Complexity:&lt;/strong&gt; Quick Win&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reference Architecture:&lt;/strong&gt; A&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What the agent does:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Manages appointment booking across multiple providers, locations, and service types.&lt;/li&gt;
&lt;li&gt;Understands natural language requests (“I need to see Dr. Martinez next Tuesday afternoon”), checks real-time availability, considers travel time between locations for the customer, handles rescheduling and cancellations, and sends confirmations and reminders.&lt;/li&gt;
&lt;li&gt;Manages waitlists and automatically offers cancellation slots to waiting customers.&lt;/li&gt;
&lt;/ul&gt;
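&lt;p&gt;The waitlist piece is a plain FIFO queue: when a slot frees up, the longest-waiting customer gets the first offer. A minimal sketch:&lt;/p&gt;

```python
from collections import deque

class Waitlist:
    """First-in, first-out waitlist for freed appointment slots."""
    def __init__(self):
        self._queue = deque()

    def join(self, customer_id: str) -> None:
        self._queue.append(customer_id)

    def offer_cancelled_slot(self):
        """Return the next customer to notify, or None if nobody waits."""
        return self._queue.popleft() if self._queue else None
```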

&lt;p&gt;&lt;strong&gt;AWS services:&lt;/strong&gt; Bedrock (Nova), AgentCore Runtime, scheduling system API, SNS/SES (notifications), DynamoDB (waitlist management)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You need this if:&lt;/strong&gt; Your scheduling staff handles 100+ calls per day for appointment booking and phone wait times exceed 5 minutes during peak hours.&lt;/p&gt;




&lt;h3&gt;
  
  
  #020 - Service Dispatch and Coordination Agent
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pattern:&lt;/strong&gt; Migration from RPA&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Platform:&lt;/strong&gt; AgentCore&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Complexity:&lt;/strong&gt; Strategic Bet&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reference Architecture:&lt;/strong&gt; A&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What the agent does:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Coordinates field service appointments for installation, repair, or maintenance visits.&lt;/li&gt;
&lt;li&gt;Collects service request details from the customer, determines required skills and equipment, checks technician availability and location, proposes appointment windows, and confirms bookings.&lt;/li&gt;
&lt;li&gt;On the day of service, provides the customer with technician ETA updates.&lt;/li&gt;
&lt;li&gt;If a technician runs late or a job takes longer than expected, automatically reschedules downstream appointments and notifies affected customers.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;AWS services:&lt;/strong&gt; Bedrock (Claude), AgentCore Runtime, workforce management API, Amazon Location Service, SNS (real-time notifications), EventBridge (event-driven rescheduling)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You need this if:&lt;/strong&gt; Your dispatch team manually coordinates 50+ service appointments per day and customers complain about missed windows or lack of status updates.&lt;/p&gt;




&lt;h2&gt;
  
  
  Communication and Engagement
&lt;/h2&gt;

&lt;h3&gt;
  
  
  #021 - Personalized Outreach Campaign Agent
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pattern:&lt;/strong&gt; New build&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Platform:&lt;/strong&gt; AgentCore&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Complexity:&lt;/strong&gt; Strategic Bet&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reference Architecture:&lt;/strong&gt; A&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What the agent does:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Generates personalized outreach messages at scale for marketing campaigns, re-engagement sequences, and lifecycle communications.&lt;/li&gt;
&lt;li&gt;For each customer segment, it pulls behavioral data (purchase history, browsing patterns, feature usage, support interactions), generates message variants tailored to individual contexts, and A/B tests subject lines and content.&lt;/li&gt;
&lt;li&gt;Feeds performance data (open rates, click-throughs, conversions) into analytics workflows that inform future campaign generation.&lt;/li&gt;
&lt;li&gt;Operates within brand guidelines and approved messaging frameworks stored in the knowledge base.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;AWS services:&lt;/strong&gt; Bedrock (Claude), AgentCore Runtime, Bedrock Knowledge Bases (brand guidelines), Amazon Pinpoint, Redshift (customer data), S3 (campaign assets)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You need this if:&lt;/strong&gt; Your marketing team sends the same campaign to entire segments and personalization is limited to inserting the customer’s first name.&lt;/p&gt;




&lt;h3&gt;
  
  
  #022 - Review Response and Reputation Agent
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pattern:&lt;/strong&gt; New build&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Platform:&lt;/strong&gt; AgentCore&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Complexity:&lt;/strong&gt; Quick Win&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reference Architecture:&lt;/strong&gt; A&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What the agent does:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Monitors customer reviews across platforms (Google, Yelp, Trustpilot, app stores, social media).&lt;/li&gt;
&lt;li&gt;Analyzes sentiment and topic.&lt;/li&gt;
&lt;li&gt;For positive reviews, drafts personalized thank-you responses.&lt;/li&gt;
&lt;li&gt;For negative reviews, investigates the customer’s account to understand context, drafts empathetic responses that address specific complaints, and creates internal tickets for service recovery.&lt;/li&gt;
&lt;li&gt;Aggregates review trends into weekly reports highlighting recurring issues and sentiment shifts.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;AWS services:&lt;/strong&gt; Bedrock (Claude), AgentCore Runtime, review platform APIs, CRM API, Amazon Comprehend (sentiment analysis), SNS (alerts for critical reviews)&lt;/p&gt;
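&lt;p&gt;The sentiment step could look something like this sketch. It assumes boto3 and Comprehend's detect_sentiment API; the routing helper and its threshold are illustrative, not part of any shipped agent:&lt;/p&gt;

```python
def classify_review(text, language_code="en"):
    """Classify one review with Amazon Comprehend (hypothetical helper)."""
    import boto3  # lazy import so the pure routing helper below stays testable offline
    comprehend = boto3.client("comprehend")
    resp = comprehend.detect_sentiment(Text=text, LanguageCode=language_code)
    return resp["Sentiment"], resp["SentimentScore"]

def route_review(sentiment, score, threshold=0.7):
    """Decide the agent's next action from sentiment alone.

    Pure logic, kept separate from the API call: negative reviews above the
    confidence threshold trigger investigation and a service-recovery ticket.
    """
    if sentiment == "NEGATIVE" and score.get("Negative", 0.0) >= threshold:
        return "investigate_account_and_open_ticket"
    if sentiment == "POSITIVE":
        return "draft_thank_you_response"
    return "queue_for_human_review"
```

&lt;p&gt;Everything ambiguous (mixed or neutral sentiment, low confidence) falls through to a human, which matches the spirit of a Quick Win: automate the clear cases first.&lt;/p&gt;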

&lt;p&gt;&lt;strong&gt;You need this if:&lt;/strong&gt; Your team responds to reviews manually, response times exceed 24 hours, and you lack a systematic way to track sentiment trends across platforms.&lt;/p&gt;




&lt;h3&gt;
  
  
  #023 - Multilingual Customer Communication Agent
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pattern:&lt;/strong&gt; New build&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Platform:&lt;/strong&gt; AgentCore&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Complexity:&lt;/strong&gt; Quick Win&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reference Architecture:&lt;/strong&gt; A&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What the agent does:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Handles customer interactions in 20+ languages without requiring multilingual staff.&lt;/li&gt;
&lt;li&gt;Detects the customer’s language from their first message, conducts the entire conversation in that language, and translates internal knowledge base content on the fly.&lt;/li&gt;
&lt;li&gt;Maintains cultural context and idiomatic accuracy beyond literal translation.&lt;/li&gt;
&lt;li&gt;For regulated communications (financial disclosures, healthcare instructions), uses pre-approved translations from the knowledge base instead of real-time generation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;AWS services:&lt;/strong&gt; Bedrock (Claude, which handles multilingual natively), AgentCore Runtime, Bedrock Knowledge Bases (approved translations), Amazon Translate (fallback for less common languages)&lt;/p&gt;
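&lt;p&gt;The "pre-approved translations first, machine translation as fallback" rule is the interesting part of this pattern. A minimal sketch, assuming the approved translations have been loaded from the knowledge base into a simple lookup (the dict shape is hypothetical) and using Amazon Translate's translate_text API for the fallback:&lt;/p&gt;

```python
def get_localized_text(message_key, target_lang, approved_translations, source_text):
    """Return a pre-approved translation when one exists; otherwise fall
    back to Amazon Translate.

    approved_translations is a hypothetical dict keyed by
    (message_key, language_code), loaded from the knowledge base.
    """
    approved = approved_translations.get((message_key, target_lang))
    if approved is not None:
        return approved, "approved"
    import boto3  # lazy import so the lookup path stays testable offline
    translate = boto3.client("translate")
    resp = translate.translate_text(
        Text=source_text,
        SourceLanguageCode="en",
        TargetLanguageCode=target_lang,
    )
    return resp["TranslatedText"], "machine"
```

&lt;p&gt;Returning the source ("approved" vs "machine") matters for regulated communications: the agent can refuse to send machine output where only approved copy is allowed.&lt;/p&gt;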

&lt;p&gt;&lt;strong&gt;You need this if:&lt;/strong&gt; You serve customers in 3+ languages and currently either hire language-specific support staff or use basic translation tools that miss nuance.&lt;/p&gt;




&lt;h2&gt;
  
  
  Specialized Industry Agents
&lt;/h2&gt;

&lt;h3&gt;
  
  
  #024 - Real Estate Property Matching Agent
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pattern:&lt;/strong&gt; New build&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Platform:&lt;/strong&gt; AgentCore&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Complexity:&lt;/strong&gt; Strategic Bet&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reference Architecture:&lt;/strong&gt; A&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What the agent does:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Works with home buyers through a conversational interface to understand requirements beyond basic filters.&lt;/li&gt;
&lt;li&gt;Captures lifestyle preferences (commute tolerance, school district priorities, neighborhood vibe, proximity to amenities) alongside traditional criteria (bedrooms, budget, location).&lt;/li&gt;
&lt;li&gt;Searches MLS listings with semantic matching, scores properties against the buyer’s full preference profile, and presents curated shortlists with specific reasons each property matches.&lt;/li&gt;
&lt;li&gt;Schedules viewings, provides neighborhood data, and adapts recommendations based on feedback after each showing.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;AWS services:&lt;/strong&gt; Bedrock (Claude), AgentCore Runtime, AgentCore Memory (buyer preference evolution), MLS API, Amazon Location Service, OpenSearch Serverless (semantic property search)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You need this if:&lt;/strong&gt; Your agents show 15+ properties before a buyer makes an offer and clients say “you’re not understanding what I want” after the third showing.&lt;/p&gt;




&lt;h3&gt;
  
  
  #025 - Travel Itinerary Planning Agent
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pattern:&lt;/strong&gt; New build&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Platform:&lt;/strong&gt; AgentCore&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Complexity:&lt;/strong&gt; Strategic Bet&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reference Architecture:&lt;/strong&gt; C&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What the agent does:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Builds complete travel itineraries through multi-turn conversation.&lt;/li&gt;
&lt;li&gt;A planner agent understands preferences and constraints (dates, budget, interests, mobility needs, dietary restrictions).&lt;/li&gt;
&lt;li&gt;A booking agent searches flights, hotels, and activities through GDS and supplier APIs.&lt;/li&gt;
&lt;li&gt;A logistics agent optimizes the sequence of activities based on geography, operating hours, and travel time.&lt;/li&gt;
&lt;li&gt;The planner presents the consolidated itinerary, handles modifications, and manages booking confirmations.&lt;/li&gt;
&lt;li&gt;Post-booking, it monitors for flight changes and weather disruptions, and sends trip updates.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;AWS services:&lt;/strong&gt; Bedrock (Claude), AgentCore Runtime (multi-agent), AgentCore Memory (trip state), GDS/supplier APIs via AgentCore Gateway, Amazon Location Service, EventBridge (monitoring triggers)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You need this if:&lt;/strong&gt; Your customers spend 3+ hours planning trips across multiple booking sites and your conversion rate from search to booking sits below 5%.&lt;/p&gt;




&lt;h2&gt;
  
  
  What These 25 Patterns Reveal
&lt;/h2&gt;

&lt;p&gt;A few things stand out across all 25.&lt;/p&gt;

&lt;p&gt;Most customer-facing agents start as chatbot upgrades.&lt;/p&gt;

&lt;p&gt;The jump from “answers questions from a knowledge base” to “takes actions on behalf of the customer” is where real value appears.&lt;/p&gt;

&lt;p&gt;If you already have a chatbot, you have the knowledge base and the channel.&lt;/p&gt;

&lt;p&gt;Adding tool access through AgentCore Gateway converts that chatbot into an agent.&lt;/p&gt;

&lt;p&gt;Quick Wins cluster around support and account management.&lt;/p&gt;

&lt;p&gt;These use cases have well-defined rules, predictable workflows, and clear success metrics.&lt;/p&gt;

&lt;p&gt;They make good first agents because the scope is contained and ROI is measurable within weeks.&lt;/p&gt;

&lt;p&gt;Multi-agent architectures show up only when necessary.&lt;/p&gt;

&lt;p&gt;Only a couple of the 25 patterns require multiple coordinated agents.&lt;/p&gt;

&lt;p&gt;Most customer-facing work is handled well by a single agent with the right tools.&lt;/p&gt;

&lt;p&gt;Build multi-agent systems because a single agent’s context window or tool set has become unreliable, not because the architecture sounds impressive.&lt;/p&gt;

&lt;p&gt;The foundation model matters less than the integration layer.&lt;/p&gt;

&lt;p&gt;Swapping Claude for Nova changes your cost profile but rarely changes the architecture.&lt;/p&gt;

&lt;p&gt;The APIs, knowledge bases, and policy rules are where the real engineering happens.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Comes Next
&lt;/h2&gt;

&lt;p&gt;This series continues with four more editions, same format, different domains:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Edition 2&lt;/strong&gt; - Internal knowledge and productivity agents (employee-facing)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Edition 3&lt;/strong&gt; - Workflow automation and process agents (internal operations, no direct customer interaction)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Edition 4&lt;/strong&gt; - Data and analytics agents (self-service BI)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Edition 5&lt;/strong&gt; - Compliance, security, and governance agents (high-stakes environments)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Bookmark this. When someone on your team says “we should use AI for X,” pull up the relevant card and walk into the architecture discussion with a starting point instead of a blank whiteboard.&lt;/p&gt;

&lt;p&gt;I publish every week at &lt;a href="https://buildwithaws.substack.com" rel="noopener noreferrer"&gt;buildwithaws.substack.com&lt;/a&gt;. Subscribe. It's free.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>genai</category>
      <category>aws</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Anthropic accidentally leaked Claude Code's source code. Here's what that means.</title>
      <dc:creator>Marcelo Acosta Cavalero</dc:creator>
      <pubDate>Tue, 31 Mar 2026 21:51:47 +0000</pubDate>
      <link>https://dev.to/aws-builders/anthropic-accidentally-leaked-claude-codes-source-code-heres-what-that-means-2f89</link>
      <guid>https://dev.to/aws-builders/anthropic-accidentally-leaked-claude-codes-source-code-heres-what-that-means-2f89</guid>
      <description>&lt;p&gt;Last week, someone noticed that version 2.1.88 of the Claude Code npm package was 60MB heavier than it should have been.&lt;br&gt;
Inside: reconstructable source code for Claude Code's CLI. Around 512,000 lines of TypeScript across nearly 2,000 files. Significant portions of the agent codebase that Anthropic had kept private, exposed by a single build mistake.&lt;/p&gt;

&lt;h2&gt;
  
  
  How does a mistake like this even happen?
&lt;/h2&gt;

&lt;p&gt;When developers ship software, they often minify the code first. That means compressing it into an unreadable blob of abbreviated variable names and stripped formatting. The goal is smaller files, faster downloads, and some protection from competitors reading your work.&lt;br&gt;
To debug that minified code, teams use source maps: files that translate the ugly compressed version back into the original readable code. &lt;br&gt;
These are internal tools. &lt;br&gt;
They should never ship to users.&lt;br&gt;
This one did.&lt;/p&gt;

&lt;h2&gt;
  
  
  What was actually inside?
&lt;/h2&gt;

&lt;p&gt;Reported findings include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How Claude Code's agent loop works&lt;/li&gt;
&lt;li&gt;Multi-agent coordination logic&lt;/li&gt;
&lt;li&gt;Around 44 feature flags for unshipped functionality&lt;/li&gt;
&lt;li&gt;System prompts Claude Code uses internally&lt;/li&gt;
&lt;li&gt;How persistent memory is implemented&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What was confirmed not inside: model weights, training data, backend infrastructure, or safety pipelines. &lt;br&gt;
The AI is fine. &lt;br&gt;
This was the client-side scaffolding around it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wasn't Claude Code already open source?
&lt;/h2&gt;

&lt;p&gt;Anthropic has a public GitHub repo for Claude Code and a Claude Agent SDK that developers can use to build their own tools. So there's always been some public surface area.&lt;br&gt;
But the actual application has always shipped as an obfuscated bundle. &lt;br&gt;
You could install it and run it. &lt;br&gt;
You could not read how it worked.&lt;/p&gt;

&lt;h2&gt;
  
  
  So what should you actually pay attention to?
&lt;/h2&gt;

&lt;p&gt;The feature flags are the most interesting part. Hidden functionality sitting behind conditionals tells you a lot about what Anthropic is building next. People are already mapping those out.&lt;br&gt;
Anthropic confirmed this was human error, not a security breach, and no customer data was exposed. If you're building on Claude Code or evaluating agentic AI tools, this is a rare look at how a production-grade AI agent is actually architected. The code is already mirrored across GitHub. &lt;br&gt;
It's not going anywhere.&lt;/p&gt;

</description>
      <category>anthropic</category>
      <category>agents</category>
      <category>security</category>
    </item>
    <item>
      <title>A Serverless Blueprint for Multimodal Video Search on AWS</title>
      <dc:creator>Marcelo Acosta Cavalero</dc:creator>
      <pubDate>Thu, 26 Mar 2026 12:00:00 +0000</pubDate>
      <link>https://dev.to/aws-builders/a-serverless-blueprint-for-multimodal-video-search-on-aws-4mdn</link>
      <guid>https://dev.to/aws-builders/a-serverless-blueprint-for-multimodal-video-search-on-aws-4mdn</guid>
      <description>&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://buildwithaws.substack.com/p/designing-a-multimodal-video-search" rel="noopener noreferrer"&gt;Build With AWS&lt;/a&gt;. Subscribe for weekly AWS builds.&lt;/em&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This design was inspired by Miguel Otero Pedrido and Alex Razvant’s &lt;a href="https://theneuralmaze.substack.com/p/your-first-video-agent-multimodality" rel="noopener noreferrer"&gt;“Kubrick”&lt;/a&gt; course, but rebuilt using native AWS primitives instead of custom frameworks.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc319t0z3yaug2ekb27c4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc319t0z3yaug2ekb27c4.png" alt=" " width="800" height="471"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Video is impossible to search.&lt;/p&gt;

&lt;p&gt;You can scrub through it manually, or rely on YouTube’s auto-generated captions that only match exact keywords.&lt;/p&gt;

&lt;p&gt;But what if you want to find “the outdoor mountain scene” or “where they discuss AI ethics”?&lt;/p&gt;

&lt;p&gt;Traditional video platforms fail here because they treat video as a single data type.&lt;/p&gt;

&lt;p&gt;This system treats video as three parallel search problems.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Speech gets transcribed with word-level timestamps and indexed for semantic search.&lt;/li&gt;
&lt;li&gt; Every frame generates a semantic description through Claude Vision and goes into a separate index.&lt;/li&gt;
&lt;li&gt; Those same frames become 1,024-dimensional vectors for visual similarity search.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Users ask questions in natural language, and an intelligent agent figures out which index to query. Results come back with exact timestamps.&lt;/p&gt;

&lt;p&gt;The architecture runs entirely on serverless AWS: AgentCore Gateway for tool orchestration, Bedrock Knowledge Bases for RAG, S3 Vectors for image search, and Lambda tying everything together.&lt;/p&gt;

&lt;p&gt;Processing cost is front-loaded (heavy on first upload), but once videos are indexed, the system runs for roughly $3 per month per 100 videos. Query latency stays under 2 seconds.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Three-Index Architecture
&lt;/h2&gt;

&lt;p&gt;Most video systems treat search as a single problem: match keywords in titles or auto-generated captions. That works if users know exactly what they’re looking for and can describe it with the exact words spoken in the video.&lt;/p&gt;

&lt;p&gt;It breaks down when someone asks “show me outdoor mountain scenes” or wants to find visually similar shots.&lt;/p&gt;

&lt;p&gt;The solution is to treat video as three separate, parallel search problems.&lt;/p&gt;

&lt;p&gt;First, transcribe the audio track completely and index every spoken word with word-level timestamps.&lt;/p&gt;

&lt;p&gt;This handles “what was said” queries.&lt;/p&gt;

&lt;p&gt;Second, extract frames throughout the video, generate semantic descriptions using Claude Vision, and index those descriptions.&lt;/p&gt;

&lt;p&gt;This handles “what was shown” queries.&lt;/p&gt;

&lt;p&gt;Third, create vector embeddings of those same frames using Titan Multimodal and store them in S3 Vectors for visual similarity search.&lt;/p&gt;

&lt;p&gt;Each index serves a different user intent.&lt;/p&gt;

&lt;p&gt;The speech index answers “find where they discuss machine learning.”&lt;/p&gt;

&lt;p&gt;The caption index answers “show me celebration scenes.”&lt;/p&gt;

&lt;p&gt;The image index answers “find shots that look like this” when users upload a reference image.&lt;/p&gt;

&lt;p&gt;Users don’t need to know which index exists. An intelligent agent analyzes their query, determines which tool to invoke, executes the search, and returns results with exact timestamps.&lt;/p&gt;

&lt;h2&gt;
  
  
  System Architecture
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuo8af8ow4v13mg57do0g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuo8af8ow4v13mg57do0g.png" alt=" " width="800" height="436"&gt;&lt;/a&gt;&lt;br&gt;
The frontend is a single-page app hosted on S3 and delivered via CloudFront. Users upload videos through a presigned URL directly to S3, which triggers the processing pipeline. Searches go through API Gateway to the agent Lambda, which either invokes tools directly (Manual Mode) or asks Claude Sonnet to analyze intent and select the right tool (Auto Mode). Tools are exposed via AgentCore Gateway using the Model Context Protocol.&lt;/p&gt;

&lt;h2&gt;
  
  
  Video Processing Pipeline
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F21s15cdj5xlb20nt1ifz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F21s15cdj5xlb20nt1ifz.png" alt=" " width="800" height="399"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When a user uploads a video, the orchestrator Lambda kicks off two parallel tracks: frame extraction and transcription.&lt;/p&gt;

&lt;p&gt;The frame track extracts frames using FFmpeg, sends them to Claude Vision for semantic descriptions, and creates vector embeddings for similarity search.&lt;/p&gt;

&lt;p&gt;The transcription track uses AWS Transcribe to generate word-level timestamps, then chunks and indexes the transcript for semantic search.&lt;/p&gt;

&lt;p&gt;Both complete in roughly 5-6 minutes for a 2-minute video.&lt;/p&gt;

&lt;p&gt;Frame extraction doesn’t use a fixed frame rate like 6fps or 1fps. Instead, it extracts a fixed number of frames evenly distributed across the video duration. A 30-second clip gets 45-120 frames. A 10-minute video also gets 45-120 frames. This matters because caption generation costs scale with frame count, not video length.&lt;/p&gt;

&lt;p&gt;Timestamps are calculated using (frame_number - 1) × duration / (total_frames - 1) to ensure frames are spread evenly from start to finish, with the first frame at 0 seconds and the last frame at the video’s end.&lt;/p&gt;
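&lt;p&gt;That formula is small enough to sketch directly (the function name is illustrative):&lt;/p&gt;

```python
def frame_timestamps(total_frames, duration_sec):
    """Evenly spread `total_frames` timestamps across the video:
    first frame at 0 seconds, last frame at the video's end,
    using (frame_number - 1) * duration / (total_frames - 1)."""
    if total_frames == 1:
        return [0.0]
    return [
        (n - 1) * duration_sec / (total_frames - 1)
        for n in range(1, total_frames + 1)
    ]
```

&lt;p&gt;For example, 5 frames over a 120-second clip land at 0, 30, 60, 90, and 120 seconds.&lt;/p&gt;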

&lt;p&gt;FFmpeg runs inside a Lambda function with 2GB of memory and a 10-minute timeout. For videos longer than 10 minutes, the system would need Fargate or Step Functions to handle the extended processing time. But the processing logic stays the same, just a different execution environment.&lt;/p&gt;

&lt;p&gt;Transcription happens in parallel via AWS Transcribe. The service processes the audio track asynchronously and typically finishes in about 1/4 of the video duration. A 10-minute video transcribes in roughly 2.5 minutes. A polling Lambda checks the job status with a 5-second delay between attempts (up to 60 attempts max, allowing roughly 5 minutes of polling).&lt;/p&gt;
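&lt;p&gt;The polling loop is straightforward with boto3's get_transcription_job. A sketch with the same 5-second delay and 60-attempt cap (the injectable client parameter is just for testability, not part of the described system):&lt;/p&gt;

```python
import time

def wait_for_transcript(job_name, client=None, delay_sec=5, max_attempts=60):
    """Poll AWS Transcribe until the job completes or fails.
    Defaults mirror the loop described above: 5 seconds between
    attempts, up to 60 attempts (roughly 5 minutes of polling)."""
    if client is None:
        import boto3
        client = boto3.client("transcribe")
    for _ in range(max_attempts):
        job = client.get_transcription_job(
            TranscriptionJobName=job_name)["TranscriptionJob"]
        if job["TranscriptionJobStatus"] == "COMPLETED":
            return job["Transcript"]["TranscriptFileUri"]
        if job["TranscriptionJobStatus"] == "FAILED":
            raise RuntimeError(job.get("FailureReason", "transcription failed"))
        time.sleep(delay_sec)
    raise TimeoutError(f"{job_name} still running after {max_attempts} attempts")
```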

&lt;p&gt;Transcribe returns word-level timestamps in JSON format: each word gets a start time, end time, and confidence score. Punctuation appears as separate items without timing. This granularity is critical because when Bedrock Knowledge Base returns a text snippet later, we need to map that snippet back to exact timestamps in the original video.&lt;/p&gt;

&lt;p&gt;The chunk_transcript Lambda processes the Transcribe output into 10-second audio chunks, each preserving the original word-level timestamps. Each chunk becomes a separate JSON file (chunk_0001.json, chunk_0002.json, etc.) containing the chunk text, precise start_time_sec and end_time_sec boundaries, and metadata.&lt;/p&gt;

&lt;p&gt;This pre-chunking ensures that search results can be mapped back to exact video positions while maintaining semantic coherence within each searchable segment.&lt;/p&gt;
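&lt;p&gt;The chunking itself reduces to grouping words by elapsed time. A sketch, assuming the Transcribe JSON has already been flattened into simple word dicts (the field names here are illustrative, not the raw Transcribe item format):&lt;/p&gt;

```python
def chunk_words(words, chunk_seconds=10.0):
    """Group word items into ~10-second chunks, preserving timestamps.

    `words` is assumed pre-flattened from the Transcribe JSON into
    dicts like {"content": "hello", "start": 0.12, "end": 0.45}.
    """
    chunks, current = [], []
    for w in words:
        if current and w["start"] - current[0]["start"] >= chunk_seconds:
            chunks.append(_finish(current))
            current = []
        current.append(w)
    if current:
        chunks.append(_finish(current))
    return chunks

def _finish(words):
    """Collapse a group of words into one chunk document."""
    return {
        "text": " ".join(w["content"] for w in words),
        "start_time_sec": words[0]["start"],
        "end_time_sec": words[-1]["end"],
        "words": words,  # keep word-level timing for later snippet matching
    }
```

&lt;p&gt;Each returned dict maps directly onto one chunk_NNNN.json file.&lt;/p&gt;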

&lt;p&gt;Documents are stored at {video_id}/speech_index/ and {video_id}/caption_index/ within the processed bucket. Caption data follows a similar pattern, with one JSON file per frame containing the Claude Vision-generated description, frame number, and timestamp.&lt;/p&gt;

&lt;p&gt;Bedrock Knowledge Base has a limitation: it doesn’t support wildcards in S3 inclusion prefixes. You cannot configure it to scan */speech_index/ across multiple video folders. The deployed Bedrock Knowledge Bases are configured to work with the current bucket structure. The chunk_transcript and embed_captions Lambdas trigger KB ingestion jobs after uploading new documents, ensuring search indexes stay synchronized with processed content. Bedrock KB generates embeddings for each document, enabling semantic search while preserving the timestamp metadata attached to each chunk.&lt;/p&gt;

&lt;p&gt;The current implementation prioritizes organizing all video-related data under a single video_id prefix for easier management and deletion. An alternative architecture would place the index type at the top level (speech_index/{video_id}/...) allowing a single KB inclusion prefix to scan all videos, but would sacrifice per-video organizational simplicity.&lt;/p&gt;

&lt;p&gt;Caption generation is where processing costs concentrate. Each frame goes to Claude 3.5 Sonnet via Bedrock with a prompt that asks for 2-3 sentence descriptions focusing on subjects, actions, setting, and atmosphere. Claude returns natural language like “A chef in a white uniform demonstrates knife skills in a modern kitchen, dicing vegetables while explaining technique to a camera.” Each caption saves as a JSON file with the description, frame metadata, and timestamp.&lt;/p&gt;
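&lt;p&gt;The per-frame call is a standard Bedrock invoke_model request with an image block. A sketch; the prompt is a paraphrase of the one described above, and the request builder is split out purely so it can be inspected without AWS credentials:&lt;/p&gt;

```python
import base64
import json

CAPTION_PROMPT = (
    "Describe this frame in 2-3 sentences. Focus on subjects, actions, "
    "setting, and atmosphere."
)  # paraphrase of the prompt described above, not the production prompt

def build_caption_request(jpeg_bytes, max_tokens=300):
    """Build the Bedrock invoke_model body for a Claude vision call."""
    return {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image",
                 "source": {"type": "base64", "media_type": "image/jpeg",
                            "data": base64.b64encode(jpeg_bytes).decode()}},
                {"type": "text", "text": CAPTION_PROMPT},
            ],
        }],
    }

def caption_frame(jpeg_bytes, model_id="anthropic.claude-3-5-sonnet-20240620-v1:0"):
    """Send one frame to Claude via Bedrock and return the caption text.
    The model id is illustrative; check your region's model catalog."""
    import boto3
    bedrock = boto3.client("bedrock-runtime")
    resp = bedrock.invoke_model(
        modelId=model_id,
        body=json.dumps(build_caption_request(jpeg_bytes)),
    )
    return json.loads(resp["body"].read())["content"][0]["text"]
```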

&lt;p&gt;At roughly $0.005-0.008 per frame, a video with 100 frames costs $0.50-0.80 to caption. That’s 5-8x more expensive than Amazon Rekognition, which would return structured labels like “Person” (93% confidence), “Kitchen” (89%), “Knife” (85%). The cost premium buys search quality. When users ask “show me cooking demonstrations,” Claude’s semantic descriptions match the intent. Rekognition’s labels don’t connect to natural language queries the same way. For a system built around conversational search, Claude’s cost is justified.&lt;/p&gt;

&lt;p&gt;The same frames that get captions also become vector embeddings. Titan Multimodal Embeddings generates 1,024-dimensional vectors at $0.00006 per frame, essentially free compared to caption costs. These vectors go into S3 Vectors, a serverless vector store that handles indexing and similarity search without infrastructure management. Each vector record includes the embedding wrapped in a float32 format plus metadata for video ID, frame number, and timestamp. This enables “find similar shots” queries where users upload a reference image and get back visually similar frames.&lt;/p&gt;
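&lt;p&gt;A sketch of the embedding call and the vector record shape. The Titan call uses the standard invoke_model API; the exact field names in the S3 Vectors record (key, float32 wrapper, metadata) follow the description above but should be verified against the current S3 Vectors docs:&lt;/p&gt;

```python
import base64
import json

def embed_frame(jpeg_bytes):
    """Get a 1,024-dimensional Titan Multimodal embedding for one frame."""
    import boto3
    bedrock = boto3.client("bedrock-runtime")
    resp = bedrock.invoke_model(
        modelId="amazon.titan-embed-image-v1",
        body=json.dumps({"inputImage": base64.b64encode(jpeg_bytes).decode()}),
    )
    return json.loads(resp["body"].read())["embedding"]

def make_vector_record(video_id, frame_number, timestamp_sec, embedding):
    """Shape one record for an S3 Vectors put_vectors call: the embedding
    wrapped in float32 plus the metadata the search tools need to build
    timestamped results. Field names are an assumption to verify."""
    return {
        "key": f"{video_id}#frame_{frame_number:04d}",
        "data": {"float32": embedding},
        "metadata": {
            "video_id": video_id,
            "frame_number": frame_number,
            "timestamp_sec": timestamp_sec,
        },
    }
```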

&lt;h2&gt;
  
  
  Search and Retrieval
&lt;/h2&gt;

&lt;p&gt;The three indexes sit behind Bedrock Knowledge Bases (for speech and captions) and S3 Vectors (for images). AgentCore Gateway exposes six tools via the Model Context Protocol: search_by_speech, search_by_caption, search_by_image, list_videos, get_video_metadata, and get_full_transcript. The agent Lambda invokes these tools either directly when users pick Manual Mode, or through Claude’s analysis in Auto Mode.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fden6kfnevbbogazsnraf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fden6kfnevbbogazsnraf.png" alt=" " width="800" height="420"&gt;&lt;/a&gt;&lt;br&gt;
The Knowledge Base has a timestamp problem. When it returns a text snippet from a transcript, it doesn’t include the original timestamps from the Transcribe JSON.&lt;/p&gt;

&lt;p&gt;The snippet is just text. But users need “go to 2:34 in the video,” not “this text appears somewhere in there.”&lt;/p&gt;

&lt;p&gt;The solution is having Claude match the snippet back to the word-level timeline. The agent downloads the Transcribe JSON, extracts all words with their start and end times, and asks Claude to find which words semantically match the returned snippet. Claude returns {"start_time": 154.2, "end_time": 157.8}. This adds about 500ms to query latency, but the precision is worth it.&lt;/p&gt;

&lt;p&gt;The Knowledge Base might paraphrase “we’re exploring artificial intelligence” while the original transcript says “we are exploring AI,” and Claude maps them correctly anyway.&lt;/p&gt;
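&lt;p&gt;The matching step reduces to building one prompt and parsing one JSON reply. A sketch under the assumption that the word timeline has been flattened into (word, start, end) tuples; the prompt wording and helper names are illustrative:&lt;/p&gt;

```python
import json

def build_match_prompt(snippet, words):
    """Build the prompt asking Claude to locate a KB snippet on the
    word-level timeline. `words` are (word, start_sec, end_sec) tuples
    extracted from the Transcribe JSON."""
    timeline = "\n".join(f"{s:.2f}-{e:.2f}: {w}" for w, s, e in words)
    return (
        "Find where this snippet occurs in the timeline below. The snippet "
        "may be paraphrased. Reply with JSON only, like "
        '{"start_time": 154.2, "end_time": 157.8}.\n\n'
        f"Snippet: {snippet}\n\nTimeline:\n{timeline}"
    )

def parse_match(claude_text):
    """Parse Claude's JSON reply into (start, end) seconds."""
    match = json.loads(claude_text)
    return match["start_time"], match["end_time"]
```

&lt;p&gt;Because the match is semantic rather than string-based, paraphrased snippets still land on the right span of words.&lt;/p&gt;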

&lt;h2&gt;
  
  
  Intelligent Routing
&lt;/h2&gt;

&lt;p&gt;The agent Lambda receives user queries and decides which tool to invoke. In Manual Mode, users explicitly pick speech, caption, or image search, and the agent calls that tool directly. In Auto Mode, users just type natural language, and Claude Sonnet 4 figures out the intent.&lt;/p&gt;

&lt;p&gt;A query like “find where they discuss machine learning” goes to speech search. “Show me outdoor mountain scenes” goes to caption search. “Find similar shots” triggers image search.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff3vl1thxl7anpiycuznd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff3vl1thxl7anpiycuznd.png" alt=" " width="800" height="423"&gt;&lt;/a&gt;&lt;br&gt;
Claude gets a system prompt explaining what each tool does: search_by_speech queries transcripts, search_by_caption queries frame descriptions, search_by_image handles visual similarity. Claude analyzes the user’s question and returns structured JSON with the tool name, parameters, and reasoning. The agent then invokes that tool via AgentCore Gateway using SigV4-signed requests. Results come back with video IDs, timestamps, matched text, and confidence scores, all formatted for the frontend to display.&lt;/p&gt;

&lt;p&gt;This design skips Bedrock Agents entirely. Bedrock Agents handle orchestration automatically, but that comes with limited control over error handling, no support for custom timestamp extraction logic, and extra cost for features this system doesn’t need. Building the agent from scratch using Claude’s tool-use API gives full control over the routing logic, parallel tool execution, and response formatting.&lt;/p&gt;
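&lt;p&gt;A minimal version of that routing layer, sketched with the Bedrock Converse API and Claude's tool-use. The toolSpec list is abbreviated to two of the six tools, the system prompt is a stand-in, and the model id should be checked against your region's catalog:&lt;/p&gt;

```python
# Abbreviated toolSpec list; the real Gateway exposes six tools.
TOOLS = [
    {"toolSpec": {
        "name": "search_by_speech",
        "description": "Semantic search over video transcripts (what was said).",
        "inputSchema": {"json": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        }},
    }},
    {"toolSpec": {
        "name": "search_by_caption",
        "description": "Semantic search over frame descriptions (what was shown).",
        "inputSchema": {"json": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        }},
    }},
]

def route_query(query, model_id="anthropic.claude-sonnet-4-20250514-v1:0"):
    """Ask Claude (via the Converse API) which tool to invoke for a query."""
    import boto3
    bedrock = boto3.client("bedrock-runtime")
    resp = bedrock.converse(
        modelId=model_id,
        system=[{"text": "Route the user's video-search query to the right tool."}],
        messages=[{"role": "user", "content": [{"text": query}]}],
        toolConfig={"tools": TOOLS},
    )
    return extract_tool_call(resp)

def extract_tool_call(resp):
    """Pull the first toolUse block out of a Converse response."""
    for block in resp["output"]["message"]["content"]:
        if "toolUse" in block:
            return block["toolUse"]["name"], block["toolUse"]["input"]
    return None, None
```

&lt;p&gt;The agent then forwards the returned tool name and input to AgentCore Gateway with a SigV4-signed request.&lt;/p&gt;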

&lt;p&gt;AgentCore Gateway sits between the agent and the tools, hosting an MCP (Model Context Protocol) server that exposes the six search and utility tools. Each tool backs to a Lambda function, and the Gateway handles SigV4 authentication, tool discovery, and request routing. When the agent invokes search_by_speech, the Gateway routes that to the speech search Lambda, waits for results, and returns them. Adding new tools means registering them in the Gateway configuration. No agent code changes required.&lt;/p&gt;

&lt;h2&gt;
  
  
  Design Trade-Offs
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foijws68copqivhjux0ah.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foijws68copqivhjux0ah.png" alt=" " width="800" height="420"&gt;&lt;/a&gt;&lt;br&gt;
The three-index architecture trades infrastructure complexity for search quality.&lt;/p&gt;

&lt;p&gt;A single Knowledge Base containing transcripts, captions, and image data would be simpler to manage. But speech needs dense text with context windows. Captions need short, precise matching. Images need vector similarity, not text search. Separate indexes let each modality optimize independently, and the search quality difference is measurable. Users asking “show me outdoor scenes” get relevant results from the caption index that a combined index would miss.&lt;/p&gt;

&lt;p&gt;Claude Vision costs 5-8x more than Rekognition per frame. For 100 frames, that’s $0.50-0.80 versus $0.10. The cost premium comes from Claude generating full semantic descriptions while Rekognition returns structured labels with confidence scores. When users search with natural language like “cooking demonstrations,” Claude’s narrative captions match their intent. Rekognition’s labels (“Person”, “Kitchen”, “Utensil”) don’t connect to conversational queries the same way.&lt;/p&gt;

&lt;p&gt;The system prioritizes search experience over processing cost because users abandon systems that don’t find what they’re looking for.&lt;/p&gt;

&lt;p&gt;S3 Vectors handles vector storage without managing clusters or configuring indexes. Query latency runs 200-300ms, which is acceptable for this use case.&lt;/p&gt;

&lt;p&gt;OpenSearch Serverless would deliver sub-100ms queries and support hybrid keyword+vector search, but it adds complexity and cost that the system doesn’t need yet. The switch point is around 10k videos or when query latency becomes the primary user complaint. Below that threshold, S3 Vectors is simpler and cheaper.&lt;/p&gt;

&lt;p&gt;Lambda handles all processing because video workflows are bursty. A system might process 10 videos in an hour, then sit idle for three hours. Fargate would cost roughly $30 per month per service even when doing nothing. Lambda costs $0 when idle.&lt;/p&gt;

&lt;p&gt;The breaking point is continuous processing at 100+ videos per hour, where Fargate’s flat rate becomes cheaper than Lambda’s per-execution pricing. Most video systems never hit that threshold.&lt;/p&gt;

&lt;p&gt;Frame extraction uses a fixed frame count (45-120 frames) evenly distributed across video duration rather than a fixed frame rate. This decision controls caption costs: 100 frames to caption regardless of whether the video runs 30 seconds or 10 minutes. A 6fps approach would generate 1800 frames for a 5-minute video and 600 frames for a 100-second video, wildly different costs. Fixed frame count makes processing costs predictable and avoids redundant captions when adjacent frames look nearly identical.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cost Analysis
&lt;/h2&gt;

&lt;p&gt;Processing 100 two-minute videos costs roughly $58 up front, then $3 per month to keep running. With a fixed frame count of 80 frames per video (middle of the 45-120 range), the math is straightforward: 100 videos × 80 frames = 8,000 frames total. Claude Vision at $0.006 per frame comes to $48. AWS Transcribe adds $4.80 for speech transcription (200 minutes at $0.024 per minute). Titan image embeddings cost $0.48 for those same 8,000 frames. Lambda invocations are negligible at $0.10.&lt;/p&gt;

&lt;p&gt;Storage runs about $0.40 per month. Frames take up roughly 5-8GB (8,000 frames at roughly 0.6-1MB per JPEG) at $0.12-0.18. S3 Vectors holds 120-160MB of embeddings (8,000 vectors × 15-20KB each including metadata) for $0.003-0.004. Transcripts take about 20MB at $0.0005. Bedrock Knowledge Base vectors are stored in S3, already counted in the frame storage cost. The dominant cost is always frame storage.&lt;/p&gt;

&lt;p&gt;Queries cost $0.27 per thousand. Bedrock Knowledge Base retrieval is $0.10, Claude Sonnet 4 for routing is $0.15, and S3 Vectors queries are $0.02. API Gateway and Lambda execution costs are minimal enough to ignore at this scale. A system running 10,000 queries per month pays $2.70 in query costs.&lt;/p&gt;

&lt;p&gt;The cost structure is front-loaded. Month 1 with 100 new videos: $61 (processing + storage + queries). Month 2 with no new uploads: $3.10 (storage + queries). Month 3: $3.10. The system essentially costs $3 per month to operate once videos are processed, with spikes when new content arrives.&lt;/p&gt;

&lt;p&gt;Frame count directly controls caption costs. Using 120 frames per video instead of 80 increases caption costs from $48 to $72 per 100 videos. Using 45 frames drops it to $27. Bedrock Batch Inference offers a 50% discount on Claude pricing but delays results by 24 hours, acceptable for async workflows. Combining lower frame counts (45-60) with batch inference brings processing costs down to $15-20 per 100 videos.&lt;/p&gt;
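
&lt;p&gt;The same levers in code form (illustrative helper; the 50% batch discount is per the paragraph above):&lt;/p&gt;

```python
# Caption cost as a function of frame budget, with the optional 50%
# Bedrock Batch Inference discount (illustrative helper).
def caption_cost(videos=100, frames_per_video=80, batch=False):
    rate = 0.006 * (0.5 if batch else 1.0)  # per-frame caption rate
    return round(videos * frames_per_video * rate, 2)

assert caption_cost(frames_per_video=120) == 72.0   # the $72 figure
assert caption_cost(frames_per_video=45) == 27.0    # the $27 figure
assert caption_cost(frames_per_video=45, batch=True) == 13.5
```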

&lt;h2&gt;
  
  
  Performance and Scaling
&lt;/h2&gt;

&lt;p&gt;A 2-minute video takes 5-6 minutes to become fully searchable. Frame extraction completes in 10-15 seconds. Transcription runs asynchronously and finishes in about 30 seconds. Caption generation is the bottleneck at 3-5 minutes, processing 100 frames at 2-3 seconds each. Image embedding adds another 20-30 seconds. Batch inference trades processing speed for cost savings: results take 24 hours instead of 5 minutes, but cut costs in half.&lt;/p&gt;

&lt;p&gt;Query latency stays under 2 seconds for speech and caption search. Speech queries run 800-1200ms: Bedrock Knowledge Base retrieves matching snippets in 400-600ms, then Claude extracts precise timestamps from the Transcribe JSON in another 400-500ms.&lt;/p&gt;

&lt;p&gt;Caption queries run faster at 600-900ms since frame timestamps come directly from metadata. Image similarity search is fastest at 300-500ms, just a vector query against S3 Vectors. The agent routing overhead (Claude analyzing intent and selecting tools) adds 400-600ms in Auto Mode.&lt;/p&gt;

&lt;h2&gt;
  
  
  Production Considerations
&lt;/h2&gt;

&lt;p&gt;When Lambda crashes during caption generation, AWS automatically retries async invocations twice by default. The generate_captions Lambda catches individual frame failures and continues processing remaining frames rather than halting the entire batch.&lt;/p&gt;

&lt;p&gt;The process_video and extract_frames Lambdas update DynamoDB status to ‘error’ on failure, but caption generation failures are logged to CloudWatch without explicit DynamoDB status tracking.&lt;/p&gt;

&lt;p&gt;Partial results persist in S3 - frames remain available even if caption generation crashes afterward. There’s no automatic recovery mechanism, so resuming a failed step requires manually re-invoking the specific Lambda function with the video_id parameter, which reprocesses that entire step rather than resuming from the failure point.&lt;/p&gt;

&lt;p&gt;Search query failures depend on a cascade of timeouts. The agent_api Lambda has a 60-second timeout, though internal Bedrock requests use a 30-second timeout. API Gateway enforces a 29-second maximum integration timeout, which would typically trigger first and return a timeout error to the user.&lt;/p&gt;

&lt;p&gt;Query performance depends heavily on result count and metadata filtering - requesting 50 results from a large Knowledge Base performs worse than requesting 5 with specific video_id filters.&lt;/p&gt;

&lt;p&gt;Knowledge Base synchronization happens programmatically, not on a schedule. After uploading transcripts or captions to S3, the chunk_transcript and embed_captions Lambdas explicitly trigger ingestion jobs via bedrock_agent.start_ingestion_job(). This ensures new content becomes searchable without waiting for automatic syncs.&lt;/p&gt;
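
&lt;p&gt;A minimal sketch of that trigger (start_ingestion_job is the real bedrock-agent API operation; the IDs and helper names are placeholders, and error handling is omitted):&lt;/p&gt;

```python
# Explicitly kick off a Knowledge Base ingestion job after new
# transcripts or captions land in S3. IDs here are placeholders.
def ingestion_request(kb_id: str, data_source_id: str) -> dict:
    return {"knowledgeBaseId": kb_id, "dataSourceId": data_source_id}

def sync_knowledge_base(client, kb_id: str, data_source_id: str) -> str:
    # client = boto3.client("bedrock-agent"); the job runs asynchronously,
    # so poll get_ingestion_job if you need to block until searchable.
    resp = client.start_ingestion_job(**ingestion_request(kb_id, data_source_id))
    return resp["ingestionJob"]["ingestionJobId"]
```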

&lt;p&gt;The code logs indicate ingestion typically completes in around 2 minutes, though actual time varies with document count and KB size.&lt;/p&gt;

&lt;p&gt;The architecture scales from 100 to 1,000 videos without structural changes. Storage costs scale linearly with video count - 10x the videos means 10x the S3 storage costs. Query latency depends more on index size and query complexity than sheer video count, since Bedrock KB and S3 Vectors both use vector indexes that grow with content volume. Lambda concurrency rarely becomes an issue because video processing happens asynchronously over time rather than simultaneously.&lt;/p&gt;

&lt;p&gt;At 10,000+ videos, you’d monitor specific bottlenecks as they emerge.&lt;/p&gt;

&lt;p&gt;Bedrock Knowledge Base query latency could increase as vector indexes grow larger. S3 Vectors performance might degrade with hundreds of thousands or millions of frame vectors.&lt;/p&gt;

&lt;p&gt;The list_videos DynamoDB scan would slow down, requiring pagination and potentially a Global Secondary Index on upload_timestamp for efficient retrieval.&lt;/p&gt;
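
&lt;p&gt;A hedged sketch of that change (the index and attribute names here are hypothetical, not taken from the project):&lt;/p&gt;

```python
# Paginated Query against a hypothetical GSI on upload_timestamp,
# replacing a full-table Scan (names are illustrative).
def list_recent_videos(table, limit=50, start_key=None):
    kwargs = {
        "IndexName": "upload_timestamp-index",
        "KeyConditionExpression": "entity_type = :v",  # fixed GSI partition key
        "ExpressionAttributeValues": {":v": "video"},
        "ScanIndexForward": False,  # newest first
        "Limit": limit,
    }
    if start_key:
        kwargs["ExclusiveStartKey"] = start_key
    resp = table.query(**kwargs)
    # LastEvaluatedKey is the cursor for the next page (absent when done).
    return resp["Items"], resp.get("LastEvaluatedKey")
```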

&lt;p&gt;These are optimization problems, not architectural redesigns - the core processing logic stays the same while execution environments might shift from Lambda to Fargate for longer videos, or from S3 Vectors to OpenSearch Serverless for consistently sub-100ms vector queries at scale.&lt;/p&gt;

&lt;h2&gt;
  
  
  Deployment and Production Readiness
&lt;/h2&gt;

&lt;p&gt;The infrastructure deploys through AWS CDK with a single command: cdk deploy --all. This creates two stacks - InfrastructureStack with 19 Lambda functions, 2 Bedrock Knowledge Bases, 4 S3 buckets, a DynamoDB table, and API Gateway, plus FrontendStack with CloudFront distribution and frontend bucket. The Bedrock Knowledge Bases and AgentCore Gateway are pre-configured in AWS rather than created by the CDK deployment. The entire stack is version-controlled and reproducible across environments.&lt;/p&gt;

&lt;p&gt;All Lambda functions log to CloudWatch with 731 days (2 years) of retention. The deployment includes no CloudWatch alarms, SNS topics, or automated monitoring by default - production deployments would need to add metric filters for processing duration, query latency, and failure rates. The CloudWatch logs capture every Lambda invocation but require manual querying or external tooling for insights beyond basic log inspection.&lt;/p&gt;

&lt;p&gt;Native AWS services handle complex multimodal AI workloads without custom frameworks or infrastructure. AgentCore Gateway provides MCP standardization for tool orchestration. Bedrock Knowledge Bases manage retrieval-augmented generation across speech and caption indexes. S3 Vectors store image embeddings. Lambda processes videos and routes queries. The system runs at a low monthly cost after initial video processing, with predictable scaling characteristics up to 10,000 videos.&lt;/p&gt;

&lt;p&gt;The three-index architecture is a practical solution to a real problem. Users can’t find specific moments in video content using traditional keyword search. This system lets them ask natural language questions and get back exact timestamps, whether they’re searching for spoken content, visual scenes, or similar-looking shots.&lt;/p&gt;

&lt;p&gt;The design prioritizes search quality over processing cost because users abandon systems that don’t find what they’re looking for.&lt;/p&gt;

&lt;p&gt;The architecture scales from prototype to production without rewrites.&lt;/p&gt;

&lt;p&gt;Start with 100 videos on Lambda and S3 Vectors. Grow to 1,000 videos without changes. Push to 10,000 videos with monitoring and metadata filters.&lt;/p&gt;

&lt;p&gt;Beyond that, swap Lambda for Fargate, S3 Vectors for OpenSearch, and add ElastiCache. The core logic stays the same.&lt;/p&gt;

&lt;p&gt;What’s next?&lt;/p&gt;

&lt;p&gt;Challenge the Blueprint: Share your advanced use case or propose an upgrade in the comments.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;You can find a detailed account of how each part is built, the criteria for the options chosen, and other details in the project’s &lt;a href="https://github.com/marceloacosta/smart_video_search_system" rel="noopener noreferrer"&gt;repo&lt;/a&gt;. Feel free to contribute or open any issues you find.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;em&gt;I publish every week at &lt;a href="https://buildwithaws.substack.com" rel="noopener noreferrer"&gt;buildwithaws.substack.com&lt;/a&gt;. Subscribe. It's free.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>ai</category>
      <category>serverless</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Agent Memory Strategies: Building Believable AI with Bedrock AgentCore</title>
      <dc:creator>Marcelo Acosta Cavalero</dc:creator>
      <pubDate>Wed, 25 Mar 2026 10:42:30 +0000</pubDate>
      <link>https://dev.to/aws-builders/agent-memory-strategies-building-believable-ai-with-bedrock-agentcore-kn6</link>
      <guid>https://dev.to/aws-builders/agent-memory-strategies-building-believable-ai-with-bedrock-agentcore-kn6</guid>
      <description>&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://buildwithaws.substack.com/" rel="noopener noreferrer"&gt;Build With AWS&lt;/a&gt;. Subscribe for weekly AWS builds.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Your agent answers a question about project deadlines by retrieving every meeting from the past six months.&lt;/p&gt;

&lt;p&gt;The response is technically accurate but completely useless, burying the critical deadline mentioned yesterday beneath dozens of irrelevant status updates from March.&lt;/p&gt;

&lt;p&gt;This failure shows up in most agents unless retrieval is designed deliberately.&lt;/p&gt;

&lt;p&gt;The agent remembered everything but understood nothing about what actually mattered in that moment.&lt;/p&gt;

&lt;p&gt;The Stanford research team that created &lt;a href="https://dl.acm.org/doi/10.1145/3586183.3606763" rel="noopener noreferrer"&gt;“Generative Agents”&lt;/a&gt; encountered this exact problem while building 25 simulated characters for a virtual town environment.&lt;/p&gt;

&lt;p&gt;Their agents could store thousands of observations, but when asked what to do next, they retrieved memories randomly based on simple keyword matching.&lt;/p&gt;

&lt;p&gt;This produced bizarre behavior loops where agents repeated the same action multiple times in a row because their memory system couldn’t distinguish “I just did this five minutes ago” from “I generally do this around lunchtime.”&lt;/p&gt;

&lt;p&gt;Smarter memory retrieval based on three scoring dimensions solved this problem: recency (when did this happen), importance (how much did this matter), and relevance (does this relate to my current situation).&lt;/p&gt;

&lt;p&gt;Amazon Bedrock AgentCore now provides the infrastructure to implement these memory strategies at enterprise scale.&lt;/p&gt;

&lt;p&gt;But understanding why these mechanisms matter and how to configure them effectively requires examining the research that proved their necessity.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Memory Retrieval Problem: Why Raw Storage Fails
&lt;/h2&gt;

&lt;p&gt;Language models can process vast context windows, but that capability creates a dangerous illusion.&lt;/p&gt;

&lt;p&gt;Organizations assume that giving agents access to complete conversation history and knowledge bases will produce intelligent behavior. In practice, it doesn’t work that way.&lt;/p&gt;

&lt;p&gt;Consider an agent helping with customer support.&lt;/p&gt;

&lt;p&gt;The customer mentions a billing issue from three months ago, asks about a current feature request, and wants to schedule a call.&lt;/p&gt;

&lt;p&gt;The agent’s memory contains thousands of interactions with this customer across multiple categories: billing problems, feature requests, scheduling conflicts, casual chitchat about industry events.&lt;/p&gt;

&lt;p&gt;Without retrieval scoring, the agent treats all memories as equally relevant.&lt;/p&gt;

&lt;p&gt;The context window fills with whatever was stored most recently or whatever matches basic keyword searches.&lt;/p&gt;

&lt;p&gt;The agent might retrieve detailed notes about the customer’s preferences for coffee (mentioned casually last week) while missing the critical billing escalation pattern that requires immediate attention.&lt;/p&gt;

&lt;p&gt;The Stanford Generative Agents research demonstrated this failure mode systematically.&lt;/p&gt;

&lt;p&gt;When Klaus Mueller, one of their simulated characters, was asked to recommend someone to spend time with, the version without proper memory retrieval chose Wolfgang simply because Wolfgang’s name appeared frequently in recent observations. The character had never had a meaningful conversation with Wolfgang.&lt;/p&gt;

&lt;p&gt;They just lived in the same dorm and passed each other constantly.&lt;/p&gt;

&lt;p&gt;With memory retrieval scoring, Klaus chose Maria Lopez, someone he’d actually collaborated with on research projects.&lt;/p&gt;

&lt;p&gt;The memories of those substantive interactions scored higher across multiple dimensions despite being less frequent than the Wolfgang encounters.&lt;/p&gt;

&lt;p&gt;This distinction matters enormously for enterprise agents. The difference between retrieving memories based on recency alone versus scoring across multiple dimensions determines whether agents exhibit genuine understanding or just pattern match on whatever happened most recently.&lt;/p&gt;

&lt;h2&gt;
  
  
  Recency Scoring: Time-Aware Memory Access
&lt;/h2&gt;

&lt;p&gt;Recency scoring implements a simple but crucial insight: recent experiences should influence behavior more than distant ones, but the decay shouldn’t be linear.&lt;/p&gt;

&lt;p&gt;An interaction from 10 minutes ago remains highly relevant. An interaction from 10 months ago might still matter for specific contexts but shouldn’t dominate general decision-making.&lt;/p&gt;

&lt;p&gt;The Stanford team implemented recency through exponential decay functions.&lt;/p&gt;

&lt;p&gt;Each memory receives a recency score that decreases over time at a rate determined by the decay factor.&lt;/p&gt;

&lt;p&gt;In their implementation, they used a decay factor of 0.995 per time unit (their simulation used hourly intervals), creating a smooth gradient where very recent memories score highest but older memories remain accessible when other factors (importance, relevance) elevate them.&lt;/p&gt;
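
&lt;p&gt;In code, the Stanford-style decay is a one-liner (illustrative sketch; AgentCore doesn't expose this knob directly):&lt;/p&gt;

```python
# Exponential recency decay: score = decay ** time_units_since_access,
# with the paper's 0.995 factor as the default (illustrative sketch).
def recency_score(hours_since: float, decay: float = 0.995) -> float:
    return decay ** hours_since

assert recency_score(0) == 1.0
assert round(recency_score(24), 3) == 0.887   # a day old: still heavily weighted
```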

&lt;p&gt;This approach elegantly solves the “everything is equally important” problem without requiring manual categorization.&lt;/p&gt;

&lt;p&gt;When an agent plans an event, memories of yesterday’s specific preparations score significantly higher than memories of general operations from last week.&lt;/p&gt;

&lt;p&gt;Both memories exist, but recency scoring ensures the contextually appropriate one influences current planning.&lt;/p&gt;

&lt;p&gt;For enterprise agents, recency scoring prevents a common failure mode: over-reliance on initial training or setup information that’s no longer current.&lt;/p&gt;

&lt;p&gt;A customer service agent needs to prioritize the customer’s statement from 30 seconds ago over background information from the knowledge base, unless other factors indicate the background information carries unusual importance.&lt;/p&gt;

&lt;p&gt;Implementation requires three technical decisions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;First&lt;/strong&gt;, selecting the decay function shape.&lt;/p&gt;

&lt;p&gt;Exponential decay works well for most agent applications because it creates gentle transitions rather than harsh cutoffs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Second&lt;/strong&gt;, choosing the decay rate.&lt;/p&gt;

&lt;p&gt;Faster decay means stronger recency bias, slower decay preserves long-term context.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Third&lt;/strong&gt;, defining time units relevant to your agent’s operation.&lt;/p&gt;

&lt;p&gt;Hours work for customer service, days for project management, seconds for real-time monitoring.&lt;/p&gt;

&lt;p&gt;Amazon Bedrock AgentCore handles recency implicitly through its extraction and consolidation strategies rather than exposing explicit decay functions.&lt;br&gt;
New information is incorporated into long-term memory through consolidation, while older or superseded information becomes less likely to surface during retrieval.&lt;/p&gt;

&lt;p&gt;This behavior creates the appearance of recency, but AgentCore does not model time as a scoring factor. Recent information dominates only because it remains in the active session, not because it is weighted higher during retrieval.&lt;/p&gt;

&lt;h2&gt;
  
  
  Importance Scoring: Distinguishing Mundane from Critical
&lt;/h2&gt;

&lt;p&gt;Not all experiences carry equal significance.&lt;/p&gt;

&lt;p&gt;An agent that treats “scheduled regular status meeting” and “critical security incident reported” as equivalent memories will make catastrophic decisions.&lt;/p&gt;

&lt;p&gt;Importance scoring solves this by assigning weights that reflect the significance of each experience.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The Stanford research revealed an elegant solution to importance assessment: simply ask the language model.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Rather than building complex heuristic systems, they prompted the model with a straightforward question: “On a scale of 1 to 10, where 1 is purely mundane (e.g., brushing teeth, making bed) and 10 is extremely poignant (e.g., a break up, college acceptance), rate the likely poignancy of the following piece of memory.”&lt;/p&gt;
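
&lt;p&gt;A minimal sketch of that pattern - build the prompt, then parse whatever the model returns into a usable score. The prompt wording follows the quote above; the actual Bedrock model call is omitted and the helper names are hypothetical:&lt;/p&gt;

```python
# Stanford-style importance scoring: ask the model for a 1-10 rating,
# then parse its free-text reply defensively (model call itself omitted).
IMPORTANCE_PROMPT = (
    "On a scale of 1 to 10, where 1 is purely mundane (e.g., brushing "
    "teeth, making bed) and 10 is extremely poignant (e.g., a break up, "
    "college acceptance), rate the likely poignancy of the following "
    "piece of memory.\nMemory: {memory}\nRating:"
)

def parse_importance(reply: str, default: int = 1) -> int:
    # Take the first integer in the 1-10 range; fall back to "mundane".
    for token in reply.replace("/", " ").split():
        if token.isdigit() and int(token) in range(1, 11):
            return int(token)
    return default

assert parse_importance("8") == 8
assert parse_importance("I would rate this 7/10.") == 7
assert parse_importance("hard to say") == 1
```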

&lt;p&gt;This approach works remarkably well because language models have learned implicit importance hierarchies from their training data.&lt;/p&gt;

&lt;p&gt;“Cleaning up the room” consistently scores around 2.&lt;/p&gt;

&lt;p&gt;“Asking your crush out on a date” scores around 8.&lt;/p&gt;

&lt;p&gt;The model doesn’t need explicit rules about importance. It already understands the relative significance of human experiences.&lt;/p&gt;

&lt;p&gt;For enterprise agents, importance scoring prevents memory streams from becoming cluttered with routine operational noise.&lt;/p&gt;

&lt;p&gt;Consider an agent monitoring infrastructure health.&lt;/p&gt;

&lt;p&gt;The system generates thousands of observations per hour: service health checks passing, routine log rotations, scheduled backups completing.&lt;/p&gt;

&lt;p&gt;These observations need to exist for completeness, but they shouldn’t dominate memory retrieval when the agent needs to explain why it escalated a particular issue.&lt;/p&gt;

&lt;p&gt;An anomaly in error rates, however, should score significantly higher in importance.&lt;/p&gt;

&lt;p&gt;When the agent later retrieves memories to explain its decision to wake up the on-call engineer at 2 AM, it should prioritize the error rate anomaly over the 500 successful health checks that happened around the same time.&lt;/p&gt;

&lt;p&gt;Implementing importance scoring requires addressing a subtle challenge: importance is somewhat subjective and context-dependent.&lt;/p&gt;

&lt;p&gt;What’s important for customer service agents differs from what’s important for financial analysis agents.&lt;/p&gt;

&lt;p&gt;The Stanford team used a general-purpose importance prompt, but enterprise applications benefit from domain-specific calibration.&lt;/p&gt;

&lt;p&gt;Bedrock AgentCore’s built-in memory strategies implicitly capture importance through LLM-driven extraction and consolidation, rather than exposing an explicit importance scoring mechanism.&lt;/p&gt;

&lt;p&gt;When using the built-in strategies with customization, you can guide what the system considers important by adding domain-specific instructions via the appendToPrompt configuration field.&lt;/p&gt;

&lt;p&gt;For example, you might append “Focus on precedent-setting cases and landmark decisions” for a legal research agent, or “Prioritize executive contacts and decision-maker interactions” for a sales agent.&lt;/p&gt;

&lt;p&gt;The key architectural decision is when to calculate importance scores.&lt;/p&gt;

&lt;p&gt;The Stanford approach computed importance at memory creation time, which works well for most applications.&lt;/p&gt;

&lt;p&gt;The alternative (computing importance dynamically based on current context) offers more flexibility but increases computational overhead.&lt;/p&gt;

&lt;p&gt;For enterprise agents handling high-volume interactions, calculating importance once at storage time provides better cost/performance characteristics.&lt;/p&gt;

&lt;h2&gt;
  
  
  Relevance Scoring: Context-Aware Memory Matching
&lt;/h2&gt;

&lt;p&gt;Recency tells us when something happened.&lt;/p&gt;

&lt;p&gt;Importance tells us how much it mattered.&lt;/p&gt;

&lt;p&gt;Relevance tells us whether it matters right now for the current situation.&lt;/p&gt;

&lt;p&gt;Without relevance scoring, agents retrieve memories that are recent and important but completely unrelated to the current task.&lt;/p&gt;

&lt;p&gt;The Stanford team implemented relevance through embedding similarity.&lt;/p&gt;

&lt;p&gt;Each memory gets encoded as a vector representation capturing its semantic content.&lt;/p&gt;

&lt;p&gt;When the agent needs to retrieve memories, it generates an embedding for the current query and calculates cosine similarity with all stored memories.&lt;/p&gt;
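
&lt;p&gt;Stripped of the vector index, the relevance step is just cosine similarity (pure-Python sketch; production systems delegate this to a vector store):&lt;/p&gt;

```python
import math

# Relevance = cosine similarity between a query embedding and each
# stored memory embedding (pure-Python sketch; no vector index).
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def rank_by_relevance(query_vec, memories):
    # memories: (text, embedding) pairs, most relevant first.
    return sorted(memories, key=lambda m: cosine(query_vec, m[1]), reverse=True)

mems = [("billing escalation", [0.9, 0.1]), ("coffee preferences", [0.1, 0.9])]
assert rank_by_relevance([1.0, 0.0], mems)[0][0] == "billing escalation"
```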

&lt;p&gt;Memories semantically related to the current context score higher regardless of how recently they occurred or their absolute importance.&lt;/p&gt;

&lt;p&gt;This approach enabled emergent behavior that felt genuinely intelligent.&lt;/p&gt;

&lt;p&gt;When agents engaged in domain-specific conversations (like political discussions), they retrieved memories about previous related conversations and relevant domain knowledge, not just whatever they’d been thinking about recently.&lt;/p&gt;

&lt;p&gt;The relevance scoring ensured contextually appropriate memories surfaced even if they weren’t the most recent or most important in absolute terms.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;For enterprise applications, relevance scoring transforms agents from mechanical responders to context-aware assistants.&lt;br&gt;
A project management agent asked about budget status needs to retrieve financial memories, not schedule memories, even if scheduling happens more frequently or involves more important stakeholders.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The query context (“budget status”) should drive retrieval, not just temporal proximity or general importance.&lt;/p&gt;

&lt;p&gt;Implementation requires solving the embedding problem: how do you generate semantic representations that accurately capture the meaning of agent experiences?&lt;/p&gt;

&lt;p&gt;The Stanford team leveraged language model embeddings, which provide reasonable semantic similarity out of the box.&lt;/p&gt;

&lt;p&gt;Enterprise applications have three main options.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;First&lt;/strong&gt;, use general-purpose embeddings from foundation models like those available through Bedrock.&lt;/p&gt;

&lt;p&gt;These work well for most agent interactions but may miss domain-specific semantic relationships.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Second&lt;/strong&gt;, fine-tune embeddings on your specific domain to capture industry jargon and specialized concepts.&lt;/p&gt;

&lt;p&gt;This improves relevance scoring accuracy but requires investment in training data and model development.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Third&lt;/strong&gt;, use hybrid approaches that combine general embeddings with domain-specific metadata to enhance relevance without full fine-tuning.&lt;/p&gt;

&lt;p&gt;Bedrock AgentCore Memory uses semantic search with vector embeddings automatically. The built-in strategies handle embedding generation and similarity calculation without requiring manual configuration.&lt;/p&gt;

&lt;p&gt;When using built-in strategies with customization, you can select a different foundation model via the modelId configuration field if your domain benefits from a model with specialized training.&lt;/p&gt;

&lt;p&gt;For complete control over embedding strategies, you can implement self-managed memory strategies with custom embedding models.&lt;/p&gt;

&lt;p&gt;One critical implementation detail: relevance scoring requires formulating the right query.&lt;/p&gt;

&lt;p&gt;When an agent searches its memory, what query should generate the relevance embeddings?&lt;/p&gt;

&lt;p&gt;The Stanford approach used the agent’s current situation or question as the query.&lt;/p&gt;

&lt;p&gt;For enterprise agents, you might construct queries from multiple sources: the user’s current message, the agent’s current task, recent conversation context, or even the agent’s own reflection on what information it needs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Combining Scores: The Retrieval Function Architecture
&lt;/h2&gt;

&lt;p&gt;Individual scoring dimensions solve specific problems, but agent behavior emerges from how scores combine.&lt;/p&gt;

&lt;p&gt;The Stanford team’s retrieval function weighted three dimensions equally: &lt;strong&gt;retrieval_score = recency + importance + relevance&lt;/strong&gt;, with each dimension normalized to [0,1] range using min-max scaling.&lt;/p&gt;

&lt;p&gt;This equally-weighted approach works surprisingly well as a starting point because each dimension captures fundamentally different information.&lt;/p&gt;

&lt;p&gt;Recency prevents over-reliance on old context.&lt;/p&gt;

&lt;p&gt;Importance prevents mundane noise from dominating.&lt;/p&gt;

&lt;p&gt;Relevance ensures contextual appropriateness.&lt;/p&gt;

&lt;p&gt;Together, they create a retrieval function that balances multiple concerns without requiring manual tuning.&lt;/p&gt;

&lt;p&gt;However, enterprise applications often benefit from adjusted weighting based on agent type and use case.&lt;/p&gt;

&lt;p&gt;A real-time monitoring agent might weight recency more heavily.&lt;/p&gt;

&lt;p&gt;What happened in the last five minutes matters more than what happened yesterday, regardless of importance or relevance.&lt;/p&gt;

&lt;p&gt;A research agent might weight relevance more heavily. Finding semantically related information matters more than when it was discovered or how important it seemed at the time.&lt;/p&gt;

&lt;p&gt;The math is the easy part:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;retrieval_score = w_recency × recency_score + w_importance × importance_score + w_relevance × relevance_score&lt;/strong&gt;, where the weights sum to 1.0.&lt;/p&gt;
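
&lt;p&gt;A direct implementation of that formula, with the min-max normalization the Stanford setup used (helper names are illustrative):&lt;/p&gt;

```python
# Weighted retrieval scoring: each dimension min-max normalized to [0, 1]
# across the candidate set, then combined with weights summing to 1.0.
def minmax(scores):
    lo, hi = min(scores), max(scores)
    span = hi - lo
    return [(s - lo) / span if span else 1.0 for s in scores]

def retrieval_scores(recency, importance, relevance, w=(1/3, 1/3, 1/3)):
    r, i, v = minmax(recency), minmax(importance), minmax(relevance)
    return [w[0]*a + w[1]*b + w[2]*c for a, b, c in zip(r, i, v)]

# A recency-heavy profile, e.g. for a monitoring agent:
monitoring = retrieval_scores([1.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                              w=(0.6, 0.2, 0.2))
assert round(monitoring[0], 6) == 0.8
```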

&lt;p&gt;The challenge lies in determining appropriate weights for your specific application.&lt;/p&gt;

&lt;p&gt;Different agent types benefit from different weight profiles.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Conversational agents&lt;/strong&gt; heavily favor recent context since conversation flow depends on immediate history.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Knowledge agents&lt;/strong&gt; strongly favor relevance since finding the right information matters more than when it was learned.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Alert agents&lt;/strong&gt; heavily favor recency and importance since recent critical events drive alerting decisions.&lt;/p&gt;

&lt;p&gt;AgentCore’s built-in strategies handle these tradeoffs automatically through their consolidation algorithms rather than exposing explicit weight parameters.&lt;/p&gt;

&lt;p&gt;If you need fine-grained control over how recency, importance, and relevance combine in retrieval scoring, you would implement self-managed memory strategies with custom retrieval logic.&lt;/p&gt;

&lt;h2&gt;
  
  
  Reflection: Synthesizing Memory Into Understanding
&lt;/h2&gt;

&lt;p&gt;Raw observations form the foundation of agent memory, but believable behavior requires higher-level understanding.&lt;/p&gt;

&lt;p&gt;The Stanford team introduced “reflection” as a mechanism for agents to periodically synthesize observations into broader insights about themselves, others, and their environment.&lt;/p&gt;

&lt;p&gt;Reflection generates a second type of memory that coexists with observations in the memory stream.&lt;/p&gt;

&lt;p&gt;These reflective memories don’t capture specific events.&lt;/p&gt;

&lt;p&gt;Instead, they capture patterns, relationships, and understanding derived from multiple events.&lt;/p&gt;

&lt;p&gt;When an agent reflects on observations about spending significant time on research activities and interactions with other researchers, it might generate the insight:&lt;/p&gt;

&lt;p&gt;“This agent is highly dedicated to research work.”&lt;/p&gt;

&lt;p&gt;This reflection itself becomes a memory that can be retrieved alongside observations.&lt;/p&gt;

&lt;p&gt;The power of reflection emerges when agents need to make decisions requiring synthesis.&lt;/p&gt;

&lt;p&gt;Without reflection, an agent’s decision about who to collaborate with depends on raw observation frequency.&lt;/p&gt;

&lt;p&gt;A colleague appears in more memories simply due to physical proximity (shared office space, common areas).&lt;/p&gt;

&lt;p&gt;With reflection, the agent retrieves synthesized understanding about shared professional interests, even though substantive interactions with that person appear less frequently than casual proximity encounters.&lt;/p&gt;

&lt;p&gt;For enterprise agents, reflection prevents a common failure mode: drowning in detail while missing the big picture.&lt;/p&gt;

&lt;p&gt;A customer service agent might observe 50 interactions with a particular customer across various issues: billing questions, technical problems, feature requests.&lt;/p&gt;

&lt;p&gt;Without reflection, the agent treats each interaction as independent.&lt;/p&gt;

&lt;p&gt;With reflection, the agent synthesizes: “This customer experiences recurring billing confusion despite multiple explanations, suggesting the billing interface itself may be unclear.”&lt;/p&gt;

&lt;p&gt;The Stanford implementation triggered reflection periodically based on experience accumulation.&lt;/p&gt;

&lt;p&gt;When the sum of importance scores for recent observations exceeded a threshold, the agent reflected.&lt;/p&gt;

&lt;p&gt;This approach ensures reflection happens when agents have sufficient new experiences to warrant synthesis while avoiding constant reflection on minor observations.&lt;/p&gt;

&lt;p&gt;The threshold value determines reflection frequency: lower thresholds mean more frequent reflection (which can generate noise), higher thresholds mean agents accumulate more experiences before synthesizing (which requires sufficient important events to cross the threshold).&lt;/p&gt;
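
&lt;p&gt;The trigger logic itself is small (a sketch; the default threshold here is illustrative, not the paper's exact setting):&lt;/p&gt;

```python
# Threshold-triggered reflection: accumulate importance scores of new
# observations and fire once the running sum crosses a threshold, then
# reset (threshold value is illustrative).
class ReflectionTrigger:
    def __init__(self, threshold: int = 150):
        self.threshold = threshold
        self.total = 0

    def observe(self, importance: int) -> bool:
        self.total += importance
        if self.total >= self.threshold:
            self.total = 0
            return True   # time to reflect
        return False

trigger = ReflectionTrigger(threshold=10)
assert [trigger.observe(s) for s in (4, 3, 5, 2)] == [False, False, True, False]
```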

&lt;p&gt;Reflection generation involves three steps.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;First&lt;/strong&gt;, identify salient questions based on recent experiences.&lt;/p&gt;

&lt;p&gt;The agent prompts itself: “Given these recent observations, what are the most important questions I can answer about myself or my environment?”&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Second&lt;/strong&gt;, retrieve relevant memories for each question.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Third&lt;/strong&gt;, synthesize insights that answer those questions, citing specific observations as supporting evidence.&lt;/p&gt;

&lt;p&gt;Bedrock AgentCore implements reflection through its Episodic Memory Strategy.&lt;/p&gt;

&lt;p&gt;Episodic memory operates on a per-session basis, with reflections synthesized from episodes within the same interaction context rather than across arbitrary sessions.&lt;/p&gt;

&lt;p&gt;This strategy captures interactions as structured episodes with intents, actions, and outcomes, then generates cross-episode reflections that synthesize broader insights.&lt;/p&gt;

&lt;p&gt;The episodic strategy uses namespaces to organize both individual episodes and the reflections derived from them.&lt;/p&gt;

&lt;p&gt;When using built-in strategies with customization, you can guide reflection behavior through the appendToPrompt configuration field to focus synthesis on patterns relevant to your domain.&lt;/p&gt;

&lt;p&gt;For example, you might append instructions like “When reflecting, focus on recurring customer pain points and opportunities for process improvement.”&lt;/p&gt;

&lt;p&gt;The built-in episodic strategy handles reflection timing automatically based on accumulated experiences.&lt;/p&gt;

&lt;p&gt;For complete control over reflection triggers, frequency, and synthesis logic, you would implement a self-managed memory strategy with custom algorithms.&lt;/p&gt;

&lt;p&gt;Reflection also enables recursion: agents can reflect on their own reflections.&lt;/p&gt;

&lt;p&gt;An agent might observe multiple experiences around a specific work pattern, reflect on that pattern, then later reflect on multiple patterns together to synthesize higher-level understanding.&lt;/p&gt;

&lt;p&gt;This hierarchical reflection creates increasingly abstract understanding that guides high-level decision-making.&lt;/p&gt;

&lt;h2&gt;
  
  
  How AgentCore Actually Implements Memory
&lt;/h2&gt;

&lt;p&gt;Amazon Bedrock AgentCore takes a different architectural approach from the Stanford research.&lt;/p&gt;

&lt;p&gt;Rather than manually scoring memories across recency-importance-relevance dimensions, AgentCore provides two complementary memory types that automate much of this complexity:&lt;/p&gt;

&lt;h3&gt;
  
  
  AgentCore’s Two-Tier Memory System
&lt;/h3&gt;

&lt;p&gt;Short-term memory stores raw interactions within a single session as events. Each event captures conversational exchanges, instructions, or structured information such as product details or order status.&lt;br&gt;
Events persist for a configurable retention period and can be retrieved later within the same actor and session scope, enabling controlled continuation of context without merging unrelated sessions.&lt;/p&gt;

&lt;p&gt;You can attach metadata to events for quick filtering without scanning full session history.&lt;/p&gt;

&lt;p&gt;Long-term memory automatically extracts and stores structured insights from interactions.&lt;/p&gt;

&lt;p&gt;After events are created, AgentCore asynchronously processes them to extract facts, preferences, knowledge, and session summaries.&lt;/p&gt;

&lt;p&gt;These consolidated insights persist across multiple sessions and enable personalization without requiring customers to repeat information.&lt;/p&gt;

&lt;h3&gt;
  
  
  Semantic Search vs. Retrieval Scoring
&lt;/h3&gt;

&lt;p&gt;AgentCore’s RetrieveMemoryRecords operation performs semantic search to find memories most relevant to the current query.&lt;/p&gt;

&lt;p&gt;This differs from the Stanford approach where you explicitly configure recency, importance, and relevance weights.&lt;/p&gt;

&lt;p&gt;AgentCore handles relevance through embeddings automatically, while recency and importance are implicit in how it processes and consolidates long-term memories.&lt;/p&gt;

&lt;h3&gt;
  
  
  Episodic Memory for Learning
&lt;/h3&gt;

&lt;p&gt;AgentCore Memory includes an episodic memory strategy, enabling agents to learn and adapt from experiences over time.&lt;/p&gt;

&lt;p&gt;This builds knowledge that makes interactions more humanlike, similar to the reflection mechanisms described in the Stanford research.&lt;/p&gt;

&lt;h2&gt;
  
  
  Configuring Memory Strategies in AgentCore
&lt;/h2&gt;

&lt;p&gt;AgentCore provides built-in memory strategies that handle extraction, consolidation, and retrieval automatically.&lt;/p&gt;

&lt;p&gt;Understanding how to configure these strategies helps you build agents with effective memory behavior without implementing Stanford-style scoring from scratch.&lt;/p&gt;

&lt;h3&gt;
  
  
  Built-in Memory Strategies
&lt;/h3&gt;

&lt;p&gt;AgentCore provides four built-in strategies that automatically extract and organize different types of information from agent interactions:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;User Preference Strategy&lt;/strong&gt;: Automatically identifies and extracts user preferences, choices, and styles. Useful for e-commerce agents that need to remember customer preferences like favorite brands, sizes, or shopping habits.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Semantic Memory Strategy&lt;/strong&gt;: Extracts key factual information and contextual knowledge using vector embeddings for similarity-based retrieval. Prevents agents from repeatedly asking for information users already provided.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Summary Memory Strategy&lt;/strong&gt;: Creates condensed summaries of conversations within a session, reducing the need to process entire conversation histories for context.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Episodic Memory Strategy&lt;/strong&gt;: Captures interactions as structured episodes with intents, actions, and outcomes. Includes cross-episode reflection capabilities that synthesize broader insights across multiple interactions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Customizing Built-in Strategies
&lt;/h3&gt;

&lt;p&gt;AgentCore allows two levels of customization for built-in strategies:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prompt Customization&lt;/strong&gt;: Use the appendToPrompt configuration field to add domain-specific instructions that guide what the strategy extracts and how it prioritizes information. For example, a legal research agent might add instructions to focus on precedent-setting cases and landmark decisions, while prioritizing regulatory changes and compliance requirements.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Model Selection&lt;/strong&gt;: Choose a different foundation model via the modelId field if your domain benefits from specialized model capabilities.&lt;/p&gt;

&lt;h3&gt;
  
  
  Memory Retrieval and Filtering
&lt;/h3&gt;

&lt;p&gt;When retrieving memories, AgentCore uses semantic search with vector embeddings to find the most relevant information. You can control retrieval behavior through several parameters:&lt;/p&gt;

&lt;p&gt;Namespace filtering: Organize memories hierarchically using namespace patterns like /users/{actorId}/preferences or /support_cases/{sessionId}/facts, then filter retrieval to specific namespaces.&lt;/p&gt;

&lt;p&gt;Top-k limiting: Specify how many memory records to retrieve (balancing context richness against processing costs).&lt;/p&gt;

&lt;p&gt;Event retention: Configure how long raw conversation events persist (up to 365 days) before automatic expiration.&lt;/p&gt;
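&lt;p&gt;A sketch of a retrieval request combining namespace filtering and top-k limiting (parameter names are indicative of the SDK shape, not an exact signature; the memory ID is hypothetical):&lt;/p&gt;

```python
# Illustrative retrieval request: filter to one namespace and cap results.
retrieval_request = {
    "memoryId": "mem-example",                   # hypothetical memory ID
    "namespace": "/users/user-123/preferences",  # restrict to one namespace
    "searchCriteria": {
        "searchQuery": "preferred shipping options",
        "topK": 5,  # balance context richness against processing cost
    },
}
```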

&lt;h3&gt;
  
  
  Implementing Stanford-Style Explicit Scoring
&lt;/h3&gt;

&lt;p&gt;If you need explicit control over recency-importance-relevance weighting like the Stanford approach, you can implement self-managed memory strategies.&lt;/p&gt;

&lt;p&gt;Self-managed strategies give you complete control over:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Custom extraction and consolidation algorithms&lt;/li&gt;
&lt;li&gt;Manual scoring across any dimensions you define&lt;/li&gt;
&lt;li&gt;Integration with external memory systems&lt;/li&gt;
&lt;li&gt;Custom retrieval logic with explicit weight configuration&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Self-managed strategies require infrastructure setup (S3 buckets for payloads, SNS topics for notifications, IAM roles for access) and ongoing maintenance of the memory processing pipeline. This approach makes sense when your memory requirements differ significantly from what the built-in strategies provide.&lt;/p&gt;
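&lt;p&gt;A minimal sketch of such explicit scoring, assuming importance and relevance are pre-normalized to [0, 1] and using an exponential recency decay similar to the Stanford paper's:&lt;/p&gt;

```python
# Stanford-style retrieval scoring sketch: weighted sum of recency,
# importance, and relevance. Weights and decay factor are illustrative.
def score_memory(age_hours, importance, relevance,
                 weights=(1.0, 1.0, 1.0), decay=0.995):
    """importance and relevance assumed pre-normalized to [0, 1];
    recency decays exponentially per hour since last access."""
    recency = decay ** age_hours
    w_rec, w_imp, w_rel = weights
    return w_rec * recency + w_imp * importance + w_rel * relevance

def top_k(memories, k=3):
    """memories: list of (age_hours, importance, relevance) tuples."""
    return sorted(memories, key=lambda m: score_memory(*m), reverse=True)[:k]
```

Adjusting the weights tuple shifts the balance between fresh, significant, and topically similar memories.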

&lt;h2&gt;
  
  
  Measuring Memory Strategy Effectiveness
&lt;/h2&gt;

&lt;p&gt;Implementing memory strategies is only valuable if they improve agent behavior.&lt;/p&gt;

&lt;p&gt;The Stanford research evaluated memory effectiveness through believability ratings and behavioral coherence. Enterprise applications need metrics tied to business outcomes, along with concrete procedures for measuring them.&lt;/p&gt;

&lt;h3&gt;
  
  
  Retrieval Quality Metrics
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Retrieval relevance&lt;/strong&gt; measures whether retrieved memories actually contribute to response quality.&lt;/p&gt;

&lt;p&gt;Implementation requires weekly sampling of 50-100 agent interactions where you examine the retrieved memories and the agent’s response.&lt;/p&gt;

&lt;p&gt;For each interaction, have domain experts rate each retrieved memory as relevant (contributed to response), partially relevant (provided context but not directly used), or irrelevant (unrelated to query).&lt;/p&gt;

&lt;p&gt;Calculate the percentage of relevant memories in the top-10 retrieved results.&lt;/p&gt;

&lt;p&gt;Target &amp;gt;80% relevance.&lt;/p&gt;

&lt;p&gt;Log retrieval inputs/outputs (query, retrieved record IDs/namespaces, and the final response) to S3.&lt;/p&gt;
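&lt;p&gt;The relevance calculation from the sampling procedure above reduces to a few lines (ratings come from the expert review):&lt;/p&gt;

```python
# Top-10 retrieval relevance from expert ratings, where each rating is
# "relevant", "partial", or "irrelevant".
def relevance_rate(ratings):
    top = ratings[:10]
    relevant = sum(1 for r in top if r == "relevant")
    return 100.0 * relevant / len(top)
```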

&lt;p&gt;&lt;strong&gt;Score distribution&lt;/strong&gt; reveals whether agents balance retrieval dimensions appropriately. Use CloudWatch Logs Insights to calculate mean scores across dimensions for all retrievals in a time period.&lt;/p&gt;

&lt;p&gt;For agents implementing explicit retrieval scoring (for example, with self-managed memory strategies), balanced systems tend to show similar mean values across recency, importance, and relevance after normalization.&lt;/p&gt;

&lt;p&gt;Agents over-relying on one dimension show skewed distributions.&lt;/p&gt;

&lt;p&gt;For example, if average recency scores are 0.85 while importance and relevance average 0.15 and 0.20, the agent depends too heavily on recency.&lt;/p&gt;
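&lt;p&gt;A sketch of that skew check (the 3x ratio is an illustrative cutoff, not a standard threshold):&lt;/p&gt;

```python
# Flag skew when one retrieval dimension dominates the others.
def dimension_means(scores):
    """scores: list of (recency, importance, relevance) per retrieval."""
    n = len(scores)
    return tuple(sum(s[i] for s in scores) / n for i in range(3))

def is_skewed(means, ratio=3.0):
    # Illustrative rule: skewed if the largest mean is 3x the smallest.
    return max(means) > ratio * min(means)
```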

&lt;p&gt;&lt;strong&gt;Citation usage&lt;/strong&gt; tracks whether agents incorporate retrieved memories into responses or fall back on generic knowledge.&lt;/p&gt;

&lt;p&gt;Implement by parsing agent responses for memory citations or references to past events.&lt;/p&gt;

&lt;p&gt;If your agent implementation tracks which memories influenced each response, calculate what percentage of retrieved memories actually get cited.&lt;/p&gt;

&lt;p&gt;Target &amp;gt;60% citation rate, which indicates retrieval is surfacing useful context rather than noise.&lt;/p&gt;
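&lt;p&gt;The citation-rate calculation itself is simple (memory IDs are illustrative):&lt;/p&gt;

```python
# Citation rate: fraction of retrieved memory IDs the response referenced.
def citation_rate(retrieved_ids, cited_ids):
    if not retrieved_ids:
        return 0.0
    cited = set(cited_ids)
    used = sum(1 for m in retrieved_ids if m in cited)
    return 100.0 * used / len(retrieved_ids)
```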

&lt;h3&gt;
  
  
  Behavioral Coherence Metrics
&lt;/h3&gt;

&lt;p&gt;Self-contradiction rate requires comparing agent statements against stored memories to detect logical inconsistencies.&lt;/p&gt;

&lt;p&gt;Implement through periodic automated checks that use language models to detect contradictions.&lt;/p&gt;

&lt;p&gt;For a sample of agent responses (start with 10%), retrieve similar memories and prompt a language model to identify whether the current statement contradicts any previous statements.&lt;/p&gt;

&lt;p&gt;Track contradictions per 100 interactions with a target of less than 2% contradiction rate.&lt;/p&gt;
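&lt;p&gt;The rate tracking can be sketched with a pluggable detector standing in for the language-model judge:&lt;/p&gt;

```python
# Contradictions per 100 interactions; is_contradiction is any callable
# returning True/False (in practice an LLM judge over similar memories).
def contradiction_rate(interactions, is_contradiction):
    flagged = sum(1 for i in interactions if is_contradiction(i))
    return 100.0 * flagged / len(interactions)
```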

&lt;p&gt;Context awareness measures whether agents incorporate relevant historical context without explicit prompting. Implement through test scenarios where context should influence responses.&lt;/p&gt;

&lt;p&gt;Create test cases with historical context stored in memory, then issue queries that should trigger context usage.&lt;/p&gt;

&lt;p&gt;Use language model evaluation to assess whether agent responses appropriately incorporate the historical context.&lt;/p&gt;

&lt;p&gt;Target &amp;gt;90% context awareness across your test scenarios.&lt;/p&gt;

&lt;p&gt;Decision consistency tracks whether agents make similar decisions in similar situations. Implement by identifying repeated scenario types (like billing disputes with similar characteristics) and comparing agent actions.&lt;/p&gt;

&lt;p&gt;Group scenarios by similarity using embedding-based clustering, then calculate what percentage of similar scenarios receive consistent decisions.&lt;/p&gt;

&lt;p&gt;Target &amp;gt;85% consistency for equivalent situations.&lt;/p&gt;

&lt;h3&gt;
  
  
  Business Impact Metrics
&lt;/h3&gt;

&lt;p&gt;Task completion rate compares before/after memory strategy implementation by tracking multi-step task success.&lt;/p&gt;

&lt;p&gt;Use CloudWatch Logs Insights to analyze task outcomes, filtering for completed versus failed or abandoned tasks.&lt;/p&gt;

&lt;p&gt;Compare completion rates between different memory strategy versions, along with average time to completion and number of memory retrievals required.&lt;/p&gt;

&lt;p&gt;This reveals whether improved memory strategies help agents complete tasks more effectively.&lt;/p&gt;

&lt;p&gt;User satisfaction correlation with retrieval quality requires instrumenting feedback collection and linking to retrieval performance.&lt;/p&gt;

&lt;p&gt;For interactions where users provide satisfaction ratings, calculate retrieval quality metrics (average retrieval score, citation rate, memory count) and analyze the correlation with satisfaction scores.&lt;/p&gt;

&lt;p&gt;High correlation (&amp;gt;0.6) between retrieval quality and satisfaction indicates that memory strategy improvements translate to better user experience.&lt;/p&gt;

&lt;p&gt;Efficiency gains measure whether better memory reduces interaction time or redundant questions.&lt;/p&gt;

&lt;p&gt;Track average interaction duration, conversation turns, and redundant questions (asking for information already provided in the session) across different memory strategy versions.&lt;/p&gt;

&lt;p&gt;Target a &amp;gt;50% reduction in redundant questions with proper memory retrieval, which demonstrates that agents effectively use stored context instead of repeatedly requesting the same information.&lt;/p&gt;

&lt;p&gt;Start with manual sampling for retrieval relevance and context awareness to establish baselines, then automate contradiction detection and decision consistency tracking as you scale.&lt;/p&gt;

&lt;h2&gt;
  
  
  Patterns From Production: Memory Strategy Lessons
&lt;/h2&gt;

&lt;p&gt;Organizations implementing sophisticated memory strategies with AgentCore have discovered patterns that extend beyond the Stanford research findings:&lt;/p&gt;

&lt;h3&gt;
  
  
  Domain-Specific Importance Calibration
&lt;/h3&gt;

&lt;p&gt;Generic importance scoring works reasonably well, but domain-specific calibration significantly improves retrieval quality.&lt;/p&gt;

&lt;p&gt;Implementation approach: Create a set of 20-50 representative memories spanning the importance spectrum for your domain.&lt;/p&gt;

&lt;p&gt;Use these as few-shot examples in the importance scoring prompt.&lt;/p&gt;

&lt;p&gt;Periodically review whether importance scores align with domain expert judgment and refine examples accordingly.&lt;/p&gt;
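&lt;p&gt;A sketch of building such a few-shot scoring prompt (the wording and examples are illustrative):&lt;/p&gt;

```python
# Build an importance-scoring prompt from domain-calibrated examples.
def importance_prompt(examples, memory_text):
    """examples: list of (memory, score) pairs spanning the importance spectrum."""
    shots = "\n".join(f"Memory: {m}\nImportance: {s}" for m, s in examples)
    return (
        "Rate the importance of the memory on a 1-10 scale, "
        "following these domain-calibrated examples.\n"
        f"{shots}\nMemory: {memory_text}\nImportance:"
    )
```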

&lt;h3&gt;
  
  
  Temporal Context Matters
&lt;/h3&gt;

&lt;p&gt;The Stanford research used hourly intervals as time units because their simulation tracked agents through daily routines with clear temporal structure.&lt;/p&gt;

&lt;p&gt;Enterprise agents operate across varying temporal scales that affect optimal recency decay rates.&lt;/p&gt;

&lt;p&gt;Real-time monitoring agents need aggressive decay (half-life measured in minutes) because events from an hour ago rarely remain relevant.&lt;/p&gt;

&lt;p&gt;Customer support agents need moderate decay (half-life measured in hours) because conversations span multiple interactions but complete within a day.&lt;/p&gt;

&lt;p&gt;Account management agents need gentle decay (half-life measured in weeks) because relationships and context accumulate over months.&lt;/p&gt;

&lt;p&gt;Implementation approach: Start with medium decay rates (half-life of 8 hours for session memory, 30 days for long-term memory), then adjust based on observed retrieval patterns.&lt;/p&gt;

&lt;p&gt;If agents over-rely on old context, increase decay rate. If they miss relevant historical context, decrease decay rate.&lt;/p&gt;
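&lt;p&gt;The decay itself reduces to one line, so the same scoring code can serve monitoring, support, and account-management agents by swapping the half-life:&lt;/p&gt;

```python
# Recency weight from a configurable half-life.
def recency_weight(age_hours, half_life_hours):
    return 0.5 ** (age_hours / half_life_hours)
```

With an 8-hour half-life, an 8-hour-old memory scores 0.5; with a 30-day half-life (720 hours), week-old context still scores about 0.85.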

&lt;h3&gt;
  
  
  Reflection Quality Over Frequency
&lt;/h3&gt;

&lt;p&gt;Early AgentCore implementations often triggered reflection too frequently, generating noisy, low-quality insights.&lt;/p&gt;

&lt;p&gt;High-quality reflection requires sufficient accumulated experience to identify genuine patterns rather than noise.&lt;/p&gt;

&lt;p&gt;Frequent reflection on sparse data produces observations dressed as insights (“The customer uses our product” rather than “The customer consistently struggles with feature X despite multiple explanations”).&lt;/p&gt;

&lt;p&gt;Implementation approach: Set reflection thresholds high enough that agents accumulate 20-30 meaningful observations before reflecting.&lt;/p&gt;

&lt;p&gt;Monitor reflection content quality manually.&lt;/p&gt;

&lt;p&gt;Good reflections synthesize patterns across multiple observations and provide actionable insights.&lt;/p&gt;

&lt;p&gt;Poor reflections restate individual observations or make unsupported generalizations.&lt;/p&gt;

&lt;h3&gt;
  
  
  Hybrid Memory Architecture
&lt;/h3&gt;

&lt;p&gt;Pure episodic memory (observations and reflections) works well for Stanford’s simulation, but enterprise agents benefit from hybrid architectures combining episodic memory with semantic knowledge bases and procedural knowledge.&lt;/p&gt;

&lt;p&gt;A healthcare agent combines episodic memory (patient interaction history) with semantic memory (medical knowledge base) and procedural memory (clinical protocols).&lt;/p&gt;

&lt;p&gt;Retrieval strategies differ across memory types: episodic memory uses recency-importance-relevance scoring, semantic memory uses pure relevance scoring, procedural memory uses task-specific rule matching.&lt;/p&gt;

&lt;p&gt;Implementation approach: Use AgentCore session and long-term memory for episodic storage with full retrieval scoring. Integrate knowledge bases through retrieval-augmented generation (RAG) with relevance-only scoring.&lt;/p&gt;

&lt;p&gt;Implement procedural knowledge through explicit skill definitions that bypass memory retrieval entirely for deterministic tasks.&lt;/p&gt;

&lt;h2&gt;
  
  
  Building Agents That Learn From Experience
&lt;/h2&gt;

&lt;p&gt;The Stanford Generative Agents research proved that sophisticated memory strategies transform language model behavior from reactive to genuinely autonomous. Agents with proper memory retrieval, importance scoring, and reflection capabilities develop coherent personalities, form relationships, and exhibit emergent behaviors that feel believable rather than mechanical.&lt;/p&gt;

&lt;p&gt;Amazon Bedrock AgentCore provides production-ready memory infrastructure through its two-tier system: short-term memory for session context and long-term memory for automatic insight extraction.&lt;/p&gt;

&lt;p&gt;While AgentCore’s semantic search approach differs from the Stanford paper’s explicit recency-importance-relevance scoring, both architectures solve the same fundamental problem: helping agents retrieve the right context at the right time.&lt;/p&gt;

&lt;p&gt;Organizations implementing sophisticated memory strategies report measurably better agent performance: higher task completion rates, improved user satisfaction, reduced interaction time, and fewer behavioral inconsistencies.&lt;/p&gt;

&lt;p&gt;More importantly, they report agents that feel less like chatbots and more like assistants that genuinely understand context and learn from experience.&lt;/p&gt;

&lt;p&gt;Whether you adopt AgentCore’s automatic extraction and semantic search or implement explicit retrieval scoring based on the Stanford research, the core principle remains the same: believable agents need memory systems that distinguish important from mundane, recent from historical, and relevant from tangential.&lt;/p&gt;

&lt;p&gt;These capabilities are available today for organizations ready to move beyond stateless chat interfaces toward agents that remember, reflect, and improve.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;I publish every week at &lt;a href="https://buildwithaws.substack.com/" rel="noopener noreferrer"&gt;buildwithaws.substack.com&lt;/a&gt;. Subscribe. It's free.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>ai</category>
      <category>agents</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Route Claude Code Through AWS Bedrock for CloudTrail Auditing and IAM Control</title>
      <dc:creator>Marcelo Acosta Cavalero</dc:creator>
      <pubDate>Tue, 24 Mar 2026 12:02:00 +0000</pubDate>
      <link>https://dev.to/aws-builders/route-claude-code-through-aws-bedrock-for-cloudtrail-auditing-and-iam-control-4d39</link>
      <guid>https://dev.to/aws-builders/route-claude-code-through-aws-bedrock-for-cloudtrail-auditing-and-iam-control-4d39</guid>
      <description>&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://buildwithaws.substack.com/" rel="noopener noreferrer"&gt;Build With AWS&lt;/a&gt;. Subscribe for weekly AWS builds.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Over the past few weeks, Claude Code has gained a lot of attention as a developer tool in the AI space.&lt;/p&gt;

&lt;p&gt;With rapid improvements in its capabilities, better context handling, and an increasingly robust feature set, developers are flocking to this powerful CLI tool that brings Claude’s intelligence directly into their terminal workflow.&lt;/p&gt;

&lt;p&gt;Whether you’re debugging complex codebases, refactoring legacy systems, or building new features, Claude Code has proven itself as an indispensable coding companion. But with great power comes great responsibility, and potentially significant API costs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Route Claude Code Through AWS Bedrock?
&lt;/h2&gt;

&lt;p&gt;If you’re already using Claude Code, you might be consuming the Anthropic API directly.&lt;/p&gt;

&lt;p&gt;While this works perfectly fine, there are compelling reasons to route your Claude Code traffic through AWS Bedrock instead:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Cost Control and Transparency
&lt;/h3&gt;

&lt;p&gt;AWS Bedrock provides granular billing through AWS Cost Explorer.&lt;/p&gt;

&lt;p&gt;You can track AI spending alongside your other AWS services, set up billing alerts and budgets, and analyze usage patterns with detailed metrics.&lt;/p&gt;

&lt;p&gt;This visibility enables better cost management compared to direct API billing.&lt;/p&gt;

&lt;p&gt;AWS enterprise customers can also take advantage of committed use pricing and volume discounts that apply across their entire AWS footprint, potentially reducing AI infrastructure costs significantly.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Security and Compliance
&lt;/h3&gt;

&lt;p&gt;For enterprises and security-conscious teams, Bedrock offers substantial advantages. Requests are made to Bedrock under your AWS account with IAM governance, CloudTrail auditing, and optional PrivateLink connectivity.&lt;/p&gt;

&lt;p&gt;This provides complete visibility into who invoked which models and when, helping meet compliance requirements that mandate audit trails and access controls.&lt;/p&gt;

&lt;p&gt;Every API call gets logged through CloudTrail, and you can leverage AWS IAM for fine-grained access control.&lt;/p&gt;

&lt;p&gt;Organizations can also use AWS PrivateLink to keep API traffic off the public internet, simplifying governance and network security posture.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Observability
&lt;/h3&gt;

&lt;p&gt;Bedrock integration provides comprehensive observability through CloudWatch metrics that track invocation counts, latency, and errors.&lt;/p&gt;

&lt;p&gt;CloudTrail logs capture complete audit trails of every model invocation.&lt;/p&gt;

&lt;p&gt;You can integrate these logs with your existing AWS monitoring stack, whether that’s CloudWatch dashboards, third-party tools, or custom alerting systems.&lt;/p&gt;

&lt;p&gt;This allows you to set up alerts on usage patterns, detect anomalies, and troubleshoot issues using the same tools you already use for your AWS infrastructure.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Unified Cloud Strategy
&lt;/h3&gt;

&lt;p&gt;Organizations already running infrastructure on AWS gain additional benefits from using Bedrock. Centralized billing consolidates AI costs with compute, storage, and other services, simplifying cost allocation and budgeting.&lt;/p&gt;

&lt;p&gt;You get a single pane of glass for all cloud services rather than managing multiple vendor relationships.&lt;/p&gt;

&lt;p&gt;This simplifies vendor management and allows you to leverage existing AWS support contracts and enterprise agreements for your AI infrastructure as well.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Configuration Process
&lt;/h2&gt;

&lt;p&gt;The good news?&lt;/p&gt;

&lt;p&gt;Configuring Claude Code to use Bedrock is remarkably straightforward.&lt;/p&gt;

&lt;p&gt;The changes are global, affecting all your projects and sessions once configured.&lt;/p&gt;

&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;p&gt;Before you begin, ensure you have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AWS CLI installed and configured with valid credentials&lt;/li&gt;
&lt;li&gt;Claude Code CLI installed (recent version recommended)&lt;/li&gt;
&lt;li&gt;AWS Bedrock model access enabled in your target region via the Bedrock console (some models require approval depending on region and account type)&lt;/li&gt;
&lt;li&gt;Appropriate IAM permissions for bedrock:InvokeModel, bedrock:InvokeModelWithResponseStream, and bedrock:ListInferenceProfiles&lt;/li&gt;
&lt;/ul&gt;
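&lt;p&gt;A minimal policy sketch granting those permissions (in production, scope &lt;code&gt;Resource&lt;/code&gt; to the specific model and inference-profile ARNs you use rather than a wildcard):&lt;/p&gt;

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream",
        "bedrock:ListInferenceProfiles"
      ],
      "Resource": "*"
    }
  ]
}
```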

&lt;h3&gt;
  
  
  Step 1: Verify AWS Credentials
&lt;/h3&gt;

&lt;p&gt;First, confirm your AWS CLI is properly configured:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;aws sts get-caller-identity&lt;/code&gt;&lt;br&gt;
You should see output like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"UserId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"AIDAXXXXXXXXXXXXXXXXX"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"Account"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"123456789012"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"Arn"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"arn:aws:iam::123456789012:user/your-username"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2: Set Environment Variables
&lt;/h3&gt;

&lt;p&gt;The configuration happens through environment variables.&lt;/p&gt;

&lt;p&gt;Add these to your shell configuration file (~/.zshrc, ~/.bashrc, or ~/.bash_profile):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Enable Bedrock for Claude Code&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;CLAUDE_CODE_USE_BEDROCK&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1

&lt;span class="c"&gt;# Set your preferred AWS region (REQUIRED - Claude Code does not read from ~/.aws/config)&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;AWS_REGION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;us-east-1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After adding these lines, reload your shell configuration:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;source ~/.zshrc  # or ~/.bashrc&lt;/code&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Verify the Configuration
&lt;/h3&gt;

&lt;p&gt;Check that the environment variables are set:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;env | grep -E "CLAUDE_CODE_USE_BEDROCK|AWS_REGION"&lt;/code&gt;&lt;br&gt;
Expected output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight conf"&gt;&lt;code&gt;&lt;span class="n"&gt;CLAUDE_CODE_USE_BEDROCK&lt;/span&gt;=&lt;span class="m"&gt;1&lt;/span&gt;
&lt;span class="n"&gt;AWS_REGION&lt;/span&gt;=&lt;span class="n"&gt;us&lt;/span&gt;-&lt;span class="n"&gt;east&lt;/span&gt;-&lt;span class="m"&gt;1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That’s it! No per-project configuration needed.&lt;/p&gt;

&lt;p&gt;These environment variables tell Claude Code to route all LLM requests through AWS Bedrock’s API instead of directly to Anthropic.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding the Scope
&lt;/h2&gt;

&lt;p&gt;Important: This configuration is global and session-based, not project-specific.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ Affects all Claude Code sessions started after setting the variables&lt;/li&gt;
&lt;li&gt;✅ Works across all directories and projects&lt;/li&gt;
&lt;li&gt;✅ No need to configure individual projects&lt;/li&gt;
&lt;li&gt;⚠️ Only applies to new terminal sessions (existing sessions need to be restarted)&lt;/li&gt;
&lt;li&gt;⚠️ If you unset the variables, Claude Code reverts to direct Anthropic API usage&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You do not need to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Add configuration files to each project&lt;/li&gt;
&lt;li&gt;Modify any project-specific settings&lt;/li&gt;
&lt;li&gt;Change your Claude Code commands or workflow&lt;/li&gt;
&lt;li&gt;Update your .claude.json file&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The environment variables are detected automatically when Claude Code initializes, and all API traffic is transparently routed through Bedrock.&lt;/p&gt;

&lt;h2&gt;
  
  
  Verification Methods: Proving It Works
&lt;/h2&gt;

&lt;p&gt;Now comes the crucial part: verifying that your configuration is actually working and that you’re being charged through AWS Bedrock instead of the Anthropic API.&lt;/p&gt;

&lt;h3&gt;
  
  
  Method 1: Environment Variable Check (Quick Verification)
&lt;/h3&gt;

&lt;p&gt;While Claude Code is running, verify the environment:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;env | grep -E "CLAUDE_CODE_USE_BEDROCK|AWS_REGION"&lt;/code&gt;&lt;br&gt;
You should see:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight conf"&gt;&lt;code&gt;&lt;span class="n"&gt;CLAUDE_CODE_USE_BEDROCK&lt;/span&gt;=&lt;span class="m"&gt;1&lt;/span&gt;
&lt;span class="n"&gt;AWS_REGION&lt;/span&gt;=&lt;span class="n"&gt;us&lt;/span&gt;-&lt;span class="n"&gt;east&lt;/span&gt;-&lt;span class="m"&gt;1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These are the only two variables that enable Bedrock routing.&lt;/p&gt;

&lt;p&gt;You still need valid AWS credentials (default or via AWS_PROFILE/SSO).&lt;/p&gt;

&lt;p&gt;For definitive verification, use CloudTrail logs (Method 2 below).&lt;/p&gt;

&lt;h3&gt;
  
  
  Method 2: CloudTrail Audit Logs (Definitive Proof)
&lt;/h3&gt;

&lt;p&gt;This is the most reliable verification method. CloudTrail logs every Bedrock API call:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Check for Bedrock API calls from your user in the last hour&lt;/span&gt;
&lt;span class="c"&gt;# Note: For Linux, replace "date -u -v-1H" with "date -u -d '1 hour ago'"&lt;/span&gt;
aws cloudtrail lookup-events &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; us-east-1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--lookup-attributes&lt;/span&gt; &lt;span class="nv"&gt;AttributeKey&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;Username,AttributeValue&lt;span class="o"&gt;=&lt;/span&gt;your-iam-username &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--start-time&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="nt"&gt;-u&lt;/span&gt; &lt;span class="nt"&gt;-v-1H&lt;/span&gt; &lt;span class="s1"&gt;'+%Y-%m-%dT%H:%M:%S'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'Events[?contains(EventSource, `bedrock`)].[EventTime,EventName,EventSource]'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--output&lt;/span&gt; table
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note: If you use assumed roles or AWS SSO, the Username filter may not work.&lt;/p&gt;

&lt;p&gt;In that case, filter by EventSource only:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws cloudtrail lookup-events &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; us-east-1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--start-time&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="nt"&gt;-u&lt;/span&gt; &lt;span class="nt"&gt;-v-1H&lt;/span&gt; &lt;span class="s1"&gt;'+%Y-%m-%dT%H:%M:%S'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'Events[?contains(EventSource, `bedrock`)].[EventTime,EventName,EventSource]'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--output&lt;/span&gt; table
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If Claude Code is using Bedrock, you’ll see InvokeModel or InvokeModelWithResponseStream events (streaming sessions typically use the latter):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;|  2026-01-14T11:05:48|InvokeModelWithResponseStream|bedrock.aws...  |
|  2026-01-14T11:04:23|InvokeModelWithResponseStream|bedrock.aws...  |
|  2026-01-14T11:04:21|InvokeModelWithResponseStream|bedrock.aws...  |
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To extract the specific models being invoked:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Note: For Linux, replace "date -u -v-1H" with "date -u -d '1 hour ago'"&lt;/span&gt;
aws cloudtrail lookup-events &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; us-east-1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--lookup-attributes&lt;/span&gt; &lt;span class="nv"&gt;AttributeKey&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;Username,AttributeValue&lt;span class="o"&gt;=&lt;/span&gt;your-iam-username &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--start-time&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="nt"&gt;-u&lt;/span&gt; &lt;span class="nt"&gt;-v-1H&lt;/span&gt; &lt;span class="s1"&gt;'+%Y-%m-%dT%H:%M:%S'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'Events[?contains(EventName, `InvokeModel`)] | [0:3]'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--output&lt;/span&gt; json | &lt;span class="se"&gt;\&lt;/span&gt;
  python3 &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"
import sys, json
events = json.load(sys.stdin)
for e in events:
    details = json.loads(e['CloudTrailEvent'])
    model = details.get('requestParameters', {}).get('modelId', 'N/A')
    print(f&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;Time: {e['EventTime']}&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;)
    print(f&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;Model: {model}&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;)
    print('---')
"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note: Depending on the event shape, the model identifier may appear under requestParameters.modelId or a related field.&lt;/p&gt;

&lt;p&gt;Expected output showing Claude models:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Time: 2026-01-14T11:05:48-03:00
Model: us.anthropic.claude-sonnet-4-5-20250929-v1:0
---
Time: 2026-01-14T11:04:23-03:00
Model: us.anthropic.claude-haiku-4-5-20251001-v1:0
---
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note: Model IDs may vary depending on your configuration.&lt;/p&gt;

&lt;p&gt;The default primary model is global.anthropic.claude-sonnet-4-5-20250929-v1:0, but regional inference profiles (like us.anthropic...) may also appear based on your setup. Both indicate Bedrock usage.&lt;/p&gt;

&lt;h3&gt;
  
  
  Method 3: Count API Calls
&lt;/h3&gt;

&lt;p&gt;Get a quick count of how many Bedrock calls you’ve made:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Note: For Linux, replace "date -u -v-1H" with "date -u -d '1 hour ago'"&lt;/span&gt;
aws cloudtrail lookup-events &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; us-east-1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--lookup-attributes&lt;/span&gt; &lt;span class="nv"&gt;AttributeKey&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;Username,AttributeValue&lt;span class="o"&gt;=&lt;/span&gt;your-iam-username &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--start-time&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="nt"&gt;-u&lt;/span&gt; &lt;span class="nt"&gt;-v-1H&lt;/span&gt; &lt;span class="s1"&gt;'+%Y-%m-%dT%H:%M:%S'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'Events[?contains(EventName, `InvokeModel`)]'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--output&lt;/span&gt; json | &lt;span class="se"&gt;\&lt;/span&gt;
  python3 &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"import sys, json; print(f'Total Bedrock API calls: {len(json.load(sys.stdin))}')"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Method 4: CloudWatch Metrics
&lt;/h3&gt;

&lt;p&gt;Check aggregated metrics for specific models:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Note: For Linux, replace "date -u -v-1d" with "date -u -d '1 day ago'"&lt;/span&gt;
aws cloudwatch get-metric-statistics &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--namespace&lt;/span&gt; AWS/Bedrock &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--metric-name&lt;/span&gt; Invocations &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--dimensions&lt;/span&gt; &lt;span class="nv"&gt;Name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ModelId,Value&lt;span class="o"&gt;=&lt;/span&gt;us.anthropic.claude-sonnet-4-5-20250929-v1:0 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--start-time&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="nt"&gt;-u&lt;/span&gt; &lt;span class="nt"&gt;-v-1d&lt;/span&gt; &lt;span class="s1"&gt;'+%Y-%m-%dT%H:%M:%S'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--end-time&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="nt"&gt;-u&lt;/span&gt; &lt;span class="s1"&gt;'+%Y-%m-%dT%H:%M:%S'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--period&lt;/span&gt; 3600 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--statistics&lt;/span&gt; Sum &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; us-east-1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output shows invocation counts:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"Label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Invocations"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"Datapoints"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"Timestamp"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-01-14T13:07:00+00:00"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"Sum"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;19.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"Unit"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Count"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Method 5: AWS Cost Explorer (Delayed, but Comprehensive)
&lt;/h3&gt;

&lt;p&gt;Check your Bedrock costs through Cost Explorer. Note that costs typically appear with a 24-48 hour delay:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Note: For Linux, replace "date -v-2d" with "date -d '2 days ago'"&lt;/span&gt;
aws ce get-cost-and-usage &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--time-period&lt;/span&gt; &lt;span class="nv"&gt;Start&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="nt"&gt;-v-2d&lt;/span&gt; +%Y-%m-%d&lt;span class="si"&gt;)&lt;/span&gt;,End&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; +%Y-%m-%d&lt;span class="si"&gt;)&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--granularity&lt;/span&gt; DAILY &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--metrics&lt;/span&gt; UnblendedCost &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--group-by&lt;/span&gt; &lt;span class="nv"&gt;Type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;DIMENSION,Key&lt;span class="o"&gt;=&lt;/span&gt;SERVICE &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--filter&lt;/span&gt; &lt;span class="s1"&gt;'{"Dimensions": {"Key": "SERVICE", "Values": ["Amazon Bedrock"]}}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Method 6: Check Anthropic Console (Negative Verification)
&lt;/h3&gt;

&lt;p&gt;As a final check, log into your Anthropic console at &lt;a href="https://console.anthropic.com" rel="noopener noreferrer"&gt;https://console.anthropic.com&lt;/a&gt; and check your API usage dashboard. If you see no recent API calls corresponding to your Claude Code sessions, that confirms traffic is going through Bedrock instead.&lt;/p&gt;

&lt;h2&gt;
  
  
  Troubleshooting
&lt;/h2&gt;

&lt;p&gt;If verification shows no Bedrock traffic:&lt;/p&gt;

&lt;p&gt;Check environment variables in the active session:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nv"&gt;$CLAUDE_CODE_USE_BEDROCK&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Restart your terminal after setting environment variables.&lt;/p&gt;

&lt;p&gt;Verify AWS credentials are valid:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;aws sts get-caller-identity&lt;/code&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Check IAM permissions for bedrock:InvokeModel and bedrock:InvokeModelWithResponseStream actions&lt;/li&gt;
&lt;li&gt;Ensure Bedrock model access is enabled in AWS Console (us-east-1 → Bedrock → Model Access)&lt;/li&gt;
&lt;li&gt;Review CloudTrail for AccessDenied events that might indicate permission issues&lt;/li&gt;
&lt;/ul&gt;
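&lt;p&gt;For reference, a minimal IAM policy granting those two actions could look like the sketch below. The wildcard &lt;code&gt;Resource&lt;/code&gt; is a placeholder; in production, scope it to the specific model and inference-profile ARNs you actually use.&lt;/p&gt;

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": "*"
    }
  ]
}
```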

&lt;h2&gt;
  
  
  Cost Implications
&lt;/h2&gt;

&lt;p&gt;Bedrock pricing for Anthropic models has two distinct tiers depending on model generation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Legacy models (Public Extended Access)
&lt;/h3&gt;

&lt;p&gt;Claude 3.5 Sonnet moved to Public Extended Access pricing as of December 2025, increasing from $3/$15 to $6/$30 per million tokens. If you are still running workloads on these older models, migrating to Claude Sonnet 4.5 gives you better performance at a lower price point.&lt;/p&gt;

&lt;p&gt;Claude 3.5 Sonnet v2 (also under Public Extended Access) is priced the same at $6.00 input / $30.00 output per million tokens on-demand, with batch at $3.00 / $15.00. It additionally supports prompt caching: $7.50 per million for cache writes and $0.60 per million for cache reads. &lt;/p&gt;
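&lt;p&gt;To put the migration math in concrete terms, here is a rough monthly comparison at the on-demand rates quoted above. The token volumes are made-up example numbers; substitute your own:&lt;/p&gt;

```python
# Rough monthly cost comparison using the per-million-token rates quoted above.
# Token volumes are hypothetical examples; plug in your own usage.
input_tokens_m = 50   # 50M input tokens per month
output_tokens_m = 10  # 10M output tokens per month

# Claude 3.5 Sonnet under Public Extended Access: $6 input / $30 output per M
legacy_cost = input_tokens_m * 6.00 + output_tokens_m * 30.00

# Claude Sonnet 4.5: $3 input / $15 output per M
current_cost = input_tokens_m * 3.00 + output_tokens_m * 15.00

print(f"Legacy Sonnet 3.5: ${legacy_cost:,.2f}/month")   # $600.00
print(f"Sonnet 4.5:        ${current_cost:,.2f}/month")  # $300.00
```

At these example volumes, staying on the legacy model doubles the bill for a less capable model.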

&lt;h3&gt;
  
  
  Current generation models
&lt;/h3&gt;

&lt;p&gt;Claude Sonnet 4.5 on Bedrock is priced at $3.00 per million input tokens and $15.00 per million output tokens in us-east-1. This is significantly cheaper than the legacy Sonnet 3.5 extended access pricing for equivalent capability.&lt;/p&gt;

&lt;p&gt;Starting with Claude Sonnet 4.5 and Haiku 4.5, AWS Bedrock offers two endpoint types: global endpoints for dynamic routing across regions, and regional endpoints with a 10% premium for data residency requirements. &lt;/p&gt;

&lt;p&gt;For exact Haiku 4.5 and Opus 4.5 pricing, check the AWS Bedrock console directly as rates can vary by region and are updated more frequently than third-party guides.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pricing modes that affect your bill
&lt;/h3&gt;

&lt;p&gt;All current Claude models support batch inference at a 50% discount, useful for asynchronous workloads like document processing or data enrichment where real-time responses are not required.&lt;/p&gt;

&lt;p&gt;Prompt caching can reduce costs substantially for workloads that reuse the same context repeatedly. The 1-hour TTL option for prompt caching launched in January 2026 for Claude Sonnet 4.5, Haiku 4.5, and Opus 4.5.&lt;/p&gt;
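&lt;p&gt;As an illustration of the caching economics, here is a rough calculation using the Claude 3.5 Sonnet v2 rates quoted earlier ($6.00/M regular input, $7.50/M cache writes, $0.60/M cache reads). The context size and reuse count are made-up examples, and the reuses are assumed to land within the cache TTL:&lt;/p&gt;

```python
# Back-of-the-envelope prompt-caching savings, using the Claude 3.5 Sonnet v2
# rates quoted earlier. Assumes all reuses occur within the cache TTL.
shared_context_m = 0.01   # a ~10K-token shared system prompt, in millions of tokens
reuses = 100              # requests that include that same context

# Without caching: the full context is billed as regular input every time.
without_cache = shared_context_m * 6.00 * reuses

# With caching: one cache write, then cheap cache reads for the remaining requests.
with_cache = shared_context_m * 7.50 + shared_context_m * 0.60 * (reuses - 1)

print(f"Without caching: ${without_cache:.2f}")  # $6.00
print(f"With caching:    ${with_cache:.2f}")     # $0.67
```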

&lt;p&gt;Intelligent Prompt Routing can automatically route requests between models in the same family based on prompt complexity, reducing costs by up to 30% without compromising accuracy. This works well for customer service workloads where simple queries can be handled by a smaller model and complex ones escalated automatically.&lt;/p&gt;

&lt;p&gt;Always verify current rates at aws.amazon.com/bedrock/pricing before budgeting, as prices vary by region and are updated periodically.&lt;/p&gt;

&lt;h2&gt;
  
  
  Taking Control of Your AI Infrastructure
&lt;/h2&gt;

&lt;p&gt;Routing Claude Code through AWS Bedrock provides tangible benefits in cost control, security, and observability without adding complexity to your workflow.&lt;/p&gt;

&lt;p&gt;The configuration is global, simple, and transparent to your development process.&lt;/p&gt;

&lt;p&gt;The verification methods outlined above give you reliable confirmation that your AI traffic flows through Bedrock, letting you take advantage of AWS’s robust cloud infrastructure for your AI workloads.&lt;/p&gt;

&lt;p&gt;CloudTrail audit logs provide irrefutable proof of where your API calls are going.&lt;/p&gt;

&lt;p&gt;As Claude Code continues to evolve and become more central to development workflows, having this level of control and visibility over your AI infrastructure becomes increasingly valuable.&lt;/p&gt;

&lt;p&gt;The ability to audit, monitor, and manage AI costs through the same tools you use for the rest of your infrastructure creates operational efficiency that compounds over time.&lt;/p&gt;

&lt;p&gt;Have you configured Claude Code with Bedrock? What benefits have you seen? Share your experience in the comments below.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;I publish every week at &lt;a href="https://buildwithaws.substack.com/" rel="noopener noreferrer"&gt;buildwithaws.substack.com&lt;/a&gt;. Subscribe. It's free.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>tutorial</category>
      <category>claudecode</category>
      <category>security</category>
    </item>
    <item>
      <title>What a Multimodal WhatsApp Agent Looks Like on AWS</title>
      <dc:creator>Marcelo Acosta Cavalero</dc:creator>
      <pubDate>Mon, 23 Mar 2026 12:00:00 +0000</pubDate>
      <link>https://dev.to/aws-builders/what-a-multimodal-whatsapp-agent-looks-like-on-aws-39p0</link>
      <guid>https://dev.to/aws-builders/what-a-multimodal-whatsapp-agent-looks-like-on-aws-39p0</guid>
      <description>&lt;p&gt;Originally published on &lt;a href="https://buildwithaws.substack.com/" rel="noopener noreferrer"&gt;Build With AWS&lt;/a&gt;. Subscribe for weekly AWS builds.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqdlw6jkp770hhm3uzdvu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqdlw6jkp770hhm3uzdvu.png" alt="AWS Agentic Architectures" width="800" height="1000"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I watched &lt;a href="https://theneuralmaze.substack.com/" rel="noopener noreferrer"&gt;Miguel Otero Pedrido&lt;/a&gt; and &lt;a href="https://www.youtube.com/@jesuscopado-en" rel="noopener noreferrer"&gt;Jesus Copado&lt;/a&gt;’s brilliant &lt;a href="https://theneuralmaze.substack.com/t/ava-the-whatsapp-agent" rel="noopener noreferrer"&gt;Ava the WhatsApp Agent series&lt;/a&gt; and tried building something similar. They built a multimodal WhatsApp bot using LangGraph and Google Cloud Run. The agent could hold conversations, analyze images, generate art, and process voice messages.&lt;/p&gt;

&lt;p&gt;After going through the series, I had one question: what would this look like built 100% on AWS?&lt;/p&gt;

&lt;p&gt;I started sketching out the architecture and quickly realized there were too many ways to build it. Pure Lambda orchestration? Bedrock Agents? Bedrock AgentCore? LangChain on Lambda? Step Functions? Each approach had tradeoffs I couldn’t ignore.&lt;/p&gt;

&lt;p&gt;That’s when I decided to build a hybrid system. Not because hybrid is always better, but because building both patterns side by side would force me to understand when each approach makes sense.&lt;/p&gt;

&lt;p&gt;The result is a production-ready WhatsApp bot on a manageable budget that demonstrates two distinct architectural patterns in the same codebase. You can find the complete code and deployment scripts at github.com/marceloacosta/multimodal-whatsapp-bot-aws to try it yourself.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7j5ow96n9mzmi9vaoz6h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7j5ow96n9mzmi9vaoz6h.png" alt="Whatsapp screenshot" width="800" height="412"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What You’ll Build
&lt;/h2&gt;

&lt;p&gt;By the end of this guide, you’ll understand how to build a WhatsApp bot with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Natural conversations powered by Claude 3.5 Sonnet&lt;/li&gt;
&lt;li&gt;Image analysis using Claude Vision&lt;/li&gt;
&lt;li&gt;AI image generation with Stable Diffusion XL (or Amazon Titan)&lt;/li&gt;
&lt;li&gt;Voice message transcription with AWS Transcribe&lt;/li&gt;
&lt;li&gt;Text-to-speech responses using Amazon Polly&lt;/li&gt;
&lt;li&gt;A serverless architecture that scales automatically&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More importantly, you’ll understand when to use direct Lambda processing versus Bedrock Agent frameworks.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Hybrid Architecture?
&lt;/h2&gt;

&lt;p&gt;Most tutorials show you one approach and call it done. I’m showing you both because the “best” architecture depends on what you’re building.&lt;/p&gt;

&lt;p&gt;Here’s the reality: simple operations don’t need the complexity of agent frameworks. Complex operations benefit from them. I learned this the hard way after rebuilding parts of this system three times.&lt;/p&gt;

&lt;p&gt;The project uses direct Lambda functions for straightforward tasks like image analysis, text-to-speech, and transcription. These are deterministic operations that don’t need natural language understanding or multi-turn conversations.&lt;/p&gt;

&lt;p&gt;For image generation, I use Bedrock Agents. Why? Because turning “create a sunset over mountains” into an optimized prompt for an image model requires natural language understanding and prompt engineering. An agent handles this better than hardcoded logic.&lt;/p&gt;

&lt;p&gt;This approach saves money where agents would be overkill, and uses them where they add real value.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Cost Reality Check
&lt;/h2&gt;

&lt;p&gt;Before we dive deeper, here’s what running this bot actually costs:&lt;/p&gt;

&lt;p&gt;For 1,000 messages per day:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Lambda execution: $5-10&lt;/li&gt;
&lt;li&gt;Bedrock models: $20-30&lt;/li&gt;
&lt;li&gt;S3 storage: $1-2&lt;/li&gt;
&lt;li&gt;API Gateway: $1&lt;/li&gt;
&lt;li&gt;Other services: $3-5&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Total: $30-50 per month.&lt;/p&gt;

&lt;p&gt;Image generation adds an extra cost per image: Titan costs $0.01 per image, while Stable Diffusion XL costs $0.04. These costs scale with usage, but you have full control over which model you use.&lt;/p&gt;
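&lt;p&gt;A quick estimate of that incremental spend, assuming (hypothetically) that 5% of messages request an image:&lt;/p&gt;

```python
# Estimated monthly image-generation spend at 1,000 messages/day.
# The 5% image-request share is a hypothetical assumption.
messages_per_day = 1000
image_share = 0.05
images_per_month = messages_per_day * image_share * 30   # ~1,500 images

titan = images_per_month * 0.01   # Titan: $0.01 per image
sdxl = images_per_month * 0.04    # Stable Diffusion XL: $0.04 per image

print(f"Titan: ${titan:.2f}/month")  # $15.00
print(f"SDXL:  ${sdxl:.2f}/month")   # $60.00
```

Even at modest volume, the choice of image model moves the monthly total noticeably.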

&lt;p&gt;Paying only for what you use across AWS services often beats being locked into third-party platforms with mandatory monthly fees.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture Overview
&lt;/h2&gt;

&lt;p&gt;The system consists of 8 Lambda functions working together:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp806ggt14bbzhwf94h2c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp806ggt14bbzhwf94h2c.png" alt="Architecture overview" width="800" height="926"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Entry and orchestration:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;inbound-webhook: Receives WhatsApp messages via API Gateway&lt;/li&gt;
&lt;li&gt;wa-process: Main orchestrator that routes requests&lt;/li&gt;
&lt;li&gt;wa-send: Sends messages back to WhatsApp&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Feature handlers:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;wa-image-analyze: Analyzes images using Claude Vision&lt;/li&gt;
&lt;li&gt;wa-image-generate: Generates images using Titan or Stable Diffusion&lt;/li&gt;
&lt;li&gt;wa-tts: Converts text to speech with Amazon Polly&lt;/li&gt;
&lt;li&gt;wa-audio-transcribe: Starts transcription jobs using AWS Transcribe&lt;/li&gt;
&lt;li&gt;wa-transcribe-finish: Handles transcription callbacks&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Supporting services:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;AWS Bedrock: Supervisor Agent + ImageCreator Sub-Agent&lt;/li&gt;
&lt;li&gt;Amazon Polly: Text-to-speech synthesis&lt;/li&gt;
&lt;li&gt;AWS Transcribe: Audio transcription&lt;/li&gt;
&lt;li&gt;S3 buckets: Media storage and generated images&lt;/li&gt;
&lt;li&gt;Secrets Manager: WhatsApp API credentials&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The architecture diagram shows the complete flow, but I’ll walk you through how each piece works and why I made specific decisions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Decision Framework: Lambda vs Agents
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0fuf8zyexiapr03if783.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0fuf8zyexiapr03if783.png" alt="Lambda vs Agents" width="729" height="221"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here’s how I decided which approach to use for each feature.&lt;/p&gt;

&lt;h3&gt;
  
  
  Use direct Lambda when:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;The operation is deterministic (TTS always works the same way)&lt;/li&gt;
&lt;li&gt;You’re calling an AWS service directly (Transcribe, Polly)&lt;/li&gt;
&lt;li&gt;The input-output relationship is simple&lt;/li&gt;
&lt;li&gt;You want lower latency and cost&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Use Bedrock Agents when:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;You need natural language understanding&lt;/li&gt;
&lt;li&gt;The task requires reasoning or optimization&lt;/li&gt;
&lt;li&gt;Multi-turn conversations matter&lt;/li&gt;
&lt;li&gt;Context needs to persist across interactions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Image analysis went to Lambda. The operation is simple: take an image, send it to Claude Vision, return the description. No complex prompt engineering needed.&lt;/p&gt;

&lt;p&gt;Image generation went to Agents. User requests like “sunset” need to become detailed prompts like “a photorealistic sunset over mountain peaks with golden hour lighting, highly detailed, 8k resolution.” The agent handles this transformation.&lt;/p&gt;

&lt;p&gt;The goal isn’t to pick a winner, but to match each method to what it does best.&lt;/p&gt;

&lt;h2&gt;
  
  
  Building the Foundation
&lt;/h2&gt;

&lt;p&gt;Let’s start with the basics. You’ll need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AWS account with Bedrock access&lt;/li&gt;
&lt;li&gt;Python 3.9 or higher&lt;/li&gt;
&lt;li&gt;AWS CLI configured&lt;/li&gt;
&lt;li&gt;WhatsApp Business API account from Meta for Developers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You also need to enable model access in Bedrock for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Claude 3.5 Sonnet v2&lt;/li&gt;
&lt;li&gt;Claude 3.5 Haiku&lt;/li&gt;
&lt;li&gt;Titan Image Generator v2&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Model access is free to enable. You only pay when you use them.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setting Up WhatsApp Business API
&lt;/h2&gt;

&lt;p&gt;Getting WhatsApp access is straightforward but takes a few steps:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Go to Meta for Developers and create an app&lt;/li&gt;
&lt;li&gt;Add the WhatsApp product to your app&lt;/li&gt;
&lt;li&gt;Get your Phone Number ID and Access Token&lt;/li&gt;
&lt;li&gt;Generate a verify token (any random string you choose)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzdg37cbjge5xihk6wk8g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzdg37cbjge5xihk6wk8g.png" alt="Setting Up WhatsApp Business API" width="800" height="388"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6uitr12n2mzvr0l4mi27.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6uitr12n2mzvr0l4mi27.png" alt="Setting Up WhatsApp Business API" width="800" height="292"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Store the long-lived access token in AWS Secrets Manager. This is important because this token needs rotation over time.&lt;/p&gt;

&lt;p&gt;Create a secret with this structure:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;{&lt;br&gt;
"token": "your_long_lived_access_token"&lt;br&gt;
}&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;The Phone Number ID and Verify Token go in Lambda environment variables. Only the access token needs to live in Secrets Manager, because it is the security-sensitive credential that requires rotation.&lt;/p&gt;
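&lt;p&gt;At runtime, a Lambda can fetch and parse that secret with boto3. A minimal sketch, where the secret name &lt;code&gt;wa-whatsapp-token&lt;/code&gt; is a placeholder for whatever you named yours:&lt;/p&gt;

```python
import json

def parse_token(secret_string):
    """Extract the WhatsApp access token from the secret's JSON payload."""
    return json.loads(secret_string)["token"]

def get_whatsapp_token(secret_id="wa-whatsapp-token", region="us-east-1"):
    """Fetch the secret from AWS Secrets Manager (requires AWS credentials)."""
    import boto3  # imported lazily so the parsing logic is testable offline
    client = boto3.client("secretsmanager", region_name=region)
    resp = client.get_secret_value(SecretId=secret_id)
    return parse_token(resp["SecretString"])
```

Caching the token across warm invocations (e.g. in a module-level variable) avoids a Secrets Manager call on every message.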
&lt;h2&gt;
  
  
  The Configuration Strategy
&lt;/h2&gt;

&lt;p&gt;Lambda functions don’t use .env files. Each function has its own environment variables set directly in AWS Console or via CLI.&lt;/p&gt;

&lt;p&gt;The env.example file in the repo is just a reference document showing what variables exist and where they’re used. Different Lambda functions need different configurations. The orchestrator needs agent IDs. The image generator needs model IDs and bucket names. The sender only needs to know where to find the access token in Secrets Manager.&lt;/p&gt;

&lt;p&gt;This keeps each function’s configuration minimal and explicit.&lt;/p&gt;
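&lt;p&gt;Setting a function’s variables from the CLI looks like this. The function name and variable names here are placeholders, not the repo’s exact values:&lt;/p&gt;

```shell
# Hypothetical example: configure the orchestrator's environment variables.
aws lambda update-function-configuration \
  --function-name wa-process \
  --environment "Variables={BEDROCK_AGENT_ID=YOUR_AGENT_ID,BEDROCK_AGENT_ALIAS_ID=YOUR_ALIAS_ID}"
```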

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2wndocqw4z818ith6rgf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2wndocqw4z818ith6rgf.png" alt="Environmental variables" width="800" height="437"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Building the Entry Point
&lt;/h2&gt;

&lt;p&gt;Every WhatsApp message hits inbound-webhook first. This Lambda handles two responsibilities: webhook verification and receiving messages.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqlqxv9hel5gsnmtzi29b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqlqxv9hel5gsnmtzi29b.png" alt="Entrey point" width="800" height="878"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The verification flow is straightforward. WhatsApp sends a GET request with a challenge token. The Lambda verifies the token matches what you configured, then returns the challenge back. This proves you control the endpoint.&lt;/p&gt;

&lt;p&gt;After verification passes, WhatsApp starts sending POST requests with message data. When media arrives (images, audio), the webhook downloads it to S3 for processing. Then it invokes wa-process asynchronously.&lt;/p&gt;

&lt;p&gt;The async pattern is critical. WhatsApp expects a 200 response within seconds. Your bot might take 10-20 seconds to generate a response. Async invocation lets you acknowledge receipt immediately while processing happens in the background.&lt;/p&gt;
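&lt;p&gt;The verification handshake and the async hand-off can be sketched in a few lines. This is an illustrative shape, not the repo’s actual handler; the query parameter names (&lt;code&gt;hub.verify_token&lt;/code&gt;, &lt;code&gt;hub.challenge&lt;/code&gt;) follow Meta’s webhook protocol, and the verify token would normally come from an environment variable:&lt;/p&gt;

```python
import json

VERIFY_TOKEN = "my-verify-token"  # placeholder; in practice read from an env var

def handle_verification(params):
    """GET request: echo back hub.challenge when the verify token matches."""
    if params.get("hub.verify_token") == VERIFY_TOKEN:
        return {"statusCode": 200, "body": params.get("hub.challenge", "")}
    return {"statusCode": 403, "body": "forbidden"}

def handle_message(body, invoke_async):
    """POST request: ack immediately with 200, process in the background.

    invoke_async stands in for an asynchronous Lambda invocation,
    e.g. lambda_client.invoke(..., InvocationType="Event").
    """
    invoke_async(json.loads(body))
    return {"statusCode": 200, "body": "ok"}
```

Because &lt;code&gt;handle_message&lt;/code&gt; returns before processing finishes, WhatsApp gets its 200 within its timeout window even when response generation takes 10-20 seconds.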
&lt;h2&gt;
  
  
  Building the Orchestrator
&lt;/h2&gt;

&lt;p&gt;The wa-process Lambda is the brain of the system. It receives a message and decides what to do with it.&lt;/p&gt;

&lt;p&gt;The logic follows a simple flow: identify message type (text, image, audio), check for special intents like voice responses, route to the appropriate handler, and send the response back.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5ccn6kfz5lwlucvgek4l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5ccn6kfz5lwlucvgek4l.png" alt="Orchestrator" width="800" height="315"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For text messages, the function invokes the Bedrock Supervisor Agent and sends the response directly. For images with questions, it prepares context that includes the S3 URI and user’s question, then invokes the agent. For audio, it triggers the transcription Lambda and waits for the callback.&lt;/p&gt;

&lt;p&gt;The hybrid architecture shows its value here. The orchestrator doesn’t care whether a feature uses direct Lambda calls or agent frameworks. Text and image analysis go through the agent. Audio transcription calls a Lambda directly. Image generation gets delegated to a sub-agent. The orchestrator just routes requests to the right place.&lt;/p&gt;

&lt;p&gt;The orchestrator also handles voice response requests. When a user asks for a voice message, it sets a flag and invokes the agent to generate text. Once the agent responds, it calls wa-tts to convert that text to audio. This separation keeps the agent focused on content generation while the orchestrator manages output formats.&lt;/p&gt;
&lt;h2&gt;
  
  
  Direct Lambda Pattern: Image Analysis
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fznmxo5qvphuuqa7e5ilq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fznmxo5qvphuuqa7e5ilq.png" alt="Image analysis" width="800" height="481"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Image analysis shows the direct Lambda approach clearly. The operation is simple: download an image from S3, send it to Claude Vision via the Bedrock Converse API, and return the description.&lt;/p&gt;

&lt;p&gt;The Lambda downloads the image bytes from S3 rather than passing an S3 reference. This makes the code more resilient to API changes. The image bytes and the user’s question get sent to Claude 3.5 Sonnet Vision, which returns a description.&lt;/p&gt;
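&lt;p&gt;A sketch of that call using the Bedrock Converse API. The image content block follows the documented Converse shape; the helper names and default question are mine, not from the repo:&lt;/p&gt;

```python
def build_vision_message(image_bytes, question, image_format="jpeg"):
    """Build the Converse API message: raw image bytes plus the user's question."""
    return {
        "role": "user",
        "content": [
            {"image": {"format": image_format, "source": {"bytes": image_bytes}}},
            {"text": question or "Describe this image."},
        ],
    }

def analyze_image(bucket, key, question,
                  model_id="anthropic.claude-3-5-sonnet-20241022-v2:0"):
    """Download the image from S3 and ask Claude Vision about it."""
    import boto3  # lazy import keeps the message builder testable offline
    s3 = boto3.client("s3")
    image_bytes = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    bedrock = boto3.client("bedrock-runtime")
    response = bedrock.converse(
        modelId=model_id,
        messages=[build_vision_message(image_bytes, question)],
    )
    return response["output"]["message"]["content"][0]["text"]
```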

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flwd7xe9guulr8cxwzs51.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flwd7xe9guulr8cxwzs51.png" alt="Image analysis with lambda" width="800" height="315"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This direct approach gives you full control. No agent orchestration, no prompt optimization, just a straightforward API call. The entire Lambda executes in under 3 seconds.&lt;/p&gt;

&lt;p&gt;The cost is predictable: $0.008 per image analyzed. At 1,000 images per month, that’s $8. The agent framework would add orchestration overhead without adding value for this use case.&lt;/p&gt;

&lt;p&gt;When would you add an agent layer? When the image analysis needs to trigger other actions, maintain conversation context across multiple images, or integrate with knowledge bases. For straightforward “analyze this image” requests, direct Lambda is the better choice.&lt;/p&gt;
&lt;h2&gt;
  
  
  Direct Lambda Pattern: Voice and Audio
&lt;/h2&gt;

&lt;p&gt;Text-to-speech and audio transcription follow the same direct Lambda pattern.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F56ogvr2yfazwrcij5hhy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F56ogvr2yfazwrcij5hhy.png" alt="Voice and audio" width="800" height="304"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For TTS, the wa-tts Lambda receives text from the orchestrator and calls Amazon Polly to synthesize speech. Polly returns an MP3 audio stream, which gets uploaded to S3. The Lambda generates a presigned URL for the audio file and returns it to the orchestrator. The orchestrator then calls wa-send with that audio URL to deliver it to WhatsApp. The entire operation costs about $0.016 per 1,000 characters of text (Polly's neural voices are priced at $16 per 1 million characters).&lt;/p&gt;
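&lt;p&gt;The Polly-to-S3 flow can be sketched in a few lines. The voice ID, key layout, and one-hour expiry below are my choices for illustration, not values from the repo:&lt;/p&gt;

```python
import uuid

def audio_key(prefix="tts"):
    """Unique S3 key for a synthesized clip (illustrative layout)."""
    return f"{prefix}/{uuid.uuid4()}.mp3"

def synthesize_to_s3(text, bucket, voice_id="Lupe"):
    """Call Polly, store the MP3 in S3, and return a presigned URL for wa-send."""
    import boto3  # lazy import keeps the key helper testable offline
    polly = boto3.client("polly")
    result = polly.synthesize_speech(
        Text=text, OutputFormat="mp3", VoiceId=voice_id, Engine="neural"
    )
    key = audio_key()
    s3 = boto3.client("s3")
    s3.put_object(Bucket=bucket, Key=key,
                  Body=result["AudioStream"].read(),
                  ContentType="audio/mpeg")
    return s3.generate_presigned_url(
        "get_object", Params={"Bucket": bucket, "Key": key}, ExpiresIn=3600
    )
```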

&lt;p&gt;Audio transcription is more complex because AWS Transcribe is asynchronous. You can’t just call an API and get the result immediately.&lt;/p&gt;

&lt;p&gt;The wa-audio-transcribe Lambda starts a transcription job. It tells Transcribe where to find the audio file in S3 (uploaded earlier by the webhook), what format it’s in (usually OGG for WhatsApp voice notes), and where to store the results. Then it returns immediately.&lt;/p&gt;
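&lt;p&gt;Starting the job looks roughly like this (the job-name scheme is illustrative; the Transcribe parameters are the documented ones):&lt;/p&gt;

```python
import time

def job_name(message_id):
    """Unique, Transcribe-safe job name (illustrative scheme)."""
    return f"wa-{message_id}-{int(time.time())}"

def start_transcription(audio_uri, output_bucket, message_id):
    """Kick off the async Transcribe job and return immediately."""
    import boto3  # lazy import keeps job_name testable offline
    transcribe = boto3.client("transcribe")
    transcribe.start_transcription_job(
        TranscriptionJobName=job_name(message_id),
        Media={"MediaFileUri": audio_uri},   # s3://... uploaded by the webhook
        MediaFormat="ogg",                   # WhatsApp voice notes are OGG
        IdentifyLanguage=True,               # let Transcribe detect ES/EN/PT
        OutputBucketName=output_bucket,      # the ObjectCreated event fires here
    )
```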

&lt;p&gt;AWS Transcribe processes the audio in the background. When finished, it writes the transcript JSON to S3. This triggers an S3 ObjectCreated event that invokes the wa-transcribe-finish Lambda. This Lambda reads the transcript from S3, extracts the text, and sends it back to the orchestrator as if it were a new text message. The orchestrator then sends it to the agent for processing.&lt;/p&gt;
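&lt;p&gt;The finish Lambda then only needs to read the result file and hand the text back. The transcript JSON nests the text under &lt;code&gt;results.transcripts[0].transcript&lt;/code&gt;; the handler shape below is a sketch, not the repo's code:&lt;/p&gt;

```python
import json

def extract_transcript(transcript_json):
    """Pull the recognized text out of Transcribe's output document."""
    return transcript_json["results"]["transcripts"][0]["transcript"]

def handler(event, context):
    """Triggered by S3 ObjectCreated when Transcribe writes its result."""
    import boto3  # lazy import keeps extract_transcript testable offline
    record = event["Records"][0]["s3"]
    s3 = boto3.client("s3")
    body = s3.get_object(Bucket=record["bucket"]["name"],
                         Key=record["object"]["key"])["Body"].read()
    text = extract_transcript(json.loads(body))
    # Hand the text back to the orchestrator as if it were a typed message
    boto3.client("lambda").invoke(
        FunctionName="wa-process", InvocationType="Event",
        Payload=json.dumps({"type": "text", "body": text}).encode("utf-8"),
    )
```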

&lt;p&gt;This async pattern is crucial for long-running operations. WhatsApp users expect quick responses, but transcription can take 30-60 seconds depending on audio length. The callback pattern lets the user know their message was received while processing happens in the background.&lt;/p&gt;
&lt;h2&gt;
  
  
  Agent Framework Pattern: Conversations
&lt;/h2&gt;

&lt;p&gt;Now let’s look at the agent side. The Supervisor Agent handles all text conversations.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjmgqrccmjfhtlyrc2wua.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjmgqrccmjfhtlyrc2wua.png" alt="Agent Framework" width="800" height="612"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The agent instructions require quite a bit of thought. You need to balance several competing concerns: natural conversation flow, WhatsApp’s messaging constraints, multi-language support, and managing different output formats.&lt;/p&gt;

&lt;p&gt;The instructions need to handle language detection and matching. Users might write in Spanish, English, or Portuguese. The agent needs to detect this and respond appropriately. This is straightforward for text but becomes tricky when you add voice responses.&lt;/p&gt;

&lt;p&gt;For voice responses, there’s a subtle problem. If a user asks for an audio message and the agent says “I’ll send you an audio message about quantum physics,” the TTS system converts that entire sentence to audio. The user hears “I’ll send you an audio message about quantum physics” instead of just hearing about quantum physics. The solution is explicit instructions: never mention the output format, just generate the content. The backend handles format conversion.&lt;/p&gt;

&lt;p&gt;The instructions also need to consider WhatsApp’s messaging patterns. Long paragraphs work poorly in chat. The agent needs to keep responses concise while still being helpful. This means being explicit about brevity without sacrificing accuracy.&lt;/p&gt;

&lt;p&gt;Benefits of this approach: the agent focuses on content generation, not infrastructure concerns. You can add new output formats (video captions, PDFs) without changing agent instructions. The separation between content and delivery is clean.&lt;/p&gt;

&lt;p&gt;Drawbacks: the instructions become longer and more specific. More specific instructions mean less flexibility for the agent to adapt to edge cases. You also need to test thoroughly because the agent won’t tell you when it’s confused about format handling.&lt;/p&gt;

&lt;p&gt;The agent connects to Lambda functions via action groups. For image analysis, the action group defines a function with parameters for the S3 URI, optional question, and optional language code. When a user sends an image with a question, the orchestrator formats it as a structured context block with these parameters. The agent parses this, calls the analyzeImage action, and returns the result.&lt;/p&gt;

&lt;p&gt;This separation is powerful. You can change how image analysis works (switch models, add caching, implement fallbacks) without touching the orchestrator or agent instructions. The interface stays stable while the implementation evolves.&lt;/p&gt;
&lt;h2&gt;
  
  
  Agent Framework Pattern: Image Generation
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdg15qx2a4l5xh2g1lf4u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdg15qx2a4l5xh2g1lf4u.png" alt="Agent Image Generation" width="800" height="421"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Image generation shows why agents matter for complex tasks. When a user says “create a sunset,” that vague request needs to become a detailed prompt like “a photorealistic sunset over mountain peaks with golden hour lighting, vibrant orange and purple clouds, highly detailed, 8k resolution.” This transformation requires natural language understanding and prompt engineering, which agents handle well.&lt;/p&gt;

&lt;p&gt;The architecture uses a sub-agent pattern. The Supervisor Agent detects image generation requests and delegates to an ImageCreator sub-agent. This keeps responsibility focused: the supervisor handles routing decisions, the sub-agent handles prompt optimization, and the Lambda handles the actual image generation.&lt;/p&gt;

&lt;p&gt;The ImageCreator sub-agent analyzes the user’s natural language request and creates an optimized prompt for the image model. It considers style preferences, adds quality modifiers, and constructs negative prompts to avoid common issues. Then it calls the wa-image-generate Lambda through an action group.&lt;/p&gt;

&lt;p&gt;The Lambda receives the optimized prompt and calls the configured Bedrock image model (Stable Diffusion XL or Titan). The generated image gets uploaded to S3, a presigned URL is created, and the Lambda uses Claude Haiku to generate a natural caption in the user’s language. Finally, it invokes wa-send to deliver the image to WhatsApp with the caption.&lt;/p&gt;

&lt;p&gt;The sub-agent responds with a simple success indicator to the supervisor, which passes it back to the orchestrator. The orchestrator knows the image was already sent directly by the Lambda, so it doesn’t send anything else.&lt;/p&gt;

&lt;p&gt;This multi-layer delegation (orchestrator → supervisor → sub-agent → Lambda) seems complex, but each layer has a clear purpose. The orchestrator routes by message type. The supervisor manages conversation context. The sub-agent optimizes prompts. The Lambda generates images. Each component does one thing well.&lt;/p&gt;
&lt;h2&gt;
  
  
  The Configuration Pattern
&lt;/h2&gt;

&lt;p&gt;Earlier I mentioned environment variables are set per-Lambda. Here’s the complete pattern:&lt;/p&gt;
&lt;h3&gt;
  
  
  Secrets Manager (long-lived token only):
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;WhatsApp access token (needs rotation, security-sensitive)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Lambda environment variables (function-specific):
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;wa-process: Agent IDs, region, function names&lt;/li&gt;
&lt;li&gt;wa-image-generate: Model IDs, bucket names&lt;/li&gt;
&lt;li&gt;inbound-webhook: Bucket names, verify token, downstream functions&lt;/li&gt;
&lt;li&gt;wa-send: Phone number ID, secret name&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This approach scales better than shared configuration. Each function only knows what it needs. Changes to one function don’t affect others.&lt;/p&gt;

&lt;p&gt;Setting these via CLI looks like:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws lambda update-function-configuration \
  --function-name wa-process \
  --environment '{"Variables":{
    "BEDROCK_AGENT_ID":"AGENTXXX",
    "BEDROCK_AGENT_ALIAS_ID":"ALIASXXX",
    "BEDROCK_REGION":"us-east-1",
    "MEDIA_BUCKET":"my-media-bucket"
  }}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Or use the AWS Console for easier management. Both approaches work.&lt;/p&gt;
&lt;h2&gt;
  
  
  Deployment Strategy
&lt;/h2&gt;

&lt;p&gt;The repo includes automated deployment scripts that handle the entire setup. Understanding what happens during deployment helps when debugging issues later.&lt;/p&gt;

&lt;p&gt;Lambda deployment involves several steps: packaging the code, creating the function with the right runtime and memory settings, configuring environment variables, and setting up triggers. Each function needs different timeout and memory configurations. The webhook and orchestrator need quick response times. Image generation needs more time and memory. Audio transcription is somewhere in between.&lt;/p&gt;

&lt;p&gt;The deployment scripts handle creating IAM roles with appropriate permissions. Each Lambda gets least-privilege access: only the specific AWS services it needs. The image analyzer reads from S3 but doesn’t write. The image generator writes to S3 but doesn’t read user data. The orchestrator invokes other Lambdas but doesn’t access S3 directly.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwokiw1f28njyc85x1o2z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwokiw1f28njyc85x1o2z.png" alt="Deployment" width="800" height="348"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Triggers need configuration too. API Gateway triggers the webhook Lambda on HTTP requests. S3 ObjectCreated events trigger the transcription finish Lambda. Other Lambdas get invoked directly by other functions, so they don’t need external triggers.&lt;/p&gt;

&lt;p&gt;The critical piece many people miss: Bedrock Agents need explicit permission to invoke Lambda functions. AWS doesn’t automatically grant this. You must add a resource-based policy to each Lambda that allows the bedrock.amazonaws.com service principal to invoke it, scoped to your specific agent ARN. Without this permission, the agent fails silently with generic error messages like “I cannot help with that.”&lt;/p&gt;
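&lt;p&gt;Granting that permission is a single &lt;code&gt;add-permission&lt;/code&gt; call. A sketch with boto3 (statement ID and function names are illustrative; the principal and action are the required values):&lt;/p&gt;

```python
def invoke_permission_statement(function_name, agent_arn):
    """The resource-policy statement parameters, as a dict."""
    return {
        "FunctionName": function_name,
        "StatementId": "AllowBedrockAgentInvoke",  # must be unique in the policy
        "Action": "lambda:InvokeFunction",
        "Principal": "bedrock.amazonaws.com",
        "SourceArn": agent_arn,  # scope to one agent, e.g. arn:aws:bedrock:us-east-1:123456789012:agent/AGENTXXX
    }

def allow_agent_invoke(function_name, agent_arn, region="us-east-1"):
    """Apply the statement via the Lambda control-plane API."""
    import boto3  # lazy import keeps the statement builder testable offline
    boto3.client("lambda", region_name=region).add_permission(
        **invoke_permission_statement(function_name, agent_arn)
    )
```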

&lt;p&gt;The automated scripts handle all these details, but knowing what they do helps when something goes wrong. If an agent can’t invoke a Lambda, check the resource policy. If a Lambda times out, check the timeout setting. If environment variables are missing, check the function configuration.&lt;/p&gt;
&lt;h2&gt;
  
  
  Setting Up Bedrock Agents
&lt;/h2&gt;

&lt;p&gt;Creating agents through the AWS Console is straightforward but has specific steps.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvkfdrd38otjf6gzatvtb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvkfdrd38otjf6gzatvtb.png" alt="Setting up Bedrock Agents" width="800" height="447"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  For the Supervisor Agent:
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Go to Bedrock Console → Agents → Create Agent&lt;/li&gt;
&lt;li&gt;Name it descriptively (I use whatsapp-supervisor-agent)&lt;/li&gt;
&lt;li&gt;Choose Claude 3.5 Sonnet v2 as the foundation model&lt;/li&gt;
&lt;li&gt;Copy instructions from supervisor-agent-instructions.txt&lt;/li&gt;
&lt;li&gt;Add action group for image analysis&lt;/li&gt;
&lt;li&gt;Prepare the agent (this compiles everything)&lt;/li&gt;
&lt;li&gt;Create an alias pointing to the prepared version&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That last step trips people up. Changes to an agent don’t take effect until you:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Prepare the agent (creates a new version)&lt;/li&gt;
&lt;li&gt;Update the alias to point to the new version&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If you change instructions and skip these steps, your bot still uses the old version.&lt;/p&gt;
&lt;h3&gt;
  
  
  For the ImageCreator sub-agent:
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Create another agent with a focused name&lt;/li&gt;
&lt;li&gt;Use simpler instructions (it has one job)&lt;/li&gt;
&lt;li&gt;Add action group with the OpenAPI schema from lambdas/wa-image-generate/openapi-schema.json&lt;/li&gt;
&lt;li&gt;Prepare and create alias&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Then link them:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Edit the Supervisor Agent&lt;/li&gt;
&lt;li&gt;Add ImageCreator as a collaborator&lt;/li&gt;
&lt;li&gt;Specify when to delegate (image generation requests)&lt;/li&gt;
&lt;li&gt;Prepare the supervisor again&lt;/li&gt;
&lt;li&gt;Update its alias&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The supervisor now knows to call the sub-agent for image requests.&lt;/p&gt;
&lt;h2&gt;
  
  
  Image Generation Models
&lt;/h2&gt;

&lt;p&gt;The system supports two image generation models through a single Lambda function. You choose which model to use by setting the IMAGE_MODEL_ID environment variable.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;VISION_MODEL_ID = os.environ.get(”VISION_MODEL_ID”, “us.anthropic.claude-3-5-sonnet-20241022-v2:0”)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Stable Diffusion XL is the default. It offers more creative control with style presets and costs about $0.04 per image. Amazon Titan Image Generator v1 is the alternative, optimized for photorealistic output at about $0.01 per image.&lt;/p&gt;

&lt;p&gt;The Lambda detects which model is configured and uses the appropriate API format. Each model has different input parameters and response structures, but the Lambda abstracts these differences. From the agent’s perspective, image generation works the same way regardless of which model you choose.&lt;/p&gt;
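&lt;p&gt;That abstraction reduces to two small helpers: one builds the model-specific request body, one extracts the base64 image from the model-specific response. The shapes below follow Bedrock's documented formats for SDXL and Titan; the helper names and parameter values are my choices for illustration:&lt;/p&gt;

```python
import json

def build_request_body(model_id, prompt):
    """Build the model-specific invoke body; the rest of the pipeline is shared."""
    if model_id.startswith("stability."):
        # Stable Diffusion XL request format
        return json.dumps({
            "text_prompts": [{"text": prompt}],
            "cfg_scale": 7,
            "steps": 30,
        })
    if model_id.startswith("amazon.titan-image"):
        # Titan Image Generator request format
        return json.dumps({
            "taskType": "TEXT_IMAGE",
            "textToImageParams": {"text": prompt},
            "imageGenerationConfig": {"numberOfImages": 1,
                                      "width": 1024, "height": 1024},
        })
    raise ValueError(f"Unsupported image model: {model_id}")

def extract_image_b64(model_id, response_body):
    """Each model also nests the base64 image differently in its response."""
    if model_id.startswith("stability."):
        return response_body["artifacts"][0]["base64"]
    return response_body["images"][0]
```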

&lt;p&gt;To switch models, you update the Lambda’s environment variable in AWS Console or via CLI. The benefit of this design is that only the one Lambda changes. The orchestrator, agents, and other Lambdas continue working without modification. The abstraction layer handles the model-specific differences.&lt;/p&gt;

&lt;h2&gt;
  
  
  Performance Optimization
&lt;/h2&gt;

&lt;p&gt;Lambda cold starts matter for user experience. When a function hasn’t run recently, AWS needs to initialize it. This adds 1-3 seconds of latency.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3j035t6sfdz4sh8wj43j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3j035t6sfdz4sh8wj43j.png" alt="Lambda cold start" width="800" height="177"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This demo doesn’t use provisioned concurrency to keep costs minimal. For production deployments with consistent traffic, consider provisioned concurrency for the webhook and orchestrator functions. These are in the critical path for response time. Other functions can tolerate cold starts because they’re not user-facing or run asynchronously.&lt;/p&gt;

&lt;p&gt;Agent response time varies based on complexity. Simple text responses take 2-4 seconds. Image generation requests take 10-15 seconds total (agent reasoning + generation + upload).&lt;/p&gt;

&lt;p&gt;For audio transcription, the system can send an immediate acknowledgment, then delivers the actual transcription when ready. This manages user expectations for the longer processing time.&lt;/p&gt;

&lt;h2&gt;
  
  
  Security Considerations
&lt;/h2&gt;

&lt;p&gt;The system has several security layers.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkvm1dq868yg047lhgwaf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkvm1dq868yg047lhgwaf.png" alt="Security Layers" width="800" height="645"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Webhook verification ensures only WhatsApp can send messages. Without the correct verify token, requests are rejected.&lt;/p&gt;

&lt;p&gt;IAM roles follow least privilege. Each Lambda only has permissions for the specific AWS services it needs. The image analyzer can read from S3 but not write. The image generator can write but not read others’ images.&lt;/p&gt;

&lt;p&gt;Secrets Manager handles credential rotation. The WhatsApp access token can be rotated without code changes. Lambda functions fetch the current token at runtime.&lt;/p&gt;

&lt;p&gt;S3 buckets are private by default. Images are shared via presigned URLs that expire after 7 days. No public bucket access.&lt;/p&gt;
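&lt;p&gt;The 7-day ceiling is a hard SigV4 limit (604,800 seconds), so it's worth clamping explicitly. A sketch (function names are illustrative):&lt;/p&gt;

```python
def expiry_seconds(days):
    """Clamp to the SigV4 presigned-URL maximum of 7 days."""
    return min(days, 7) * 24 * 3600

def share_url(bucket, key, days=7):
    """Presigned GET URL for a private object; no public bucket access needed."""
    import boto3  # lazy import keeps expiry_seconds testable offline
    s3 = boto3.client("s3", region_name="us-east-1")
    return s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expiry_seconds(days),
    )
```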

&lt;p&gt;What’s missing? Content moderation. The current implementation doesn’t filter generated images or user prompts. For production use, add:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Bedrock Guardrails to filter inappropriate prompts&lt;/li&gt;
&lt;li&gt;Image scanning before sending to users&lt;/li&gt;
&lt;li&gt;Rate limiting per user&lt;/li&gt;
&lt;li&gt;Cost monitoring and alerts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These additions depend on your specific requirements and risk tolerance.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lessons learned
&lt;/h2&gt;

&lt;p&gt;I rebuilt parts of this system three times. Here’s what I learned:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fofonk3pttnnjrp5dz1dz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fofonk3pttnnjrp5dz1dz.png" alt="Lessons Learned" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Agent instructions require precision. Vague instructions lead to unpredictable behavior. The voice response handling needed explicit rules about never mentioning the output format. Language detection needed clear fallback behavior. Each edge case required specific handling in the instructions.&lt;/p&gt;

&lt;p&gt;Hybrid architecture balances trade-offs. Pure agent systems cost more and respond slower for simple operations. Pure Lambda systems require writing all the conversational logic yourself. The hybrid approach uses agents where their natural language capabilities add value and direct Lambdas where they don’t.&lt;/p&gt;

&lt;p&gt;Async patterns matter for user experience. WhatsApp users expect quick acknowledgments. Transcription takes 30-60 seconds. Image generation takes 10-15 seconds. The async callback patterns let the system respond immediately while work happens in the background.&lt;/p&gt;

&lt;p&gt;Component isolation simplifies debugging. Each Lambda has a single responsibility. When something breaks, you can test that Lambda independently. Clear interfaces between components mean changes don’t cascade unexpectedly.&lt;/p&gt;

&lt;p&gt;Permission issues cause silent failures. Bedrock Agents fail with generic error messages when they can’t invoke Lambdas. IAM permission debugging takes time. Checking permissions early when something doesn’t work saves troubleshooting time later.&lt;/p&gt;

&lt;h2&gt;
  
  
  Alternative Approaches
&lt;/h2&gt;

&lt;p&gt;This hybrid architecture is one way to build this system. Here are alternatives and when to use them.&lt;/p&gt;

&lt;p&gt;Pure Lambda orchestration: Remove Bedrock Agents entirely. The orchestrator directly calls all functions based on deterministic logic. Simpler and cheaper, but you write all the prompt engineering logic yourself.&lt;/p&gt;

&lt;p&gt;Pure Agent architecture: Make everything an agent action group. Image analysis, TTS, transcription all go through the agent. Unified conversational interface with better context management, but higher cost and latency for simple tasks.&lt;/p&gt;

&lt;p&gt;Bedrock AgentCore: Use AWS Bedrock AgentCore with your choice of agent framework (LangGraph, CrewAI, LlamaIndex). More infrastructure services like 8-hour runtimes and built-in observability, but requires more architectural decisions upfront.&lt;/p&gt;

&lt;p&gt;Agent framework (LangChain, CrewAI): Replace Bedrock Agents with an open-source framework hosted in Lambda. Full control and portability, but you handle state management and dependencies yourself.&lt;/p&gt;

&lt;p&gt;Step Functions orchestration: Use AWS Step Functions for workflow management instead of Lambda orchestration. Visual workflows with built-in retry logic, but more services to manage.&lt;/p&gt;

&lt;p&gt;The right choice depends on your requirements. The hybrid approach teaches you both patterns so you can decide what works for your use case.&lt;/p&gt;

&lt;p&gt;For a detailed comparison with pros, cons, and migration paths, see the &lt;a href="https://github.com/marceloacosta/multimodal-whatsapp-bot-aws/blob/main/ARCHITECTURE_DECISIONS.md" rel="noopener noreferrer"&gt;ARCHITECTURE_DECISIONS.md&lt;/a&gt; document in the repo.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;

&lt;p&gt;The repo includes automated deployment scripts that handle the Lambda setup. You can deploy everything at once or go function by function to understand each piece. After the Lambda deployment, you’ll create the Bedrock agents through the AWS Console and link them together.&lt;/p&gt;

&lt;p&gt;The documentation walks you through both approaches. If you want to understand every component, deploy and test each Lambda individually. If you want to get running quickly, use the automated scripts and dive into specific parts later.&lt;/p&gt;

&lt;p&gt;Setting up the agents requires more manual steps. You’ll create the supervisor agent with its conversation instructions, add the action group for image analysis, then create the image creator sub-agent and link it as a collaborator. The agent setup guide includes the exact instructions and parameters for each step.&lt;/p&gt;

&lt;p&gt;The code is designed to be adaptable. The hybrid architecture isn’t prescriptive. Want to remove agents and handle everything with Lambda logic? The orchestrator is easy to modify. Want to add new capabilities? Create a Lambda, add it to the orchestrator’s routing logic, and decide whether to call it directly or through an agent action group.&lt;/p&gt;

&lt;p&gt;The repo documentation covers deployment details, agent configuration, troubleshooting, and architectural alternatives. Start with what interests you most.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Enables
&lt;/h2&gt;

&lt;p&gt;This isn’t just about building a WhatsApp bot. The patterns here apply to many AI applications.&lt;/p&gt;

&lt;p&gt;The hybrid architecture shows how to balance simplicity with capability. The agent collaboration pattern shows how to break complex tasks into focused components. The async processing pattern shows how to maintain good user experience with slow operations.&lt;/p&gt;

&lt;p&gt;You can adapt these patterns to build:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Telegram or Discord bots with the same backend&lt;/li&gt;
&lt;li&gt;Slack integrations with multimodal capabilities&lt;/li&gt;
&lt;li&gt;API services that use agents for complex requests&lt;/li&gt;
&lt;li&gt;Customer service automation with image support&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The serverless foundation means it scales automatically. The AWS services handle infrastructure so you focus on functionality.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where to Go From Here
&lt;/h2&gt;

&lt;p&gt;If you build something with this architecture, I’d like to hear about it. What worked? What didn’t? What did you change?&lt;/p&gt;

&lt;p&gt;The complete code, documentation, and deployment scripts are at &lt;a href="https://github.com/marceloacosta/multimodal-whatsapp-bot-aws" rel="noopener noreferrer"&gt;github.com/marceloacosta/multimodal-whatsapp-bot-aws&lt;/a&gt;. The repo is actively maintained. Issues and pull requests are welcome.&lt;/p&gt;

&lt;p&gt;Start with the README for an overview, then dive into the architecture decisions document to understand the tradeoffs. The code includes comments explaining why specific approaches were chosen.&lt;/p&gt;

&lt;p&gt;For questions or discussion, you can find me &lt;a href="https://substack.com/@marckush" rel="noopener noreferrer"&gt;here&lt;/a&gt; or on &lt;a href="https://www.linkedin.com/in/marceloacostacavalero/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;. I regularly share updates about AI systems and AWS architecture patterns.&lt;/p&gt;

&lt;p&gt;Build something interesting with this. Then share what you learned.&lt;/p&gt;

&lt;p&gt;I publish every week on buildwithaws.substack.com. If this was useful, subscribe. It's free.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>ai</category>
      <category>tutorial</category>
      <category>architecture</category>
    </item>
  </channel>
</rss>
