DEV Community: Marcelo Acosta Cavalero

Steal These 25 Compliance and Security Agent Patterns for AWS

Marcelo Acosta Cavalero — Mon, 25 May 2026 12:28:32 +0000

Originally published on Build With AWS. Subscribe for weekly AWS builds.

The security team got paged at 2am.

An IAM role in the production account had been granted S3 full access.

The CloudTrail event showed when it happened and who did it.

What it did not show was whether this change violated their least-privilege policy, which compliance framework required remediation within 24 hours, whether other roles in the same account had similar drift, and what the correct permission set should have been.

The analyst spent three hours cross-referencing IAM policies against their access matrix spreadsheet, checking the SOC 2 control mapping in Confluence, and writing the finding into GRC tool.

The violation was straightforward.

The investigation and documentation ritual was not.

This is the fifth and final edition of a five-part series cataloging real AI architecture patterns running on AWS.

Edition 1 covered customer-facing agents.

Edition 2 covered internal knowledge and productivity agents.

Edition 3 covered workflow automation and process agents.

Edition 4 covered data and analytics agents.

This edition addresses the domain where agents face the highest stakes and tightest constraints: compliance, security, and governance.

Agents that audit configurations, detect violations, investigate incidents, manage regulatory obligations, and maintain the control posture that keeps organizations out of trouble.

If you missed the earlier editions, go back to Edition 1 for the “Agent or Not?” scoring framework and the AgentCore vs Quick breakdown.

Edition 3 introduced the hybrid Step Functions + AgentCore pattern for workflow automation.

Edition 4 introduced the answer-versus-query decision for analytics agents.

Both concepts apply here, with additional constraints unique to high-stakes environments.

The Trust Boundary for Compliance Agents

Compliance and security agents operate under a fundamentally different trust model than every other category in this series.

A customer support agent that makes a mistake generates a bad experience.

A compliance agent that makes a mistake generates a regulatory finding, a failed audit, or a missed breach notification deadline.

The blast radius is organizational, not transactional.

Three principles govern how these agents should operate.

Recommend, do not remediate by default.

Most compliance and security agents should generate findings, recommendations, and draft responses rather than taking autonomous action.

The human-in-the-loop is not a limitation of the technology - it is a control requirement.

Auditors want to see that a qualified person reviewed the finding before action was taken.

Agents that auto-remediate should be restricted to pre-approved, bounded actions (blocking a known-malicious IP, revoking a compromised credential) with full audit trails and rollback capability.

Evidence preservation is non-negotiable.

Every agent decision must produce an audit trail that shows what data the agent examined, what reasoning it applied, and what recommendation or action it produced.

This is not logging for debugging.

This is evidence for auditors.

The trail needs to be immutable (S3 with object lock, or equivalent), timestamped, and retained according to your regulatory requirements.

AgentCore Observability provides agent execution traces.

Supplement with explicit evidence capture at each decision point.

Policy as code, not policy as prompt.

Agent behavior boundaries should be defined in AgentCore Policy (which uses natural-language authoring to generate candidate Cedar authorization policies, validates them, and checks safety conditions) or equivalent deterministic enforcement, not in system prompts that the model might drift from.

Policy enforcement applies primarily to tool interactions routed through AgentCore Gateway - actions outside the Gateway path still need explicit IAM constraints, application-level logic, or workflow controls.

The prompt tells the agent what to analyze and how to reason.

Policy tells the agent what it is allowed to do through Gateway-mediated tools.

These are different concerns with different reliability requirements.

Unless otherwise noted, all references to AgentCore Policy in this edition assume enforcement through actions exposed via AgentCore Gateway.

The Continuous Compliance Model

Traditional compliance operates on a cycle: prepare for the audit, scramble to collect evidence, pass the audit, relax until next year.

This edition’s patterns enable continuous compliance - agents that monitor controls in real-time, detect drift immediately, generate evidence continuously, and maintain audit readiness as a steady state rather than a periodic event.

The shift matters because regulatory environments are moving toward continuous assurance.

SOC 2 Type II already evaluates controls over a period, not at a point in time.

PCI DSS 4.0 introduces continuous monitoring requirements.

DORA (Digital Operational Resilience Act) requires ongoing ICT risk assessment.

Agents that maintain continuous compliance posture position organizations ahead of where regulatory expectations are heading.

Reference Architectures for Compliance and Security Agents

Compliance and security agents integrate with security tooling, GRC platforms, policy engines, and audit systems rather than the CRM and data warehouse APIs from earlier editions.

Evidence preservation, policy enforcement, and audit trail requirements add architectural components that did not feature in previous editions.

Reference Architecture N - Compliance Monitoring Agent

Platform:

AgentCore

When to use:

The agent continuously monitors compliance controls, detects violations, investigates context, generates findings with evidence, and routes to the appropriate remediation workflow.

Triggers from AWS Config rule evaluations, Security Hub findings, GuardDuty alerts, CloudTrail events, or scheduled scans.

AgentCore Policy constrains the agent to read-only operations and pre-approved bounded remediations.

Evidence flows to an immutable S3 bucket with object lock for audit retention.

Covers compliance monitoring, control drift detection, finding investigation, evidence collection, and regulatory reporting automation.

Reference Architecture O - Security Investigation Agent

Platform:

AgentCore

When to use:

The agent investigates security alerts by correlating data across multiple sources, building a timeline of events, assessing blast radius, and producing an investigation report.

AgentCore Memory maintains investigation context across multi-step analysis.

The agent reads broadly but acts narrowly - investigation is read-heavy, and any containment actions require explicit policy authorization.

Covers alert triage, incident investigation, threat hunting, and forensic analysis where the agent augments the analyst rather than replacing them.

Reference Architecture P - Policy and Document Analysis Agent

Platform:

AgentCore

When to use:

The agent analyzes documents (policies, contracts, regulatory texts, audit reports) against compliance requirements, control frameworks, or organizational standards.

Uses Bedrock’s long-context capabilities for document analysis and Bedrock Knowledge Bases for retrieval against regulatory corpora.

Produces structured outputs (gap analyses, control mappings, risk assessments) rather than taking operational actions.

Covers regulatory change analysis, policy gap assessment, contract compliance review, and audit evidence evaluation.

Reference Architecture Q - Multi-Agent Governance Coordinator

Platform:

AgentCore (multi-agent)

When to use:

Complex governance scenarios that span multiple domains: a single security event might require configuration analysis, log investigation, policy evaluation, and compliance impact assessment.

Specialist agents focus on their domain (infrastructure configuration, log analysis, policy interpretation) while the coordinator manages sequencing, evidence aggregation, and final report generation.

Reference Architecture R - Hybrid Compliance Workflow (Step Functions + Agent)

Platform:

Step Functions + AgentCore

When to use:

The compliance workflow has deterministic steps (collecting evidence from APIs, running config checks, pulling audit logs) mixed with judgment steps (interpreting findings, assessing severity, generating remediation guidance).

Step Functions orchestrates data collection and report assembly.

The agent handles the analysis that requires contextual reasoning.

Identical in concept to Architecture H from Edition 3, applied to compliance workflows where the deterministic evidence collection and the AI-powered analysis are clearly separable.

The 25 Use Cases

Configuration and Infrastructure Compliance

#101 - Cloud Configuration Compliance Agent

Pattern: Modernization from RPA

Platform: AgentCore

Complexity: Quick Win

Reference Architecture: N

What the agent does:

Continuously monitors AWS account configurations against compliance frameworks (SOC 2, HIPAA, PCI DSS, CIS Benchmarks).
Goes beyond AWS Config’s rule-based checks by adding contextual analysis.
When Config flags a non-compliant resource, the agent determines whether the finding represents actual risk or an acceptable exception (a public S3 bucket that serves static website assets versus one containing customer data).
Pulls resource tags, account context, and the organization’s exception registry to make this determination.
For genuine violations, generates a finding with the specific control requirement violated, the evidence (current configuration versus required state), a severity assessment based on data sensitivity and exposure, and a recommended remediation with step-by-step instructions.
Tracks remediation progress and escalates unresolved findings based on SLA timers.

AWS services: Bedrock (Claude), AgentCore Runtime, AgentCore Policy (read-only + bounded remediation via Gateway), AWS Config, Security Hub, S3 (evidence store with object lock), DynamoDB (finding state and exceptions registry), EventBridge, SNS (escalation)

You need this if: Your compliance team manually reviews AWS Config findings, spends hours determining which findings are real violations versus acceptable configurations, and maintains a spreadsheet of exceptions that nobody cross-references consistently.

#102 - Infrastructure as Code Policy Agent

Pattern: New build

Platform: Step Functions + AgentCore (hybrid)

Complexity: Strategic Bet

Reference Architecture: R

What the agent does:

Analyzes infrastructure as code (Terraform, CloudFormation, CDK) changes before deployment for compliance violations.
Triggers from pull requests or CI/CD pipeline events.
Step Functions collects the planned changes (terraform plan output or CloudFormation changeset), resource context from the target account, and applicable compliance policies.
The agent analyzes the planned changes against security baselines, compliance requirements, and organizational standards.
Catches issues that static linting misses: a security group rule that technically passes CIS benchmarks but violates your organization’s more restrictive internal policy, or a combination of configuration changes that individually look fine but collectively create an exposure.
Returns a structured review with approve, block, or require-justification decisions.
Approved changes proceed through the pipeline.
Blocked changes include the specific policy violation and remediation guidance.

AWS services: Step Functions, Bedrock (Claude), AgentCore Runtime, AgentCore Policy, CodePipeline/CodeBuild, Bedrock Knowledge Bases (policy corpus), S3 (plan artifacts), DynamoDB (review history), SNS (notifications)

You need this if: Your cloud security reviews are manual bottlenecks in the deployment pipeline, static policy tools (OPA, Checkov, tfsec) catch syntax-level violations but miss context-dependent risks, and teams wait days for security approval on infrastructure changes.

#103 - Drift Detection and Remediation Agent

Pattern: New build

Platform: AgentCore

Complexity: Quick Win

Reference Architecture: N

What the agent does:

Detects configuration drift between the declared state (IaC repositories) and actual state (running infrastructure).
Runs on a schedule and compares deployed resources against their source-of-truth definitions.
For each drift instance, determines whether the drift is benign (auto-scaling adjusted instance count, which is expected), potentially harmful (someone manually modified a security group), or critical (encryption was disabled on a database).
Generates a drift report with severity classification and recommended remediation: reimport the resource to update state, revert the manual change, or update the IaC to reflect an intentional modification.
For pre-approved bounded remediations (reverting security group changes that violate baseline), executes the fix automatically with full audit trail.

AWS services: Bedrock (Claude), AgentCore Runtime, AgentCore Policy (bounded auto-remediation via Gateway), CloudFormation Drift Detection, AWS Config, S3 (drift history), DynamoDB (baseline registry), SNS (alerts)

You need this if: Manual changes to production infrastructure accumulate between audits, your IaC state files diverge from reality, and you discover drift reactively when something breaks rather than proactively through monitoring.

#104 - Multi-Account Governance Agent

Pattern: New build

Platform: AgentCore

Complexity: Strategic Bet

Reference Architecture: N

What the agent does:

Manages compliance posture across an AWS Organization with dozens or hundreds of accounts.
Aggregates Security Hub findings, Config evaluations, and GuardDuty alerts across all accounts into a unified compliance view.
Identifies patterns that span accounts: the same misconfiguration appearing in multiple production accounts (suggesting a flawed baseline), accounts that consistently drift from guardrails (suggesting a team that needs training or tooling), and privilege escalation paths that cross account boundaries.
Generates per-account compliance scorecards and organization-wide trend reports.
Prioritizes remediation by combining severity with blast radius: a medium-severity finding in a production account with customer data ranks higher than a high-severity finding in a sandbox account.

AWS services: Bedrock (Claude), AgentCore Runtime, Security Hub (cross-account aggregation), AWS Organizations, AWS Config (aggregator), CloudFormation StackSets, S3 (compliance reports), DynamoDB (account metadata and scores), EventBridge

You need this if: Your organization manages 20+ AWS accounts, compliance posture varies wildly between accounts, and your security team lacks a consolidated view of where the highest-priority gaps exist across the organization.

#105 - Container and Workload Security Agent

Pattern: New build

Platform: AgentCore

Complexity: Strategic Bet

Reference Architecture: N

What the agent does:

Monitors container workloads for security compliance across ECS, EKS, and Fargate environments.
Scans container images for vulnerabilities using Amazon Inspector, evaluates runtime configurations against CIS Kubernetes benchmarks, monitors pod security policies and network policies, and detects containers running with elevated privileges or without resource limits.
For each finding, provides context that static scanners lack: whether the affected container is internet-facing or internal-only, whether the elevated privileges are documented as required for the workload’s function, and how the finding compares to the organization’s risk tolerance for that environment tier.
For organizations with mature SBOM pipelines and runtime telemetry, the agent can additionally assess whether a vulnerable package is reachable in the application’s execution path, though this requires instrumentation most teams lack initially.
Generates remediation PRs with updated Dockerfiles or Kubernetes manifests for straightforward fixes.

AWS services: Bedrock (Claude), AgentCore Runtime, Amazon Inspector, Amazon ECR, EKS/ECS APIs, GuardDuty (EKS Runtime Monitoring), S3 (scan results), DynamoDB (workload inventory), CodeCommit/GitHub API (remediation PRs)

You need this if: Your container security consists of scanning images at build time but not monitoring runtime configuration, you have no systematic way to prioritize which vulnerabilities actually matter for your workloads, and container security findings pile up faster than your team can triage them.

Identity and Access Governance

#106 - IAM Least-Privilege Analysis Agent

Pattern: New build

Platform: AgentCore

Complexity: Quick Win

Reference Architecture: N

What the agent does:

Analyzes IAM policies across the organization to identify over-permissioned roles, unused permissions, and privilege escalation paths.
Compares granted permissions against actual usage from CloudTrail and IAM Access Analyzer.
For each role, generates a recommended policy that maintains the permissions actually used while removing unused grants.
Identifies risky patterns: roles with wildcard actions, cross-account assumptions without conditions, policies that allow privilege escalation through iam:PassRole or sts:AssumeRole chains.
Produces a prioritized list of remediation actions ranked by risk (a production service role with unused admin permissions ranks above a developer’s sandbox role with broad S3 access).

AWS services: Bedrock (Claude), AgentCore Runtime, IAM Access Analyzer, CloudTrail (usage history), IAM Policy Simulator, S3 (policy analysis reports), DynamoDB (permission inventory)

You need this if: Your IAM policies accumulated through copy-paste-and-expand over years, nobody knows which permissions are actually needed, and your last access review revealed that 60% of permissions granted are never used.

#107 - Access Review and Certification Agent

Pattern: Migration from RPA

Platform: Step Functions + AgentCore (hybrid)

Complexity: Strategic Bet

Reference Architecture: R

What the agent does:

Manages the periodic access review process that SOC 2, HIPAA, and ISO 27001 require.
Step Functions orchestrates the review cycle: extracting current access grants from identity systems, generating review packages for each manager, tracking certification responses, and processing revocation requests.
The agent handles the judgment layer: comparing each user’s access grants against their role requirements, flagging unusual patterns (a marketing user with database access, a former contractor’s permissions still active), drafting recommendations for each access item (certify, revoke, or investigate), and generating the final certification report with evidence.
Reduces the review from a multi-week organizational burden to a focused exercise where managers review agent recommendations rather than raw access lists.

AWS services: Step Functions, Bedrock (Claude), AgentCore Runtime, IAM/SSO APIs, identity provider API (Okta/Azure AD), S3 (review evidence), DynamoDB (review state and decisions), SES (manager notifications)

You need this if: Your quarterly access reviews take 3+ weeks, managers rubber-stamp certifications because reviewing raw access lists is overwhelming, and your compliance team spends days chasing incomplete reviews.

#108 - Service Account and Non-Human Identity Agent

Pattern: New build

Platform: AgentCore

Complexity: Strategic Bet

Reference Architecture: N

What the agent does:

Manages the lifecycle of service accounts, API keys, machine identities, and other non-human credentials that proliferate across cloud environments.
Maintains an inventory of all non-human identities with their purpose, owner, creation date, last usage, and associated permissions.
Detects orphaned service accounts (no recent usage, owner left the organization), over-provisioned machine roles (granted broad access for initial setup, never scoped down), and credentials approaching expiration without rotation plans.
Generates ownership attribution by correlating service account creation events with CI/CD pipeline configurations, Terraform state, and application deployment records.
Produces a non-human identity risk report with specific remediation actions: revoke, scope down, rotate, or reassign ownership.

AWS services: Bedrock (Claude), AgentCore Runtime, IAM APIs, AWS Secrets Manager, CloudTrail (credential usage), Organizations (account context), S3 (identity inventory reports), DynamoDB (identity registry), EventBridge (lifecycle events)

You need this if: Nobody knows how many service accounts exist in your environment, orphaned credentials from decommissioned projects remain active, and your last audit flagged non-human identity management as a material finding.

#109 - Privilege Escalation Path Detection Agent

Pattern: New build

Platform: AgentCore

Complexity: Foundation Build

Reference Architecture: O

What the agent does:

Maps and periodically reassesses privilege escalation paths across your AWS environment.
Analyzes interactions between IAM policies, resource policies, trust relationships, and service-linked roles to identify non-obvious paths from low-privilege starting points to high-privilege outcomes.
Detects scenarios like: a developer role can assume a deployment role that can modify IAM policies, creating an indirect path to admin access.
A Lambda function’s execution role can write to an S3 bucket that triggers another Lambda with broader permissions.
Goes beyond IAM Access Analyzer’s individual policy analysis by modeling multi-step chains.
Produces a privilege escalation graph showing identified paths, prioritized by the starting privilege level required and the target privilege reached.
The depth of analysis depends on the maturity of your asset inventory, IAM documentation, and resource policy coverage - the agent identifies paths it can observe, but cannot map escalation vectors through services it lacks visibility into.

AWS services: Bedrock (Claude), AgentCore Runtime, AgentCore Memory (graph analysis state), IAM APIs, IAM Access Analyzer, Resource Policy APIs (S3, SQS, Lambda, KMS), Organizations (trust relationships), Neptune or DynamoDB (escalation graph), S3 (analysis reports)

You need this if: Your penetration tests regularly find privilege escalation paths that your access reviews missed, you lack visibility into cross-account trust relationship chains, and you cannot answer “what is the blast radius if this role is compromised?”

#110 - Just-In-Time Access Provisioning Agent

Pattern: New build

Platform: AgentCore

Complexity: Strategic Bet

Reference Architecture: N

What the agent does:

Implements just-in-time (JIT) access for elevated permissions.
When an engineer needs production database access for an incident investigation, they request it through the agent.
The agent evaluates the request against context: is there an active incident? Does the requester’s role normally interact with this resource? Is the requested scope proportional to the stated need? What peer approvals are required? For approved requests, provisions time-bounded access (1 hour, 4 hours, configurable per resource type) with the minimum permissions needed.
Automatically revokes access when the window expires or the stated reason resolves.
Logs every request, decision, and access session for audit.
Denies requests that violate policy constraints and explains why.

AWS services: Bedrock (Claude), AgentCore Runtime, AgentCore Policy (approval rules), IAM (temporary credentials), STS (session policies), SSO (permission sets), DynamoDB (request and session state), CloudTrail (session logging), SNS (approvals and notifications)

You need this if: Your engineers have standing access to production systems they rarely need, your audit logs show persistent access being used for brief investigation tasks, and your compliance framework requires time-bounded privileged access but your current process involves Slack messages and manual IAM changes.

Incident Response and Threat Detection

#111 - Security Alert Triage Agent

Pattern: New build

Platform: AgentCore

Complexity: Quick Win

Reference Architecture: O

What the agent does:

Triages the flood of security alerts from GuardDuty, Security Hub, and third-party tools that overwhelm SOC teams.
For each alert, pulls contextual data: the resource involved, its business criticality, recent changes to the resource, the identity that triggered the alert, historical alerts for the same resource or identity, and threat intelligence enrichment.
Classifies each alert as true positive (investigate), false positive (suppress with justification), or informational (log and monitor).
For true positives, assigns severity based on business context (not just the alert’s native severity), identifies the most likely attack stage (reconnaissance, initial access, lateral movement, exfiltration), and routes to the appropriate analyst with a pre-built investigation package.
Tracks suppression patterns to identify when a previously-suppressed alert type starts appearing in new contexts that warrant re-evaluation.

AWS services: Bedrock (Claude), AgentCore Runtime, AgentCore Memory (investigation context), GuardDuty, Security Hub, CloudTrail, VPC Flow Logs, DynamoDB (alert enrichment and suppression rules), SNS (routing)

You need this if: Your SOC receives 500+ alerts daily, analysts spend more time triaging than investigating, false positive rates exceed 70%, and genuine threats get buried in alert noise.

#112 - Incident Timeline Reconstruction Agent

Pattern: New build

Platform: AgentCore

Complexity: Strategic Bet

Reference Architecture: O

What the agent does:

Reconstructs the timeline of a security incident by correlating events across CloudTrail, VPC Flow Logs, application logs, authentication logs, and DNS query logs.
An analyst provides a starting point (a suspicious API call, a compromised credential, a malware detection) and the agent works backward and forward in time to build a complete narrative.
Identifies the initial access vector, lateral movement between resources, data access patterns, and exfiltration indicators.
Produces a structured timeline with timestamps, actors, actions, and affected resources.
Highlights gaps in the timeline where visibility is missing (a period with no CloudTrail events for a known-compromised identity suggests log tampering or activity in an unmonitored account).
Uses AgentCore Memory to maintain investigation state across multiple analysis sessions.

AWS services: Bedrock (Claude), AgentCore Runtime, AgentCore Memory, CloudTrail (event history), CloudWatch Logs Insights, Athena (log querying at scale), VPC Flow Logs, Route 53 DNS logs, S3 (investigation artifacts), DynamoDB (timeline state)

You need this if: Incident investigations take days because analysts manually correlate events across 5+ log sources, critical context gets lost between shift handoffs, and post-incident reports reveal gaps where the team missed lateral movement because they did not check a specific log source.

#113 - Threat Intelligence Correlation Agent

Pattern: New build

Platform: AgentCore

Complexity: Strategic Bet

Reference Architecture: O

What the agent does:

Correlates internal security telemetry with external threat intelligence to identify whether observed activity matches known threat actor tactics, techniques, and procedures (TTPs).
Ingests indicators of compromise (IoCs) from threat feeds, maps them against internal logs (DNS queries matching known-malicious domains, network connections to flagged IPs, file hashes matching known malware), and assesses relevance to your specific environment.
Goes beyond simple IoC matching: analyzes sequences of events that match documented attack patterns from MITRE ATT&CK even when individual IoCs do not trigger.
Produces a daily threat briefing highlighting new intelligence relevant to your technology stack and industry, active threats detected in your environment, and emerging TTPs that your detection rules do not cover.

AWS services: Bedrock (Claude), AgentCore Runtime, GuardDuty (threat detection), Security Hub (finding aggregation), Athena (log analysis at scale), third-party threat intelligence APIs via AgentCore Gateway, S3 (intelligence archive), DynamoDB (correlation state)

You need this if: Your threat intelligence program consists of receiving feed updates that nobody correlates against internal data, your detection rules cover known IoCs but not behavioral patterns, and you lack the analyst capacity to proactively hunt for threats that match new intelligence.

#114 - Automated Containment Agent

Pattern: New build

Platform: Step Functions + AgentCore (hybrid)

Complexity: Foundation Build

Reference Architecture: R

What the agent does:

Executes pre-approved containment actions when security events meet defined severity thresholds.
Step Functions handles the deterministic containment playbook: isolating a compromised EC2 instance (modifying security groups to restrict network access while preserving forensic state), revoking compromised credentials (disabling IAM keys or SSO sessions), blocking malicious IPs (updating WAF rules or network ACLs), and preserving evidence (snapshotting EBS volumes, copying logs to a forensic bucket).
The agent handles the judgment calls within bounded parameters: determining whether the alert severity justifies automated containment, assessing blast radius (will isolating this instance break a production service?), choosing the appropriate containment level (full isolation versus restricted access), and notifying the on-call team with the containment summary and next-steps recommendation.
Every containment action is logged immutably, reversible by design, and constrained by AgentCore Policy to a pre-approved action set exposed through Gateway.

AWS services: Step Functions, Bedrock (Claude), AgentCore Runtime, AgentCore Policy (containment boundaries via Gateway), EC2 APIs (security groups, snapshots), IAM (credential revocation), WAF, Network Firewall, S3 (forensic evidence), CloudTrail, SNS (incident notifications)

You need this if: Your mean time to contain a confirmed threat exceeds 4 hours because containment requires manual intervention, your runbooks exist but are not automated, and after-hours incidents wait until morning because on-call responders are not comfortable executing containment steps manually.

#115 - Vulnerability Prioritization Agent

Pattern: New build

Platform: Both (AgentCore backend + Quick dashboards)

Complexity: Quick Win

Reference Architecture: N + K

What the agent does:

Transforms raw vulnerability scanner output (Amazon Inspector, third-party scanners) into a prioritized remediation queue that reflects actual risk rather than raw CVSS scores.
For each vulnerability, the agent adds business context: Is the affected resource internet-facing or internal-only? Does it process sensitive data? Is there a known exploit in the wild? What compensating controls exist (WAF rules, network segmentation)? For organizations with runtime telemetry or SBOM data, it can additionally assess whether the vulnerable component is loaded in production execution paths.
Recalculates priority based on this contextualized risk and groups vulnerabilities into remediation tickets that can be assigned to the responsible team.
Quick dashboards show vulnerability posture trends, mean-time-to-remediate by team, and SLA compliance.
Tracks whether patching one vulnerability introduces new issues by monitoring for regressions after remediation.

AWS services: Bedrock (Claude), AgentCore Runtime, Amazon Quick (Quick Sight), Amazon Inspector, ECR (container vulnerabilities), Systems Manager Patch Manager, S3 (vulnerability data), DynamoDB (prioritization state and SLA tracking), SNS (escalation)

You need this if: Your vulnerability backlog grows faster than your team can remediate, patching decisions are based on CVSS scores without business context, and your security team cannot answer “what are the 10 most important vulnerabilities to fix this week?”

Regulatory Compliance and Audit

#116 - Regulatory Change Impact Agent

Pattern: New build

Platform: AgentCore

Complexity: Strategic Bet

Reference Architecture: P

What the agent does:

Monitors regulatory and standards body publications for changes that affect your organization.
When a new regulation passes, a standard updates (PCI DSS 4.0, NIST CSF 2.0, ISO 27001:2022), or a regulatory guidance document issues, the agent analyzes the changes against your current control framework.
Produces a gap analysis: which new requirements your existing controls already satisfy, which require new controls, which require modifications to existing controls, and which require further legal interpretation.
Maps each requirement to specific AWS configurations, organizational processes, and documentation that would satisfy it.
Generates a remediation roadmap with relative effort sizing (small, medium, large) based on the type of gap, though accurate effort estimates require validated control mappings and organizational context that the agent uses as inputs rather than infers independently.
Bedrock Knowledge Bases stores the regulatory corpus and your control framework documentation for retrieval during analysis.

AWS services: Bedrock (Claude), AgentCore Runtime, Bedrock Knowledge Bases (regulatory corpus + control framework), S3 (regulatory document archive), DynamoDB (control mapping and gap tracking), SES (stakeholder notifications)

You need this if: Your compliance team learns about regulatory changes from news articles instead of systematic monitoring, gap analyses take weeks of manual review, and you lack a maintained mapping between regulatory requirements and your actual controls.

#117 - Audit Evidence Collection Agent

Pattern: Migration from RPA

Platform: Step Functions + AgentCore (hybrid)

Complexity: Quick Win

Reference Architecture: R

What the agent does:

Automates the evidence collection process for SOC 2, HIPAA, ISO 27001, and PCI DSS audits.
Step Functions orchestrates the collection workflow: pulling screenshots of security configurations, extracting access review records, downloading change management logs, collecting policy documents, and retrieving system configurations.
The agent handles evidence that requires judgment: determining whether a collected artifact actually satisfies the specific control requirement (a screenshot of a security group is not useful evidence if it does not show the specific rules the auditor needs), identifying gaps where evidence is incomplete or stale, and generating the narrative descriptions that accompany each evidence artifact explaining how it demonstrates control effectiveness.
Organizes evidence by control framework, period, and control objective in a structure that maps directly to auditor requests.

AWS services: Step Functions, Bedrock (Claude), AgentCore Runtime, AWS Config (configuration snapshots), CloudTrail (activity evidence), IAM (access review data), S3 (evidence repository with object lock), DynamoDB (evidence catalog and status), SES (collection status notifications)

You need this if: Audit preparation takes your team 4+ weeks of manual evidence collection, you maintain evidence in scattered folders and screenshots, and auditors repeatedly request additional evidence because initial submissions do not precisely match the control being tested.

#118 - Compliance Questionnaire Response Agent

Pattern: Modernization from chatbot

Platform: AgentCore

Complexity: Quick Win

Reference Architecture: P

What the agent does:

Drafts responses to security and compliance questionnaires (vendor assessments, customer security questionnaires, RFP security sections, SIG/CAIQ self-assessments).
Maintains a knowledge base of previously approved responses, current certifications, architectural documentation, and security control descriptions.
For each question, retrieves the most relevant prior approved answer, evaluates whether it is still accurate given current state, and drafts a response.
Flags questions that reference capabilities you do not have, controls that have changed since the last approved response, and areas where the question’s scope is ambiguous.
Produces a draft questionnaire for the security team to review rather than generating final responses autonomously.
Tracks response reuse rates and identifies questions that consistently require manual rework (indicating a gap in the knowledge base).

AWS services: Bedrock (Claude), AgentCore Runtime, Bedrock Knowledge Bases (approved responses + security documentation), S3 (questionnaire archive), DynamoDB (response tracking and reuse metrics)

You need this if: Your security team spends 10+ hours per week responding to customer security questionnaires, the same questions recur across questionnaires with slight wording variations, and response quality is inconsistent because different team members write different answers to the same question.

#119 - Control Testing and Validation Agent

Pattern: New build

Platform: AgentCore

Complexity: Strategic Bet

Reference Architecture: N

What the agent does:

Performs continuous control testing rather than point-in-time audit checks.
For each control in your framework, maintains a test procedure and runs it on a defined schedule.
Tests are not just configuration checks (which AWS Config handles natively) but operational effectiveness tests: Does the incident response process actually trigger within the defined SLA? Does the change management approval workflow enforce separation of duties in practice? Do access revocations propagate within the required timeframe? The agent simulates control scenarios (creating test events, measuring response times, verifying workflow execution) within safe test boundaries defined by AgentCore Policy via Gateway-exposed tools.
Produces a continuous control effectiveness report showing which controls are operating effectively, which are degraded, and which have failed their most recent test.

AWS services: Bedrock (Claude), AgentCore Runtime, AgentCore Policy (test boundaries via Gateway), Step Functions (test orchestration), CloudWatch (SLA measurement), Config (configuration state), CloudTrail (process verification), S3 (test results), DynamoDB (control registry and test schedule)

You need this if: You only discover that a control is ineffective during the annual audit, your control testing is limited to checking configurations rather than verifying operational effectiveness, and you cannot demonstrate continuous compliance between audit periods.

#120 - Privacy and Data Protection Compliance Agent

Pattern: New build

Platform: AgentCore

Complexity: Foundation Build

Reference Architecture: N

What the agent does:

Monitors and enforces data protection requirements across GDPR, CCPA, HIPAA, and other privacy regulations.
Tracks data processing activities against the organization’s Record of Processing Activities (ROPA).
Monitors data flows for transfers that might violate cross-border transfer restrictions.
Detects new data collection or processing activities that lack proper legal basis documentation.
When a data subject submits an access request (DSAR) or deletion request, maps all systems where that individual’s data resides and generates a response package showing what data exists and where.
Validates that data retention policies are actually being enforced by checking for data older than its defined retention period.
Produces a privacy posture report showing compliance status by regulation, processing activity, and data category.

AWS services: Bedrock (Claude), AgentCore Runtime, AgentCore Policy (data access boundaries), Amazon Macie (PII discovery), Glue Data Catalog (data classification), S3 (ROPA and evidence), DynamoDB (processing activity registry, DSAR tracking), EventBridge (retention monitoring)

You need this if: Your privacy team maintains the ROPA in a spreadsheet that is perpetually outdated, DSAR responses take weeks because nobody knows all the systems where a person’s data lives, and you cannot prove that data retention policies are enforced rather than merely documented.

Security Operations and Posture Management

#121 - Security Posture Scoring Agent

Pattern: New build

Platform: Both (AgentCore backend + Quick dashboards)

Complexity: Quick Win

Reference Architecture: N + K

What the agent does:

Calculates and tracks a composite security posture score across the organization, breaking it down by account, team, service, and compliance framework.
Aggregates inputs from Security Hub findings, Config compliance, vulnerability scan results, IAM analysis, encryption status, and logging coverage into a weighted score that reflects actual risk posture.
The scoring model is configurable: different controls carry different weights based on your organization’s threat model and regulatory requirements.
Tracks score trends over time to show whether security posture is improving or degrading.
Identifies which specific findings have the largest impact on the score (fixing one critical IAM finding might improve the score more than fixing ten low-severity configuration issues).
Quick dashboards provide executive, team-level, and technical views of the same underlying data.

AWS services: Bedrock (Claude), AgentCore Runtime, Amazon Quick (Quick Sight), Security Hub (finding aggregation), AWS Config, Amazon Inspector, S3 (score history), DynamoDB (scoring model and weights), EventBridge (scoring schedule)

You need this if: Your CISO cannot answer “is our security posture better or worse than last quarter?” with data, security improvement efforts lack clear metrics, and team accountability for security is impossible without per-team visibility.

#122 - Third-Party Risk Assessment Agent

Pattern: New build

Platform: AgentCore

Complexity: Strategic Bet

Reference Architecture: P

What the agent does:

Manages the vendor security assessment lifecycle.
When a new vendor enters procurement or an existing vendor reaches its reassessment date, the agent collects their security documentation (SOC 2 reports, penetration test summaries, compliance certifications, security questionnaire responses).
Analyzes the documentation against your organization’s third-party risk requirements.
Identifies gaps between your requirements and the vendor’s demonstrated controls.
Assesses residual risk for each gap and recommends risk acceptance, contractual mitigation, or technical controls to compensate.
Tracks vendor risk scores over time, flags vendors whose certifications expire, and triggers reassessments based on risk tier and time since last review.
Correlates vendor risk levels with the data and access each vendor has to prioritize which assessments matter most.

AWS services: Bedrock (Claude), AgentCore Runtime, Bedrock Knowledge Bases (vendor documentation corpus + risk framework), S3 (vendor documentation archive), DynamoDB (vendor inventory, risk scores, assessment schedule), SES (vendor communications and internal notifications), EventBridge (assessment triggers)

You need this if: Your third-party risk program is backlogged by 6+ months, vendor assessments are inconsistent in depth depending on who performs them, and you cannot answer “which of our vendors with access to customer data have not been assessed in the last year?”

#123 - Security Runbook Automation Agent

Pattern: Migration from RPA

Platform: Step Functions + AgentCore (hybrid)

Complexity: Quick Win

Reference Architecture: R

What the agent does:

Executes security operational runbooks that currently require analysts to follow manual steps.
Step Functions handles the deterministic procedural steps: collecting data from specified sources, executing API calls in sequence, and routing outputs to designated destinations.
The agent handles the judgment steps within each runbook: interpreting the results of diagnostic commands, deciding which branch of the procedure applies based on the situation, and determining when the runbook’s scope has been exceeded and human intervention is needed.
Covers common security operations: investigating a failed login surge, triaging an unauthorized API call, assessing a new GuardDuty finding type, and performing routine security hygiene checks.
Each runbook is version-controlled with clear scope boundaries, pre-approved actions, and defined escalation criteria.

AWS services: Step Functions, Bedrock (Claude), AgentCore Runtime, AgentCore Policy (action boundaries per runbook via Gateway), CloudTrail, GuardDuty, IAM APIs, CloudWatch Logs, S3 (runbook definitions and execution logs), DynamoDB (execution state), SNS (escalation)

You need this if: Your security runbooks exist as wiki pages that analysts follow manually, execution consistency varies by analyst, after-hours incidents wait because junior analysts are not comfortable executing advanced runbooks, and you want to reduce mean time to respond without reducing investigation quality.

#124 - Cloud Cost and Security Trade-off Agent

Pattern: New build

Platform: Both (AgentCore backend + Quick dashboards)

Complexity: Strategic Bet

Reference Architecture: N + K

What the agent does:

Evaluates the cost implications of security decisions and the security implications of cost optimization recommendations.
When a FinOps tool recommends downgrading a Reserved Instance or consolidating workloads, the agent assesses whether the change affects security posture (moving workloads into a shared account might violate isolation requirements, downgrading a WAF tier might remove protections).
When the security team recommends enabling additional logging, encryption, or network segmentation, the agent estimates the cost impact with specificity: “Enabling VPC Flow Logs across all production VPCs will add approximately $2,400/month based on current traffic volumes.” Bridges the gap between security teams that recommend controls without cost awareness and finance teams that cut costs without security awareness.
Quick dashboards show the security cost of each compliance framework and the cost savings available at each risk tolerance level.

AWS services: Bedrock (Claude), AgentCore Runtime, Amazon Quick (Quick Sight), Cost Explorer, Security Hub, AWS Config, Pricing API, S3 (analysis reports), DynamoDB (trade-off registry)

You need this if: Security and finance teams operate in silos, cost optimizations occasionally degrade security posture without anyone noticing, and security recommendations get rejected for budget reasons without quantified risk-versus-cost analysis.

#125 - Compliance Communication and Reporting Agent

Pattern: New build

Platform: Both (AgentCore backend + Quick dashboards)

Complexity: Quick Win

Reference Architecture: N + K

What the agent does:

Generates compliance status reports for different audiences from the same underlying data.
The board receives a one-page risk summary with trend arrows and material findings requiring attention.
The CISO receives a detailed posture report with metrics, open findings by severity, remediation velocity, and control effectiveness trends.
Engineering teams receive their team-specific findings with remediation guidance and SLA status.
Regulators receive the specific evidence and attestations their framework requires.
The agent pulls from the same compliance data (Security Hub findings, Config evaluations, audit evidence, control test results) and formats for each audience’s information needs and technical depth.
Operates on a schedule for recurring reports and on-demand for ad hoc requests.
Quick dashboards provide the interactive exploration layer behind each report.

AWS services: Bedrock (Claude), AgentCore Runtime, Amazon Quick (Quick Sight), Security Hub, AWS Config, S3 (report archive), DynamoDB (report templates and distribution lists), EventBridge (reporting schedule), SES (report distribution)

You need this if: Your compliance team produces reports manually for different stakeholders, each report takes hours to compile, and different audiences receive the same level of detail regardless of whether they need a board summary or an engineering action list.

What These 25 Patterns Reveal

The trust model separates this edition from the previous four.

Every pattern in this edition operates under the principle that agent recommendations require human validation before becoming organizational decisions.

Customer support agents can auto-resolve tickets because a wrong answer affects one customer.

Compliance agents produce findings that affect audit outcomes, regulatory standing, and organizational risk posture.

The human-in-the-loop is a feature, not a limitation.

AgentCore Policy carries more weight here than in any other edition.

Fifteen patterns explicitly reference AgentCore Policy for constraining agent actions routed through Gateway.

In earlier editions, policy prevents the agent from doing something unexpected. In this edition, policy enforcement is itself a compliance requirement - demonstrating that the AI system operates within defined boundaries is part of the control framework.

The Cedar-based authorization model, generated from natural-language policy definitions and validated for safety conditions, produces an auditable record of what tool interactions the agent was authorized to perform.

For actions not routed through Gateway, standard IAM policies and application-level controls provide the enforcement boundary.

Both layers feed into audit evidence.

Evidence preservation is an architectural concern, not an operational one.

Nine patterns include an immutable evidence store (S3 with object lock or equivalent). This is not logging for debugging.

This is evidence for auditors that the system operated as designed.

The decision to use object lock with compliance mode versus governance mode, the retention period configuration, and the evidence catalog structure are design decisions that affect audit outcomes years after the agent runs.

The hybrid Step Functions + AgentCore pattern from Edition 3 reappears heavily. Five patterns use the hybrid approach.

Compliance workflows have significant deterministic components (collecting evidence from APIs, running scheduled checks, executing remediation playbooks) mixed with judgment components (interpreting findings, assessing severity, generating recommendations).

The hybrid approach keeps the deterministic parts auditable through Step Functions’ visual execution history while reserving the agent for reasoning steps.

Quick Wins cluster around reporting and analysis rather than automated action.

The ten Quick Win patterns (#101, #103, #106, #111, #115, #117, #118, #121, #123, #125) focus on analyzing existing data and generating insights rather than taking autonomous actions.

This reflects the trust boundary: starting with agents that help humans understand compliance posture is lower risk than agents that autonomously remediate findings.

The Strategic Bets and Foundation Builds introduce more automation as the organization builds confidence in agent accuracy and appropriate scoping.

Multi-agent architectures appear only for complex investigations.

Architecture Q shows up when an investigation spans multiple security domains (configuration, logs, policies, threat intelligence) and a single agent’s context window cannot hold all the relevant data.

Most compliance workloads are well-served by a single focused agent with the right tools.

125 Patterns Across Five Editions

This series cataloged 125 AI agent architectures across five domains: customer-facing service, internal productivity, workflow automation, data analytics, and compliance.

A few observations span all 125.

The platform choice follows the workload, not the hype cycle.

AgentCore dominates when the agent needs custom tool orchestration, multi-step reasoning, or fine-grained policy control.

Quick dominates when business users need self-service analytics or visualization.

Q Business dominates for enterprise knowledge retrieval with permission awareness. Step Functions handles deterministic workflows.

The agents that perform best in production use the simplest platform that meets their requirements, not the most architecturally impressive option.

Start with the agent that solves the pain you already measure.

The strongest implementations across all five editions share one trait: the team could quantify the problem before building the solution.

Hours spent triaging alerts, days spent collecting audit evidence, weeks spent waiting for analytics requests.

If you cannot measure the current cost, you cannot demonstrate the agent’s value after deployment.

Build the evidence layer before the reasoning layer. Agents are only as good as the data they can access. Knowledge bases, metadata catalogs, semantic layers, tool integrations, and policy definitions are the foundation that makes agent reasoning reliable.

Organizations that skip these prerequisites and build the agent first discover that their sophisticated reasoning engine produces unreliable outputs because the inputs are incomplete or incorrect.

Governance scales with autonomy.

A read-only agent that generates reports needs lightweight oversight. An agent that auto-remediates security findings needs comprehensive policy enforcement, audit trails, and bounded action sets.

The governance investment should be proportional to the agent’s ability to affect the organization.

Every edition in this series shows the progression: Quick Wins start with analysis, Strategic Bets add bounded actions, and Foundation Builds implement the governance infrastructure that enables broader autonomy later.

The 125 patterns exist so your next architecture discussion starts from a reference point rather than a blank whiteboard.

Pick the card that matches your problem, adapt the architecture to your environment, and build from there.

I publish every week at buildwithaws.substack.com. Subscribe. It's free.

25 Workflow Automation and Process Agent Patterns on AWS You Can Steal Right Now

Marcelo Acosta Cavalero — Mon, 13 Apr 2026 14:03:36 +0000

Originally published on Build With AWS. Subscribe for weekly AWS builds.

A logistics coordinator at a mid-size manufacturer spends every Monday morning copying order data from the ERP into a spreadsheet, cross-referencing inventory levels in a second system, emailing the warehouse team about shortages, and updating the shipping schedule in a third tool.

The entire ritual takes three hours.

The data exists in every system.

The logic is consistent.

The human is the integration layer.

This is the third edition of a five-part series cataloging real AI architecture patterns running on AWS.

Edition 1 covered customer-facing agents.

Edition 2 covered internal knowledge and productivity agents.

This edition focuses on the unsexy middle: workflow automation and process agents that operate behind the scenes, moving data between systems, enforcing business rules, handling exceptions, and keeping operations running without direct human interaction.

No chat interfaces, no employee-facing Q&A.

These agents augment or progressively replace the spreadsheets, email chains, RPA scripts, and manual copy-paste rituals that hold business processes together, particularly where the process requires judgment that rule-based automation cannot handle.

If you missed the earlier editions, go back to Edition 1 for the “Agent or Not?” scoring framework and the AgentCore vs Quick breakdown.

Edition 2 introduced Amazon Q Business for knowledge workloads.

Those mental models apply here.

This edition adds a fourth consideration: when an agent is overkill and Step Functions plus EventBridge handles the job.

The Agent vs Orchestration Decision

Workflow automation sits in a gray zone.

Not every automated process needs an AI agent.

Many workflows are deterministic: if X happens, do Y, then Z, every time, no variation.

Step Functions handles those workflows with built-in error handling, retry logic, and visual debugging.

Adding an agent to a deterministic workflow adds cost, latency, and a reasoning layer that provides no value.

Agents earn their place in workflow automation when the process requires judgment calls.

When the system needs to interpret unstructured input, classify ambiguous data, decide which path to take based on context that cannot be reduced to simple rules, or handle exceptions that vary enough to defeat a static decision tree.

Use the same five questions from Edition 1 (workflow predictability, reasoning depth, tool access, conversational interaction, improvement over time) to score each workflow.

These thresholds are a useful heuristic: workflows scoring below 10 belong in Step Functions.

Between 10 and 15, consider a hybrid where Step Functions orchestrates the deterministic parts and an agent handles the judgment steps.

Above 15, build the agent.

Three practical signals help distinguish agent territory from orchestration territory.

First, input predictability: if every input follows a known schema with constrained values, orchestration handles it.

If inputs arrive as unstructured documents, free-text emails, or variable formats, an agent adds value.

Second, rule stability: if the business rules change rarely and can be fully enumerated, encode them in Step Functions.

If the rules shift frequently, contain ambiguity, or require interpretation of context, an agent adapts where static rules break.

Third, exception tolerance: if the process can halt on any unexpected input and wait for a human, orchestration is fine.

If exceptions are common enough that halting creates an operational bottleneck, an agent that handles the ambiguity keeps the process moving.

Reference Architectures for Process Agents

Process agents differ from the customer-facing and employee-facing patterns in the first two editions.

They rarely have a chat interface.

They trigger from events, schedules, or system state changes.

They interact with APIs, databases, and file systems rather than humans.

The reference architectures reflect this shift.

Reference Architecture G - Event-Driven Process Agent

Platform: AgentCore

When to use: The agent triggers from system events and makes decisions about how to process each event based on context. No human interaction during execution. The agent reads data, reasons about what to do, takes actions across systems through Gateway-mediated tool calls, and logs outcomes.

AgentCore Policy constrains which actions the agent can take, and AgentCore Observability provides tracing and visibility into agent execution and tool calls.

All tool interactions flow through the Gateway, which handles authentication, rate limiting, and MCP-compatible tool exposure.

Covers most document processing, data routing, exception handling, and system integration agents where the trigger is an event, not a person.

Reference Architecture H - Hybrid Orchestration (Step Functions + Agent)

Platform: Step Functions + AgentCore

When to use: The workflow has deterministic steps (data extraction, API calls, file transformations) mixed with judgment steps (classification, exception handling, content generation).

Step Functions orchestrates the overall flow and calls the agent only when reasoning is needed.

This hybrid pattern reduces unnecessary model invocations, improves debuggability through Step Functions’ visual execution history, and preserves deterministic control with built-in retries and error handling for the predictable steps.

Covers workflows where the majority of steps are deterministic and the remaining steps require contextual decision-making.

Reference Architecture I - Multi-System Process Coordinator

Platform: AgentCore (multi-agent or single agent with adapters)

When to use: The process spans 4+ systems where each system has different APIs, data formats, and error modes. The coordinator agent manages sequencing, handles partial failures, and tracks process state in a durable store (DynamoDB or RDS, not AgentCore Memory, which serves context retention rather than transactional workflow state).

System adapters normalize the interface to each backend.

This pattern centralizes coordination logic and can reduce the operational complexity of ad hoc point-to-point integrations, while still relying on stable adapters and contract validation at the API layer.

The 25 Use Cases

Document Processing and Data Extraction

#051 - Invoice Processing and Matching Agent

Pattern: Migration from RPA

Platform: Step Functions + AgentCore (hybrid)

Complexity: Quick Win

Reference Architecture: H

What the agent does:

Receives invoices via email attachment, API upload, or S3 drop. Amazon Textract extracts line items, totals, vendor information, and payment terms with confidence scores.
The deterministic steps (extraction, format normalization, duplicate detection) run in Lambda via Step Functions.
The agent handles the judgment calls: matching invoices to purchase orders when line items do not align exactly, resolving quantity discrepancies, and flagging anomalies like unexpected price increases or new line items not on the PO.
Extraction results where Textract’s field-level confidence falls below a configured threshold route directly to human review rather than through the agent’s matching logic, because low-quality extraction makes downstream matching unreliable.
Clean matches route to the payment queue.
Exceptions route to AP (the team that handles invoice payments) with the agent’s analysis and a recommended resolution.

AWS services: Step Functions, Amazon Textract, Bedrock (Claude), AgentCore Runtime, S3 (document ingestion), Lambda (deterministic steps), DynamoDB (PO matching index), SES (exception notifications)

You need this if: Your accounts payable team manually matches invoices to purchase orders, spends 15+ minutes per exception, and processes more than 500 invoices per month.

#052 - Contract Data Extraction and Lifecycle Agent

Pattern: Migration from RPA

Platform: AgentCore

Complexity: Strategic Bet

Reference Architecture: G

What the agent does:

Processes incoming contracts, amendments, and renewals.
Extracts key terms: effective dates, renewal windows, termination clauses, pricing schedules, SLA commitments, liability caps, and data handling provisions.
Drafts a deviation summary comparing extracted terms against company standards as a review aid for legal and procurement, not as an autonomous legal reviewer.
Populates the contract management system with structured data.
Monitors active contracts for upcoming renewals, expiring SLAs, and auto-renewal windows.
Sends alerts to the legal and procurement teams with a summary of each contract’s key obligations and upcoming milestones.

AWS services: Bedrock (Claude), AgentCore Runtime, Amazon Textract, S3 (contract repository), DynamoDB (contract metadata), EventBridge (milestone scheduling), SES (alerts)

You need this if: Your legal team manually reviews contracts to track key dates and obligations, and you have missed renewal windows or auto-renewed unfavorable terms because nobody flagged the deadline.

#053 - Mail and Correspondence Classification Agent

Pattern: New build

Platform: AgentCore

Complexity: Quick Win

Reference Architecture: G

What the agent does:

Processes inbound correspondence from a shared mailbox, physical mail scans, or document upload.
Classifies each item by type (invoice, RFP, legal notice, customer complaint, regulatory filing, general inquiry), extracts key metadata, and routes to the appropriate department or workflow.
For structured document types (invoices, purchase orders), triggers the corresponding processing pipeline.
For unstructured items (letters, complaints), generates a summary and priority assessment before routing.
Can support page-level or section-level routing when document boundaries are reliably identified, though multipage segmentation adds complexity that should be validated per document type.

AWS services: Bedrock (Claude), AgentCore Runtime, Amazon Textract, S3 (document intake), SQS (routing queues), EventBridge (workflow triggers), DynamoDB (classification audit trail)

You need this if: Your operations team manually sorts and distributes 100+ pieces of inbound correspondence daily, and misrouted items cause delays measured in days.

#054 - Receipt and Expense Categorization Agent

Pattern: Migration from RPA

Platform: Step Functions + AgentCore (hybrid)

Complexity: Quick Win

Reference Architecture: H

What the agent does:

Processes bulk receipt uploads from corporate card feeds, receipt scanning apps, and email forwards.
Textract extracts merchant, amount, date, and line items.
Step Functions handles the deterministic categorization for known merchants mapped to GL codes.
The agent handles ambiguous cases: merchants with multiple possible categories, split purchases across categories, foreign transactions requiring context to categorize, and items that might be personal vs business expenses.
Categorization outputs are schema-constrained to valid GL codes at the application layer, preventing open-ended classification drift.
Low-confidence categorizations queue for human review with the agent’s reasoning attached.
AWS services: Step Functions, Amazon Textract, Bedrock (Nova), AgentCore Runtime, Lambda, DynamoDB (merchant mapping), S3 (receipt storage)

You need this if: Your finance team recategorizes 20-30% of auto-categorized expenses because simple rule-based systems cannot handle merchant ambiguity or context-dependent categorization.

#055 - Form Processing and Data Entry Agent

Pattern: Migration from RPA

Platform: Step Functions + AgentCore (hybrid)

Complexity: Quick Win

Reference Architecture: H

What the agent does:

Digitizes and processes paper forms, PDFs, and scanned documents into structured data.
Handles applications, enrollment forms, registration documents, and intake paperwork.
Textract extracts fields.
Step Functions manages the pipeline (extraction, validation, storage).
The agent resolves extraction ambiguities, fills inferred fields from context (deriving a state from a zip code, a department from a job title), validates cross-field consistency (does the birth date match the stated age?), and flags irreconcilable conflicts for human review.
Loads validated data into the target system (CRM, HRIS, ERP) via API.

AWS services: Step Functions, Amazon Textract, Bedrock (Nova), AgentCore Runtime, Lambda, API Gateway, target system APIs

You need this if: Your team manually enters data from paper or PDF forms into business systems, the error rate exceeds 5%, and rework from data entry mistakes costs more than the entry itself.

Approval and Routing Workflows

#056 - Intelligent Request Routing Agent

Pattern: New build

Platform: AgentCore

Complexity: Quick Win

Reference Architecture: G

What the agent does:

Receives internal requests from a ticketing system, email, or Slack and routes them to the correct team, queue, or individual based on content analysis rather than user-selected categories.
Reads the full request text, identifies the actual need (which often differs from what the submitter selected in a dropdown), determines priority from impact signals in the text, and assigns it to the right handler.
When a request spans multiple teams, splits it into sub-requests with the relevant context for each team.
Tracks routing accuracy and rebalances when certain queues are overloaded.

AWS services: Bedrock (Claude), AgentCore Runtime, AgentCore Gateway (ticketing system API, Slack API), EventBridge, DynamoDB (routing rules and metrics)

You need this if: More than 25% of internal requests get rerouted at least once because users select the wrong category, and your average time-to-right-team exceeds 4 hours.

#057 - Dynamic Approval Chain Agent

Pattern: New build

Platform: AgentCore

Complexity: Strategic Bet

Reference Architecture: G

What the agent does:

Manages approval workflows where the approval chain varies based on request content, not just amount thresholds.
Analyzes the request (purchase, access change, policy exception, project proposal), determines which approvers are required based on the specific content (a security-related purchase needs CISO sign-off regardless of amount, a vendor change needs procurement review), checks for conflicts of interest, and routes accordingly.
Handles delegation when approvers are out of office, escalates stalled approvals, and maintains a complete audit trail.
When policies conflict (two approval rules apply and contradict), it flags the conflict and routes to the policy owner for clarification.

AWS services: Bedrock (Claude), AgentCore Runtime, AgentCore Policy (tool boundaries and action constraints), HRIS API (delegation and reporting structure), EventBridge (escalation timers), SES (notifications), DynamoDB (approval state and audit trail)

Note: AgentCore Policy serves as the deterministic control boundary for which actions the agent can take. The approval lifecycle state (pending, approved, rejected, escalated) should live in DynamoDB or a workflow engine, not in the agent’s reasoning alone.

You need this if: Your approval workflows are either too rigid (everything follows the same chain regardless of content) or too manual (someone decides who needs to approve each request), and both modes create bottlenecks.

#058 - Exception Handling and Escalation Agent

Pattern: New build

Platform: AgentCore

Complexity: Strategic Bet

Reference Architecture: G

What the agent does:

Monitors automated business processes for exceptions that cannot be resolved by existing error-handling logic.
When a process throws an exception (payment fails, data validation rejects a record, an API call returns an unexpected response), the agent analyzes the exception context, classifies it, and determines whether the issue is transient (retry), correctable (apply a predefined remediation within policy boundaries), or requires human judgment.
For correctable issues, it recommends or executes predefined remediation actions constrained by AgentCore Policy, and logs every change applied.
For human-required issues, it identifies the right person based on the exception domain and provides a diagnostic summary.
Tracks exception patterns and recommends process improvements when the same exception type recurs.

AWS services: Bedrock (Claude), AgentCore Runtime, AgentCore Memory (historical context for recurring exceptions), CloudWatch (process monitoring), EventBridge (exception events), SNS (escalation), DynamoDB (structured exception log and pattern analytics)

You need this if: Your operations team spends hours daily triaging process exceptions from automated systems, and 60% of those exceptions follow patterns that could be handled without human intervention.

System Integration and Data Synchronization

#059 - Cross-System Data Reconciliation Agent

Pattern: Migration from RPA

Platform: AgentCore

Complexity: Strategic Bet

Reference Architecture: I

What the agent does:

Reconciles data across systems that should agree but drift over time.
Compares customer records between CRM and billing, inventory counts between warehouse management and ERP, employee data between HRIS and payroll.
Identifies discrepancies and flags them against deterministic authority rules that define which system owns each field.
The agent’s role is matching ambiguity and explanation, not final authority: it resolves near-matches the rules engine cannot handle, explains why records diverged, and either corrects the discrepant system automatically for low-risk fields (phone numbers, formatting differences) or queues corrections for approval for high-risk fields (pricing, compensation).
Runs on a schedule and produces reconciliation reports showing drift trends, common discrepancy sources, and data quality scores per system.

AWS services: Bedrock (Claude), AgentCore Runtime, AgentCore Gateway (multi-system APIs), EventBridge (scheduler), S3 (reconciliation reports), DynamoDB (discrepancy tracking), Quick (QuickSight for drift dashboards)

You need this if: Your teams spend time manually comparing data across systems, you have discovered billing errors caused by CRM-to-billing sync failures, and nobody trusts that any single system has the correct data.

#060 - Legacy System Migration Agent

Pattern: New build

Platform: Step Functions + AgentCore (hybrid)

Complexity: Foundation Build

Reference Architecture: H

What the agent does:

Assists in migrating data from legacy systems to modern platforms.
Step Functions manages the bulk migration pipeline (extract, transform, load).
The agent assists with the judgment-intensive parts: suggesting field mappings when the legacy-to-target relationship is not one-to-one, interpreting free-text fields that encode structured information (a “notes” field containing address changes, pricing exceptions, and relationship context), and flagging records where legacy data does not meet the new system’s validation rules.
Transformation pipelines and validation rules remain deterministic; the agent handles ambiguity and explanation, not schema authority.
Produces a migration report for each batch showing what mapped cleanly, what required interpretation, and what needs human review.

AWS services: Step Functions, Bedrock (Claude), AgentCore Runtime, AWS DMS (data migration), Lambda (ETL steps), Glue (data transformation), S3 (staging), RDS/DynamoDB (target systems)

You need this if: Your legacy migration is stalled because 20-40% of records need manual cleanup, and the cleanup requires understanding context encoded in free-text fields and inconsistent formats.

#061 - API Integration Mediator Agent

Pattern: New build

Platform: AgentCore

Complexity: Strategic Bet

Reference Architecture: I

What the agent does:

Sits between systems that need to exchange data but lack a clean integration.
Handles the translation layer: mapping fields between different schemas, converting units and formats, handling version differences in APIs, and managing the sequencing when updates to one system require multiple coordinated calls to another.
When an API starts returning unexpected responses, the agent can help detect and classify the integration drift, and in constrained cases map minor schema variations (a renamed field, a new optional property).
For breaking changes to authentication flows, data contracts, or endpoint structure, contract validation and human review remain necessary.
Maintains a log of all translations applied and flags when its confidence in a mapping drops below threshold, indicating the integration needs engineering attention.

AWS services: Bedrock (Nova), AgentCore Runtime, AgentCore Gateway, API Gateway (endpoint management), DynamoDB (mapping rules), CloudWatch (integration health metrics), SNS (confidence alerts)

You need this if: You maintain brittle point-to-point integrations between systems using custom Lambda functions, and every API version change breaks downstream processes.

#062 - Master Data Management Agent

Pattern: New build

Platform: AgentCore

Complexity: Foundation Build

Reference Architecture: I

What the agent does:

Maintains a golden record for key business entities (customers, products, vendors, locations) across systems.
When a new record is created or an existing record is updated in any connected system, the agent evaluates whether it represents a new entity or matches an existing one.
Goes beyond simple deduplication: resolves near-matches by analyzing contextual signals (same company, different address might be a branch office, not a duplicate) and surfaces merge candidates with explanations.
Deterministic survivorship rules (most recent, most complete, source priority) govern which values win in a merge; the agent handles entity resolution and ambiguous matching, while stewardship workflows and audit trails ensure humans review high-impact merges before propagation to connected systems.
Produces data quality scorecards showing entity completeness, duplication rates, and conflict resolution patterns.

AWS services: Bedrock (Claude), AgentCore Runtime, AgentCore Gateway (multi-system APIs), DynamoDB (golden record store), EventBridge (change events), Glue (data profiling), Quick (QuickSight for data quality dashboards)

You need this if: Your customer or product master data is fragmented across 3+ systems, duplicates cause operational errors, and your team manually merges records using spreadsheets.

Order and Supply Chain Operations

#063 - Order Fulfillment Orchestration Agent

Pattern: Migration from RPA

Platform: AgentCore

Complexity: Strategic Bet

Reference Architecture: I

What the agent does:

Manages the order-to-fulfillment process across ERP, warehouse management, shipping carriers, and inventory systems.
When an order is placed, the agent determines the fulfillment path within predefined business rules and constraints: which warehouse has stock, which carrier meets the delivery window at the lowest cost, and whether to split-ship or wait for all items.
Handles exceptions that derail RPA scripts: backorders, address validation failures, carrier capacity constraints, and partial inventory availability.
When fulfillment cannot meet the promised delivery date, the agent determines the best alternative (expedited shipping, alternative warehouse, substitute product) and updates the customer record.

AWS services: Bedrock (Claude), AgentCore Runtime, AgentCore Gateway (ERP, WMS, carrier APIs), EventBridge (order events), DynamoDB (fulfillment state), SNS (exception alerts)

You need this if: Your fulfillment team manually handles 15-20% of orders because they fall outside the happy path of your automated workflow, and each exception takes 20+ minutes to resolve.

#064 - Inventory Rebalancing Agent

Pattern: New build

Platform: Both (AgentCore backend + Quick dashboards)

Complexity: Strategic Bet

Reference Architecture: G + Quick dashboards

What the agent does:

Monitors inventory levels across warehouses, stores, and distribution centers.
Identifies imbalances where one location has excess stock while another faces shortages.
Factors in demand forecasts, seasonal patterns, lead times, transfer costs, and upcoming promotions to recommend and initiate rebalancing transfers.
Handles the complexity that rule-based systems miss: a supplier delay requiring redistribution of remaining stock, seasonal demand shifts that differ from historical patterns, or promotional activity creating localized spikes.
External demand signals (social media trends, weather forecasts) can improve rebalancing accuracy but require dedicated data integrations beyond the base inventory stack.
Quick dashboards give supply chain managers visibility into inventory health, transfer recommendations, and demand signals.

AWS services: Bedrock (Claude), AgentCore Runtime, AgentCore Gateway (WMS APIs), Amazon Quick (QuickSight + Research), EventBridge (inventory triggers), Redshift (demand data), S3 (forecast models)

You need this if: Your inventory rebalancing happens weekly via spreadsheet analysis, and you regularly discover stockouts at one location while another location sits on 90+ days of the same product.

#065 - Vendor Performance Monitoring Agent

Pattern: New build

Platform: AgentCore backend + Quick analytics

Complexity: Quick Win

Reference Architecture: G + Quick dashboards

What the agent does:

Tracks vendor performance across delivery timeliness, quality metrics, pricing compliance, and responsiveness.
Pulls data from receiving logs, quality inspection records, invoice accuracy rates, and communication response times.
Identifies trends that indicate deteriorating performance before they become problems: gradually increasing lead times, rising defect rates, or more frequent pricing discrepancies.
Generates vendor scorecards and produces alerts when a vendor crosses a performance threshold.
For procurement reviews, compiles a vendor comparison across the same product category with data-backed performance rankings.

AWS services: Bedrock (Claude), AgentCore Runtime, Amazon Quick (QuickSight + Research), ERP API, quality management API, EventBridge (threshold alerts), Redshift (vendor metrics), SES (scorecards)

You need this if: Your procurement team evaluates vendors based on relationship and recent memory rather than data, and poor vendor performance gets discovered through downstream operational problems.

#066 - Purchase Order Generation Agent

Pattern: Migration from RPA

Platform: Step Functions + AgentCore (hybrid)

Complexity: Quick Win

Reference Architecture: H

What the agent does:

Generates purchase orders based on inventory triggers, demand forecasts, and reorder points.
Step Functions handles the deterministic parts: checking reorder thresholds, pulling vendor pricing from contracts, applying standard terms.
The agent handles the judgment calls: selecting the best vendor when multiple suppliers carry the same item (factoring price, lead time, current performance score, and existing open orders) constrained by the approved vendor list and procurement policies, consolidating small orders to meet minimum order quantities, timing orders to optimize shipping costs, and adjusting quantities based on demand trend analysis.
Submits POs through the procurement system and tracks acknowledgment.

AWS services: Step Functions, Bedrock (Nova), AgentCore Runtime, Lambda, ERP API, DynamoDB (vendor performance and pricing), EventBridge (reorder triggers)

You need this if: Your purchasing team manually creates 50+ POs per week, the vendor selection process is inconsistent, and you miss volume discount thresholds because orders are not consolidated.

Financial Operations

#067 - Revenue Recognition Processing Agent

Pattern: New build

Platform: AgentCore

Complexity: Foundation Build

Reference Architecture: G

What the agent does:

Supports the revenue recognition process under ASC 606 by analyzing contract terms, delivery milestones, and performance obligations.
The agent assists with the interpretation layer that trips up rule-based systems: multi-element arrangements where deliverables have different recognition patterns, variable consideration (usage-based pricing, performance bonuses), contract modifications mid-period, and the distinction between point-in-time and over-time recognition.
Authoritative recognition logic and GL postings remain rule-driven and controller-reviewed.
The agent’s primary value is exception triage, workpaper preparation, and drafting supporting schedules and documentation that the accounting team validates before posting.
Edge cases route to the controller with the agent’s analysis and a recommended treatment.

AWS services: Bedrock (Claude), AgentCore Runtime, AgentCore Policy (recognition rules), ERP API, DynamoDB (contract terms), S3 (supporting documentation), EventBridge (milestone triggers)

You need this if: Revenue recognition consumes your accounting team’s last two weeks of every quarter, manual journal entries introduce errors, and your auditors consistently question the supporting documentation.

#068 - Bank Reconciliation Agent

Pattern: Migration from RPA

Platform: Step Functions + AgentCore (hybrid)

Complexity: Quick Win

Reference Architecture: H

What the agent does:

Reconciles bank statements against the general ledger.
Step Functions handles the straightforward matches: same amount, same date, same reference number.
The agent investigates the rest: partial matches where the bank batched multiple payments, timing differences where the bank and GL dates differ, transactions that posted under a different description than expected, and unidentified deposits or charges.
For unmatched items, the agent searches across systems (AP, AR, payroll, expense reports) to identify the likely source.
Agent-suggested matches above a materiality threshold route to the accounting team for approval rather than auto-resolving.
Produces a reconciliation report with matched items, agent-investigated items (with reasoning and confidence), and genuinely unresolved items requiring human investigation.

AWS services: Step Functions, Bedrock (Claude), AgentCore Runtime, Lambda, banking API, ERP API, DynamoDB (matching index), S3 (reconciliation reports)

You need this if: Your monthly bank reconciliation takes 3+ days, most of that time is spent investigating exceptions, and you carry unreconciled items forward because nobody has time to resolve them.

#069 - Accounts Receivable Follow-Up Agent

Pattern: New build

Platform: AgentCore

Complexity: Quick Win

Reference Architecture: G

What the agent does:

Manages the collections process for outstanding invoices.
Monitors aging reports, identifies overdue accounts, and executes a tiered follow-up strategy.
Sends payment reminders using approved communication templates that reference the specific invoice details and payment terms.
When responses come in (payment committed, dispute raised, payment plan requested), the agent updates the AR system and adjusts its follow-up schedule.
For disputed invoices, it gathers the relevant backup documentation (PO, delivery confirmation, signed acceptance) and escalates to the AR team with a complete case file.
All outbound communications follow escalation rules and messaging policies configured in AgentCore Policy. Prioritizes follow-up efforts based on amount, aging, and customer payment history.

AWS services: Bedrock (Claude), AgentCore Runtime, AgentCore Policy (communication and escalation rules), ERP/billing API, SES (communications), EventBridge (follow-up scheduling), DynamoDB (communication history and dispute tracking)

You need this if: Your DSO (days sales outstanding) exceeds industry benchmarks, your AR team cannot follow up on every overdue invoice, and you write off receivables that could have been collected with timely outreach.

Compliance and Regulatory Operations

#070 - Regulatory Filing Preparation Agent

Pattern: New build

Platform: AgentCore

Complexity: Strategic Bet

Reference Architecture: G

What the agent does:

Assembles regulatory filings by pulling required data from across business systems.
Handles recurring filings (tax returns, SEC reports, environmental disclosures, industry-specific compliance reports) by mapping filing requirements to internal data sources, extracting the relevant figures, performing consistency checks against prior period filings, and assembling draft filing packages with pre-populated data for review.
Identifies data gaps where the required information does not exist in a system and routes data collection requests to the responsible teams.
Maintains a filing calendar and initiates preparation with enough lead time based on the complexity of each filing.

AWS services: Bedrock (Claude), AgentCore Runtime, AgentCore Gateway (ERP, HRIS, compliance system APIs), EventBridge (filing calendar), S3 (filing archive), DynamoDB (filing requirements and data mappings)

You need this if: Your compliance team manually compiles data from 5+ systems for each regulatory filing, the process starts too late every cycle, and filing errors result from transcription mistakes between systems.

#071 - Audit Trail and Evidence Collection Agent

Pattern: New build

Platform: AgentCore

Complexity: Quick Win

Reference Architecture: G

What the agent does:

Organizes and packages evidence from native audit data sources for compliance audits.
The raw evidence comes from authoritative system logs: CloudTrail for API activity, CloudWatch Logs for application events, S3 for stored artifacts.
The agent’s role is enrichment and organization: when an auditable event occurs (access grant, configuration change, data modification, approval action), it correlates the event across sources and adds context: who, what, when, why (pulling from the approval chain or change request), and the system state before and after.
The agent indexes and packages this evidence, not invents it.
During audit preparation, retrieves and packages evidence against specific control requirements.
Maps internal controls to audit criteria (SOC 2 trust service criteria, HIPAA safeguards, PCI DSS requirements) and identifies gaps where evidence is incomplete or missing.

AWS services: Bedrock (Claude), AgentCore Runtime, CloudTrail, CloudWatch Logs, EventBridge (event capture), S3 (evidence store), DynamoDB (control mapping), OpenSearch Serverless (evidence search)

You need this if: Your team spends 100+ hours preparing for each audit cycle, evidence collection is a manual scavenger hunt across systems, and auditors request additional evidence because the initial package is incomplete.

#072 - Policy Change Impact Assessment Agent

Pattern: New build

Platform: AgentCore

Complexity: Strategic Bet

Reference Architecture: G

What the agent does:

When a regulatory change or internal policy update is proposed, the agent assesses the impact across business operations.
Analyzes the proposed change against current processes, systems, data flows, and compliance controls.
Identifies which teams, systems, and workflows are affected.
Estimates the scope of required changes: configuration updates, process modifications, training requirements, and system development.
Produces a draft impact report with affected areas ranked by severity and a preliminary remediation plan as an analytical aid for compliance and operations teams.
The final impact assessment and remediation decisions remain with the policy owners and affected teams.
Tracks implementation of approved changes against the remediation plan.

AWS services: Bedrock (Claude), AgentCore Runtime, Bedrock Knowledge Bases (policy corpus and process documentation), AgentCore Memory (institutional context), DynamoDB (change tracking), S3 (impact reports)

You need this if: Policy changes blindside operational teams because nobody mapped the downstream effects, and your compliance team discovers gaps during audits rather than during the change implementation.

DevOps and Infrastructure Operations

#073 - Deployment Validation Agent

Pattern: New build

Platform: Step Functions + AgentCore (hybrid)

Complexity: Quick Win

Reference Architecture: H

What the agent does:

Runs after each deployment to validate that the release behaves correctly.
Step Functions orchestrates the validation sequence: smoke tests, health checks, metric collection.
The agent handles the judgment calls: comparing post-deployment metrics against pre-deployment baselines and determining whether observed changes are expected (a new feature increases latency on one endpoint but within acceptable bounds) or problematic (error rate increased 0.3% on an unrelated endpoint).
Correlates application metrics with infrastructure metrics to distinguish deployment issues from infrastructure events.
When validation fails, provides a diagnostic summary and recommends an action (roll back, hotfix, or monitor) based on predefined SLO thresholds and deployment policies, not unconstrained model reasoning.
The agent surfaces evidence; deployment policy and on-call make the decision.

AWS services: Step Functions, Bedrock (Claude), AgentCore Runtime, CloudWatch, Lambda (health checks), CodeDeploy, SNS (alerts), DynamoDB (baseline metrics)

You need this if: Your deployment validation is either manual (someone watches dashboards for 30 minutes) or binary (automated tests pass/fail with no nuance), and you have experienced incidents caused by subtle deployment regressions that neither approach caught.

#074 - Cost Anomaly Investigation Agent

Pattern: New build

Platform: Both (AgentCore backend + Quick dashboards)

Complexity: Quick Win Reference

Architecture: G + Quick dashboards

What the agent does:

Responds to AWS Cost Anomaly Detection alerts by investigating the root cause. One timing caveat: Cost Anomaly Detection runs detection models approximately three times per day, and because it relies on Cost Explorer data, detection can lag actual usage by up to 24 hours.
This pattern suits cost governance and investigation workflows, not real-time operational response (use CloudWatch alarms for that). When an alert fires, the agent pulls cost data from Cost Explorer, correlates spikes with CloudTrail events (new resources launched, configuration changes, scaling events), checks deployment logs for recent releases, and examines usage patterns.
Determines whether the cost increase is expected (auto-scaling during a traffic spike), accidental (someone left GPU instances running), or concerning (a compromised credential spinning up resources).
Produces an investigation summary with the likely cause, the ongoing cost impact if unaddressed, and a recommended action.
Quick dashboards track cost trends and anomaly patterns over time.

AWS services: Bedrock (Claude), AgentCore Runtime, AWS Cost Explorer API, AWS Cost Anomaly Detection, CloudTrail, Amazon Quick (QuickSight), EventBridge (anomaly alerts), SNS (notifications)

You need this if: Your team gets cost anomaly alerts, glances at them, and marks them as reviewed without investigating because the investigation requires cross-referencing 4+ data sources manually.

#075 - Certificate and Secret Rotation Agent

Pattern: Migration from RPA

Platform: Step Functions + AgentCore (hybrid)

Complexity: Strategic Bet

Reference Architecture: H

What the agent does:

Adds dependency-aware sequencing and verification on top of the native rotation mechanisms already built into AWS Secrets Manager and ACM.
Step Functions handles the deterministic parts: checking expiration dates, triggering rotation via Secrets Manager or ACM, updating the secret store.
The native rotation handles credential generation.
The agent handles what native rotation does not: determining the correct rotation sequence when multiple services depend on the same credential, validating that each service picked up the new credential (not just that the secret was rotated), and managing the coordination window where old and new credentials must both work.
When rotation fails (a service did not pick up the new credential), the agent diagnoses the failure, rolls back if necessary, and reports the issue with remediation steps.

AWS services: Step Functions, Bedrock (Nova), AgentCore Runtime, AWS Secrets Manager, ACM, Lambda (rotation functions), CloudWatch (health validation), SNS (failure alerts), Systems Manager

You need this if: You have experienced an outage caused by an expired certificate or credential, your rotation process is manual or partially automated with no validation step, and you manage 50+ secrets across your infrastructure.

What These 25 Patterns Reveal

The hybrid architecture dominates this edition. Roughly a third of the patterns use Step Functions for deterministic steps and AgentCore only for judgment calls. Workflow automation has more deterministic logic than customer-facing or employee-facing use cases. The hybrid approach reduces unnecessary model invocations, keeps deterministic steps easier to validate, debug, and audit through Step Functions’ visual execution history, and reserves agent reasoning for the steps where it actually adds value.

Event-driven triggers replace chat interfaces. Only one of these 25 agents has any human interaction during execution (the exception handling agent’s escalation path). The rest trigger from system events, schedules, or data state changes.

This changes the evaluation criteria: latency matters less than reliability, and the agent’s ability to handle edge cases matters more than conversational quality.

RPA migrations cluster in document processing and financial operations.

A few patterns explicitly replace existing RPA scripts. These agents handle the same workflows but add reasoning for the exceptions that break RPA.

The migration path is clear: identify where your RPA scripts have the most manual exception queues, and those are your first agent candidates.

Data quality is the prerequisite, not the afterthought.

Four patterns (reconciliation, master data management, legacy migration, data entry) directly address data quality. Any serious feasibility assessment scores data readiness as the number one factor, and these patterns validate why.

An agent that processes invoices needs clean vendor master data. An agent that orchestrates fulfillment needs accurate inventory counts.

The data quality agents are foundation builds that make every other agent more reliable.

Multi-system coordination is where agents add the most value over traditional automation.

The patterns spanning 4+ systems (fulfillment orchestration, data reconciliation, regulatory filing) handle failure modes that point-to-point integrations struggle with: partial failures, sequencing dependencies, and cascading rollbacks.

Traditional orchestration tools like Step Functions handle retries, catch paths, and compensating logic well for anticipated failure modes.

Agents add value when the failure mode was never explicitly modeled, when the system encounters ambiguous situations that require interpretation rather than a pre-built error path.

What Comes Next

Two more editions to go:

Edition 4 - Data and analytics agents (self-service BI, automated reporting, data pipeline management)
Edition 5 - Compliance, security, and governance agents (high-stakes environments with strict audit and control requirements)

If you are building workflow automation agents, look for processes with four properties: high exception volume that creates manual work, measurable human effort you can benchmark against, structured inputs that give the agent something concrete to work with, and a clear fallback-to-human path when the agent’s confidence is low.

Invoice Processing (#051) and Bank Reconciliation (#068) fit all four criteria, which is why they make strong starting points.

They also use the hybrid Step Functions + AgentCore pattern, which gives you the best cost-to-value ratio for your first process agent.

I publish every week at buildwithaws.substack.com. Subscribe. It's free.

25 Internal Knowledge and Productivity Agent Patterns on AWS You Can Steal Right Now

Marcelo Acosta Cavalero — Mon, 06 Apr 2026 13:09:09 +0000

Originally published on Build With AWS. Subscribe for weekly AWS builds.

An engineer spent 40 minutes last Thursday searching for the internal API rate-limiting policy. She checked Confluence, Notion, three Slack channels, and finally asked a colleague who pointed her to a Google Doc shared in a thread six months ago. The policy existed.

Finding it was the problem.

This is the second edition of a five-part series cataloging real AI architecture patterns running on AWS.

Edition 1 covered 25 customer-facing agents.

This edition shifts the lens inward: 25 patterns for employee-facing agents that handle knowledge retrieval, internal support, operational productivity, and the daily friction that slows teams down.

If you missed Edition 1, go back for the “Agent or Not?” scoring framework and the AgentCore vs Quick breakdown.

Those mental models apply here too, so this edition skips straight to the architectures and use cases.

One platform update before the cards: Edition 1 split the world into AgentCore (custom agents) and Quick (analytics).

Internal agents add a third lane. Amazon Q Business is the AWS-native default for enterprise knowledge assistants, permissions-aware search, and SaaS-connected internal help desks.

It ships with native connectors for Google Drive, Slack, Confluence, Jira, SharePoint, and dozens more, with document-level ACLs built in.

Q Business can trigger actions through plugins, but AgentCore remains the better choice when workflows require deterministic orchestration, multi-step execution, or strict policy enforcement.

AgentCore remains the right choice for custom agent backends with tool orchestration, memory, identity, and fine-grained control.

Amazon Quick stays in its lane for analytics, dashboarding, research, and workflow automation around business data.

Several patterns below use Q Business for retrieval and AgentCore for action, which turns out to be the natural split for internal workloads.

Reference Architectures for Internal Agents

Internal agents integrate with different systems than customer-facing ones. Corporate identity providers, internal wikis, HR platforms, CI/CD pipelines, and financial systems replace the CRM and e-commerce APIs from Edition 1.

The four reference architectures adapt accordingly.

Reference Architecture D - Single Agent with Internal Tool Access

Platform: AgentCore

When to use: The agent reasons about which internal tools to query, in what order, based on the employee’s role and question. One agent handles the full interaction with 3-8 internal system integrations.

Covers most IT support, HR advisory, and workflow-execution agents where the agent needs to take actions through APIs.

For pure knowledge retrieval and Q&A, see Architecture D2 below.

AgentCore Identity integrates with your corporate IdP (Okta, Azure AD) for SSO. AgentCore Policy enforces role-based access scoping - verify maturity for your target region before production rollout.

Reference Architecture E - Quick Workspace for Internal Intelligence

Platform: Quick

When to use: Teams need AI-powered analysis of internal data, operational metrics, or workforce analytics without writing code.

Covers engineering velocity dashboards, headcount planning analysis, budget tracking, and self-service reporting for managers and operations teams.

Reference Architecture F - Multi-Agent Internal Workflow

Platform: AgentCore (multi-agent)

When to use: Employee requests span IT, HR, finance, and facilities.

Each domain needs its own tools, knowledge bases, and policy constraints.

A single agent trying to handle all internal functions becomes unreliable at 15+ tools. Specialized agents behind a router keep each context window focused.

Reference Architecture G - Q Business for Enterprise Knowledge

Platform: Amazon Q Business

When to use: The primary need is permissions-aware search and Q&A across SaaS knowledge sources.

Q Business ships with native connectors for dozens of data sources and enforces document-level ACLs automatically.

No custom orchestration code required.

Covers enterprise knowledge search, policy Q&A, and any pattern where the core job is “find the right document and synthesize an answer the employee is authorized to see.”

When the same workflow also needs to take actions (create tickets, provision access, call APIs), pair Q Business for retrieval with AgentCore for execution.

The 25 Use Cases

Knowledge Management and Search

#026 - Enterprise Knowledge Search Agent

Pattern: Modernization from chatbot

Platform: Amazon Q Business (primary), AgentCore (optional action layer) Complexity: Quick Win

Reference Architecture: G

What the agent does:

Searches across internal knowledge sources - Confluence, SharePoint, Google Drive, Slack message history, Jira, and S3 - through a single conversational interface.
Understands natural language questions (”What’s our policy on vendor security reviews?”), retrieves relevant documents from multiple sources, synthesizes a direct answer with citations, and identifies when conflicting information exists across sources.
Respects document-level permissions so employees only see content they have access to. Amazon Q Business handles this natively: its built-in connectors index these sources and its ACL engine maps existing permissions without custom code.
For sources Q Business does not cover natively, Bedrock Knowledge Bases with a custom data source connector fills the gap, though note that some Bedrock connectors (such as Confluence) are in preview and do not yet support multimodal content like tables and diagrams.

AWS services: Amazon Q Business (connectors + retriever + ACL engine), Bedrock Knowledge Bases (custom RAG for unsupported sources), S3 (document store)

You need this if: Your employees regularly say “I know we documented this somewhere” and spend 20+ minutes searching across 3 or more knowledge platforms.

#027 - Policy and Compliance Q&A Agent

Pattern: New build

Platform: Amazon Q Business (primary), AgentCore (for action routing)

Complexity: Quick Win

Reference Architecture: G

What the agent does:

Answers employee questions about internal policies - travel expenses, PTO accrual, data classification, security requirements, procurement thresholds, acceptable use.
Pulls from the authoritative policy documents (not outdated wiki copies) and provides specific answers with page references.
Q Business indexes the policy corpus from S3 or SharePoint and enforces access controls so employees only see policies relevant to their role.
When policies are ambiguous or the question falls outside documented rules, an AgentCore action layer identifies the policy owner and drafts an email for the employee to send.
Tracks which policies generate the most questions, surfacing candidates for clarification.

AWS services: Amazon Q Business (retriever + ACL engine), S3 (policy document store), AgentCore Runtime (action routing), CloudWatch (query analytics)

You need this if: Your HR, legal, or compliance team answers the same policy questions repeatedly, and employees default to asking coworkers instead of reading the docs.

#028 - Institutional Knowledge Capture Agent

Pattern: New build

Platform: AgentCore

Complexity: Strategic Bet

Reference Architecture: D

What the agent does:

Runs structured knowledge extraction interviews with subject matter experts, particularly before role transitions, departures, or reorganizations.
Asks targeted questions about undocumented processes, tribal knowledge, key relationships, and decision context.
Transcribes and synthesizes responses into structured knowledge articles with proper metadata and cross-references.
Identifies gaps where captured knowledge contradicts or supplements existing documentation.
Generates a handoff document for successors.

AWS services: Bedrock (Claude), AgentCore Runtime, AgentCore Memory (interview state), Amazon Transcribe, S3 (knowledge archive), Bedrock Knowledge Bases

You need this if: Critical knowledge walks out the door when senior employees leave, and your team spends months reconstructing context that lived in someone’s head.

#029 - Technical Documentation Assistant

Pattern: Modernization from chatbot

Platform: AgentCore

Complexity: Quick Win

Reference Architecture: D

What the agent does:

Helps engineers navigate internal API documentation, runbooks, architecture decision records, and system diagrams.
Answers questions like “How does the payment service authenticate with the ledger?” by pulling from code comments, README files, ADRs, and internal docs.
When documentation is stale or missing, it flags the gap and creates a draft based on the current codebase.
Understands code context so it can explain what a service does, not just repeat what the docs say.

AWS services: Bedrock (Claude), AgentCore Runtime, Bedrock Knowledge Bases (documentation + custom-ingested code artifacts), Amazon Q Developer (native repository integration)

You need this if: Your engineering team wastes hours reading outdated documentation and reverse-engineering service behavior because the docs do not match the code.

#030 - Cross-Team Decision Log Agent

Pattern: New build

Platform: Both (AgentCore backend + Quick analytics)

Complexity: Strategic Bet

Reference Architecture: D + E

What the agent does:

Captures architectural decisions, trade-off discussions, and design choices from Slack threads, meeting transcripts, and PR comments.
Structures them into searchable decision records with context, alternatives considered, rationale, and stakeholders.
When a team proposes something that contradicts or revisits a prior decision, the agent surfaces the original discussion and reasoning.
Quick dashboards show decision frequency by domain, open questions, and areas where decisions are overdue.

AWS services: Bedrock (Claude), AgentCore Runtime, AgentCore Memory, Amazon Quick (QuickSight + Index), Amazon Transcribe, S3 (decision archive)

You need this if: Your teams relitigate the same technical decisions every quarter because nobody remembers why the original choice was made.

IT Help Desk and Internal Support

#031 - IT Help Desk Agent

Pattern: Modernization from chatbot

Platform: AgentCore

Complexity: Quick Win

Reference Architecture: D

What the agent does:

Handles common IT support requests through Slack or a web interface.
Resets passwords via the IdP API, provisions software licenses through the asset management system.
Troubleshoots VPN connectivity with diagnostic checks.
Resolves printer issues with guided walkthroughs, and manages MFA token enrollment.
For issues requiring hands-on support, it collects diagnostic information, determines priority based on impact and urgency, and creates a ticket with all relevant context pre-populated.

AWS services: Bedrock (Nova), AgentCore Runtime, AgentCore Identity (IdP integration), AgentCore Gateway (ITSM APIs), ServiceNow API, Okta/Azure AD API

You need this if: More than 50% of your IT help desk tickets are password resets, access requests, and connectivity issues that follow standard resolution procedures.

#032 - Software Access Provisioning Agent

Pattern: Migration from RPA

Platform: AgentCore

Complexity: Quick Win

Reference Architecture: D

What the agent does:

Processes software access requests end-to-end. Employee asks for access to a tool (GitHub org, AWS account, Datadog, Salesforce).
The agent checks the employee’s role against the entitlement matrix, identifies whether manager approval is needed, routes the approval request, and upon approval, provisions access via the tool’s API or SCIM endpoint.
Handles license availability checks and waitlisting.
Automatically de-provisions access when employees change roles or depart based on HRIS events.

AWS services: Bedrock (Nova), AgentCore Runtime, AgentCore Policy (entitlement rules), AgentCore Identity, SCIM APIs, HRIS API (Workday/BambooHR), EventBridge (lifecycle events)

You need this if: Software access requests take 2+ business days to fulfill because they require manual approval chains and admin intervention across multiple systems.

#033 - Incident Communication Coordinator

Pattern: New build

Platform: AgentCore

Complexity: Strategic Bet

Reference Architecture: D

What the agent does:

During production incidents, drafts and distributes internal status updates based on real-time information from monitoring tools and the incident Slack channel.
Pulls metrics from CloudWatch and Datadog, summarizes the current state of the incident, identifies affected services and customer impact, and posts updates to the status page and stakeholder channels at configured intervals.
After resolution, compiles a timeline of events and generates a postmortem draft with contributing factors and action items pre-populated from the incident channel discussion.

AWS services: Bedrock (Claude), AgentCore Runtime, AgentCore Gateway (monitoring APIs), CloudWatch, EventBridge, SNS (notifications), S3 (postmortem archive)

You need this if: Your incident commanders spend more time writing status updates than resolving the incident, and postmortems take a week to produce because nobody captured the timeline in real-time.

#034 - Infrastructure Self-Service Agent

Pattern: New build

Platform: AgentCore

Complexity: Strategic Bet

Reference Architecture: D

What the agent does:

Lets developers request and configure cloud infrastructure through conversation instead of filing tickets.
Handles common requests: spin up a dev environment, create an S3 bucket with standard tagging, set up a new RDS instance within approved configurations, or request a temporary IAM role for cross-account access.
Validates all requests against organizational policies and guardrails (naming conventions, cost limits, security baselines) before executing via IaC templates.
Non-standard requests route to the platform team with a pre-filled request.

AWS services: Bedrock (Claude), AgentCore Runtime, AgentCore Policy (guardrails), AWS Service Catalog, CloudFormation/CDK, IAM, AWS Organizations

You need this if: Your platform team processes 30+ infrastructure requests per week and developers wait 1-3 days for standard environments that could be provisioned in minutes.

#035 - Security Questionnaire Response Agent

Pattern: Migration from RPA

Platform: AgentCore

Complexity: Quick Win

Reference Architecture: D

What the agent does:

Completes vendor security questionnaires and customer security assessments by matching questions against a maintained library of approved responses.
Pulls from SOC 2 reports, penetration test summaries, architecture documentation, and previously approved answers.
Drafts responses for each question with confidence scores.
High-confidence answers (exact matches to prior approved responses) are auto-filled.
Low-confidence answers are flagged for security team review.
Tracks which questions appear most frequently to prioritize documentation improvements.

AWS services: Bedrock (Claude), AgentCore Runtime, Bedrock Knowledge Bases (security response library, optionally backed by OpenSearch Serverless for advanced retrieval control), S3 (compliance documents)

You need this if: Your security team spends 10+ hours per week completing repetitive security questionnaires, and the same questions appear across 80% of inbound assessments.

HR and People Operations

#036 - Employee Onboarding Navigator

Pattern: Modernization from chatbot

Platform: AgentCore

Complexity: Quick Win

Reference Architecture: D

What the agent does:

Guides new hires through their first 90 days.
Sends day-one setup instructions (laptop configuration, tool access, building entry).
Answers questions about benefits enrollment deadlines, org structure, team norms, and internal processes.
Adapts the onboarding checklist based on role, department, and location.
Tracks completion of required training, compliance acknowledgments, and documentation reviews.
Nudges managers when their new hire’s onboarding milestones are stalling.

AWS services: Bedrock (Claude), AgentCore Runtime, AgentCore Memory (onboarding state), HRIS API (Workday/BambooHR), LMS API, SES/SNS (notifications)

You need this if: New hire ramp time exceeds 30 days, onboarding satisfaction scores are below 80%, and your HR team manually tracks checklist completion in spreadsheets.

#037 - Benefits and Leave Advisory Agent

Pattern: Modernization from chatbot

Platform: AgentCore

Complexity: Quick Win

Reference Architecture: D

What the agent does:

Answers employee questions about health insurance plans, 401(k) matching, HSA/FSA eligibility, parental leave, PTO balance, and FMLA procedures.
Pulls real-time data from the HRIS and benefits platforms to give personalized answers (”You have 8.5 PTO days remaining this year”).
Walks employees through benefits enrollment during open enrollment with side-by-side plan comparisons based on their specific situation (family size, expected medical usage, contribution preferences).
Routes complex cases to HR specialists with the question and relevant context pre-attached.

AWS services: Bedrock (Claude), AgentCore Runtime, AgentCore Identity (employee verification), HRIS API, benefits platform API, Bedrock Guardrails (PII handling)

You need this if: Your HR inbox is dominated by benefits questions during open enrollment, and employees make suboptimal plan selections because they do not understand their options.

#038 - Internal Job Matching Agent

Pattern: New build

Platform: AgentCore

Complexity: Strategic Bet

Reference Architecture: D

What the agent does:

Matches employees to internal open positions based on skills, career goals, project history, and performance data.
Goes beyond keyword matching on job descriptions: analyzes the employee’s actual work (code contributions, project involvement, skills demonstrated in reviews) against what the hiring manager needs.
Surfaces opportunities employees might not have found or considered.
Provides a match explanation (”Your work on the data pipeline migration maps directly to this team’s real-time analytics build”).
Respects confidentiality so managers are not notified unless the employee applies.

AWS services: Bedrock (Claude), AgentCore Runtime, AgentCore Policy (confidentiality rules), HRIS API, ATS API (Greenhouse/Lever), Bedrock Knowledge Bases (job postings + employee profiles)

You need this if: Internal mobility is below 15%, employees leave for roles they could have found internally, and your job board gets low engagement because listings read like external postings.

#039 - Performance Review Preparation Agent

Pattern: New build

Platform: AgentCore

Complexity: Strategic Bet

Reference Architecture: D

What the agent does:

Helps managers prepare for performance reviews by compiling an employee’s contributions over the review period.
Pulls data from project management tools (Jira tickets completed, PRs merged, epics delivered), peer feedback, 1:1 notes, goal tracking systems, and prior review history.
Generates a structured draft highlighting key accomplishments, growth areas, and evidence for each.
Does not write the evaluation - it assembles the evidence so the manager spends time on assessment quality instead of data gathering.

AWS services: Bedrock (Claude), AgentCore Runtime, AgentCore Policy (data access controls), Jira API, GitHub API, HRIS API, 15Five/Lattice API

You need this if: Your managers spend 3+ hours per direct report gathering data for reviews, and review quality suffers because managers rely on recency bias instead of full-period evidence.

#040 - Compensation Benchmarking Agent

Pattern: New build

Platform: Both (AgentCore backend + Quick analytics)

Complexity: Foundation Build

Reference Architecture: D + E

What the agent does:

Helps HR and hiring managers make compensation decisions by pulling from internal pay bands, market survey data, and peer comparisons.
Takes a role, level, location, and candidate profile, then generates a recommended offer range with supporting data.
Flags when a proposed offer falls outside band or creates internal equity concerns.
Quick dashboards show compensation distribution by team, gender pay gap analysis, and market competitiveness by role family.
All outputs route through HR approval before reaching the hiring manager.

AWS services: Bedrock (Claude), AgentCore Runtime, AgentCore Policy (data access restrictions), Amazon Quick (QuickSight + Research), HRIS API, compensation survey APIs, Redshift

You need this if: Compensation decisions take a week because they require HR to manually pull market data, check internal equity, and build a justification for every offer.

Engineering and Development

#041 - Code Review Context Agent

Pattern: New build

Platform: AgentCore

Complexity: Quick Win

Reference Architecture: D

What the agent does:

Enriches pull requests with context that speeds up code review.
When a PR is opened, it analyzes the changes and adds a summary: which services are affected, what architectural patterns changed, whether the change touches a critical path, and links to related PRs and design docs.
Flags potential issues: breaking API changes, missing test coverage for modified paths, configuration changes that affect other teams, and dependency updates with known vulnerabilities.
Does not approve or block - it surfaces what a reviewer should pay attention to.

AWS services: Bedrock (Claude), AgentCore Runtime, GitHub/GitLab API, Bedrock Knowledge Bases (architecture docs + ADRs), Amazon Q Developer (code review context)

You need this if: Code reviews take 2+ days because reviewers spend most of their time understanding context rather than evaluating the actual change.

#042 - Incident Postmortem Generator

Pattern: New build

Platform: AgentCore

Complexity: Quick Win

Reference Architecture: D

What the agent does:

Produces structured postmortem documents from incident data.
Pulls the timeline from PagerDuty or Opsgenie, reconstructs the sequence of events from the incident Slack channel, correlates with deployment logs and monitoring data, and generates a draft postmortem following your team’s template.
Identifies contributing factors by analyzing what changed before the incident (deploys, config changes, traffic spikes).
Pre-populates action items based on patterns from previous incidents.
The on-call engineer reviews and refines instead of writing from scratch.

AWS services: Bedrock (Claude), AgentCore Runtime, AgentCore Gateway (PagerDuty/Opsgenie API, Slack API), CloudWatch Logs, S3 (postmortem archive)

You need this if: Postmortems take a week to produce, half of incidents never get a written postmortem, and your team keeps encountering the same failure modes.

#043 - Dependency Risk Assessment Agent

Pattern: New build

Platform: AgentCore

Complexity: Strategic Bet

Reference Architecture: D

What the agent does:

Continuously monitors your codebase’s dependency tree for risk signals beyond CVEs.
Analyzes maintainer activity (abandoned projects, single-maintainer risk), license compatibility, breaking change frequency in upstream releases, and supply chain indicators (typosquatting packages, unexpected maintainer changes).
When a dependency update is available, provides a risk assessment: what changed, what might break, and whether similar codebases have reported issues.
Prioritizes updates based on actual exposure, not just severity scores.

AWS services: Bedrock (Claude), AgentCore Runtime, AgentCore Gateway (GitHub API, package registry APIs), Amazon Inspector (vulnerability scanning + SCA), Amazon Q Developer (code-level risk context), EventBridge (scheduled scans)

You need this if: Your dependency updates are either ignored for months (creating security debt) or applied blindly (causing unexpected breakages), and Dependabot alerts alone do not give you enough context to prioritize.

#044 - On-Call Handoff Agent

Pattern: New build

Platform: AgentCore

Complexity: Quick Win

Reference Architecture: D

What the agent does:

Generates end-of-rotation handoff briefs for on-call engineers.
Compiles all incidents from the rotation (alerts fired, pages received, resolutions applied), ongoing issues that need monitoring, recent deployments that might cause problems, and upcoming changes the next on-call should watch.
Pulls from PagerDuty, Slack incident channels, deployment logs, and the change calendar.
The outgoing on-call reviews and annotates the brief before it goes to the incoming engineer.

AWS services: Bedrock (Claude), AgentCore Runtime, PagerDuty API, Slack API, deployment pipeline API, SES (handoff delivery)

You need this if: On-call handoffs happen verbally (or not at all), incoming engineers start blind, and the first hour of every rotation is spent asking “what happened this week?”

#045 - Architecture Decision Record Agent

Pattern: New build

Platform: AgentCore

Complexity: Strategic Bet

Reference Architecture: D

What the agent does:

Facilitates the creation of Architecture Decision Records from design discussions.
Monitors designated Slack channels and meeting transcripts for architectural debates.
When it detects a decision being made, it drafts an ADR: context, decision, alternatives considered, consequences, and status.
Tags the relevant teams and stakeholders for review.
Maintains a searchable index of all ADRs linked to the services they affect.
When someone proposes a change that conflicts with an existing ADR, the agent surfaces the relevant record and asks whether this is an intentional reversal.

AWS services: Bedrock (Claude), AgentCore Runtime, AgentCore Memory, Slack API, Amazon Transcribe, Bedrock Knowledge Bases (ADR corpus), S3

You need this if: Your team makes architectural decisions in Slack threads that nobody can find three months later, and new engineers re-propose approaches that were already evaluated and rejected.

Finance and Procurement

#046 - Expense Report Processing Agent

Pattern: Migration from RPA

Platform: AgentCore

Complexity: Quick Win

Reference Architecture: D

What the agent does:

Processes expense reports by extracting data from uploaded receipts using Amazon Textract, matching expenses against the company’s travel and expense policy, flagging out-of-policy items with specific policy references, and routing compliant reports for manager approval.
Handles currency conversion for international expenses, per diem calculations by city, and mileage reimbursement.
Auto-categorizes expenses for GL coding.
Reports with flagged items go to the submitter for correction before reaching the approval queue.

AWS services: Bedrock (Nova), AgentCore Runtime, Amazon Textract, AgentCore Policy (expense rules), expense management API (Concur/Expensify), DynamoDB

You need this if: Your finance team manually reviews expense reports for policy compliance, processing takes 5+ business days, and 30% of submissions require back-and-forth corrections.

#047 - Procurement Request Agent

Pattern: Migration from RPA

Platform: AgentCore

Complexity: Strategic Bet

Reference Architecture: D

What the agent does:

Guides employees through procurement requests conversationally.
Collects requirements (what they need, why, budget, timeline), checks whether an existing contract covers the request, identifies the correct approval chain based on amount and category, and generates a purchase requisition.
For software purchases, checks the approved vendor list and existing license inventory to avoid redundant buying.
Handles the approval workflow: routes to the right approvers, sends reminders, escalates stalled approvals, and notifies the requester at each stage.

AWS services: Bedrock (Claude), AgentCore Runtime, AgentCore Policy (approval rules + spend limits), ERP API (SAP/Oracle/NetSuite), contract management API, SES (notifications)

You need this if: Employees avoid the procurement process because it requires filling out forms they do not understand, and your procurement team spends hours routing requests to the right approvers.

#048 - Budget Tracking and Forecast Agent

Pattern: New build Platform: Both (AgentCore backend + Quick dashboards) Complexity: Strategic Bet

Reference Architecture: D + E

What the agent does:

Monitors department budgets against actuals in real-time.
Pulls spend data from the ERP, cloud billing (AWS Cost Explorer), and SaaS management platforms. Alerts budget owners when spending trends suggest they will exceed budget before quarter end.
Generates variance explanations by analyzing which line items are over or under plan.
Quick dashboards let managers drill into spend by category, vendor, and project.
Produces monthly budget summaries and forecast adjustments automatically.

AWS services: Bedrock (Claude), AgentCore Runtime, Amazon Quick (QuickSight + Flows), AWS Cost Explorer API, ERP API, Redshift, EventBridge (alerting triggers), SNS

You need this if: Budget reviews happen monthly from stale spreadsheets, overspend is discovered after the fact, and finance produces variance reports manually.

Meetings and Communication

#049 - Meeting Summarization and Action Tracker

Pattern: New build

Platform: AgentCore

Complexity: Quick Win

Reference Architecture: D

What the agent does:

Joins meetings (via Amazon Chime SDK or calendar integration), transcribes the discussion, and produces a structured summary within minutes of the meeting ending.
Identifies decisions made, action items with owners and due dates, open questions, and topics deferred.
Posts the summary to the relevant Slack channel or project management tool.
Tracks action items across meetings and flags overdue items in the next meeting’s pre-brief.
Distinguishes between informational discussion and actionable outcomes.

AWS services: Bedrock (Claude), AgentCore Runtime, Amazon Transcribe, Amazon Chime SDK (the SDK remains supported independently of the Chime service), Slack API, Jira API (action item creation), S3 (transcript archive)

You need this if: Action items from meetings disappear into notes nobody reads, decisions get relitigated because they were not recorded, and your team spends 5+ hours per week in meetings without clear outcomes.

#050 - Status Report Generator

Pattern: New build

Platform: AgentCore

Complexity: Quick Win

Reference Architecture: D

What the agent does:

Compiles weekly or biweekly status reports by pulling from the systems where work actually happens.
Aggregates Jira ticket progress, GitHub PR activity, deployment history, incident reports, and OKR tracking data.
Produces a structured update for each team: what shipped, what is in progress, what is blocked, and key metrics.
Managers review and edit instead of writing from scratch.
Adapts format and detail level based on the audience (team standup vs executive briefing vs cross-functional update).

AWS services: Bedrock (Claude), AgentCore Runtime, AgentCore Gateway (Jira, GitHub, OKR platform APIs), S3 (report archive), SES (distribution)

You need this if: Your managers spend 2+ hours per week writing status reports by manually checking Jira, GitHub, and Slack, and the reports are outdated by the time they are sent.

What These 25 Patterns Reveal

Different dynamics emerge when agents face inward instead of outward.

Knowledge retrieval dominates the Quick Win category.

Most of them involve finding, synthesizing, or delivering information that already exists somewhere in the organization.

The hardest part of internal AI agents is not the reasoning - it is the integration with fragmented knowledge sources behind SSO, document-level permissions, and inconsistent APIs.

Amazon Q Business absorbs a significant chunk of this complexity out of the box with native connectors and built-in ACLs, which is why it appears as the default for pure retrieval patterns.

Bedrock Knowledge Bases fills in when you need a custom RAG pipeline or when Q Business lacks a connector for your source.

Permission models are the real engineering challenge.

Customer-facing agents from Edition 1 mostly deal with one customer’s data at a time.

Internal agents cross organizational boundaries constantly.

An HR agent that can see compensation data, a finance agent that reads budget forecasts, an engineering agent that accesses production logs - each needs fine-grained access controls scoped to the requester’s role.

AgentCore Identity handles IdP integration for SSO. AgentCore Policy adds rule-based access scoping - verify maturity for your target region before production rollout.

For retrieval-only patterns, Q Business’s ACL engine is the more battle-tested option today.

RPA migrations have the clearest ROI.

Expense processing, access provisioning, procurement workflows - these agents replace brittle RPA scripts that break when a UI changes.

The agentic version handles exceptions, asks clarifying questions, and adapts to edge cases instead of failing silently.

Multi-agent architectures appear less often internally (see how we did not reference architecture F).

Internal users tolerate slightly longer response times and are better at framing specific questions, which means a single well-tooled agent handles most internal scenarios effectively.

Quick fills the analytics gap. Some patterns use Quick for dashboarding and self-service analysis.

Internal teams need visibility into operational data more than they need conversational agents.

QuickSight and Quick Research provide that without custom development.

Where the Leverage Actually Is

Most of the patterns in this edition run on a single agent with tool access. That’s not a limitation of the framework, it reflects how internal work actually breaks down. Employees ask specific questions, need specific actions, and want specific answers. The architectural complexity lives in the permission model and the integration layer, not in multi-agent orchestration.

The engineer from the opening spent 40 minutes finding a rate-limiting policy. Pattern #026 solves that with Q Business, native connectors, and document-level ACLs she never has to think about.

No custom orchestration.

No agent memory.

No specialist routing.

The right document, surfaced to someone authorized to see it, in seconds. Start there.

Add AgentCore when the workflow needs to take action, not just answer questions. Add Quick when teams need dashboards, not conversations.

Every pattern in this edition follows that same decision sequence: retrieval first, action second, analytics where the data justifies it.

What Comes Next

Three more editions:

Edition 3 - Workflow automation and process agents (internal operations, no direct user interaction)
Edition 4 - Data and analytics agents (self-service BI)
Edition 5 - Compliance, security, and governance agents (high-stakes environments)

If you are building internal productivity agents, start with #026 (Enterprise Knowledge Search) or #031 (IT Help Desk).

Enterprise Knowledge Search deploys fast on Q Business with minimal custom code.

IT Help Desk needs AgentCore for the action layer but has the clearest success metrics.

Both solve a pain point every employee recognizes on day one.

I publish every week at buildwithaws.substack.com. Subscribe. It's free.

Stop Designing AI Agents From Scratch. Steal These 25 Patterns Instead.

Marcelo Acosta Cavalero — Thu, 02 Apr 2026 13:53:55 +0000

Originally published on Build With AWS. Subscribe for weekly AWS builds.

A customer support team deployed a Bedrock-powered chatbot last quarter.

It answered questions from a knowledge base, handled basic FAQs, and saved about 15 hours per week.

Thanks for reading Build With AWS! Subscribe for free to receive new posts and support my work.

Solid win.

Then someone asked:

“Can it also check order status, issue refunds, and escalate to the right team based on sentiment?”

That question marks the exact boundary between a GenAI feature and an AI agent.

This is the first edition of a five-part series cataloging real AI architecture patterns running on AWS right now.

Each edition covers 20-25 use cases with enough detail to evaluate whether they fit your organization: what the agent does, which services power it, and a reference architecture you can adapt.

Patterns you can take to your next architecture review, not slides about the future of AI.

But first, two quick mental models so the cards land with full context.

Agent or Not? Five Questions

Every AI project starts with someone saying “we should build an agent for that.”

Most of the time, a well-configured prompt with RAG handles the job.

The distinction matters because agents cost more to build, run, and debug.

How predictable is the workflow?

Same steps, same order, every time?

A Lambda function with a Bedrock call handles it.

Agents earn their keep when each request requires different steps based on context.

A refund request that needs to check inventory, verify purchase history, calculate partial credit, and decide whether to escalate, all conditionally, that is agent territory.

Does it require multi-step reasoning?

Single-turn Q&A works fine as a RAG pipeline.

When the system needs to analyze options, weigh trade-offs, decide, and then act on that decision across multiple systems, you need agentic reasoning.

Does it need tool access?

Reading from a knowledge base and generating text is retrieval-augmented generation.

Calling APIs, writing to databases, triggering workflows, interacting with external systems, that requires an agent’s orchestration layer.

Does it interact conversationally?

Multi-turn dialogue with context retention, clarifying questions, and adaptive responses points toward agentic design.

Form-style inputs do not.

Does it need to improve over time?

Static systems return the same quality output indefinitely.

Agents that learn from feedback and adapt to new scenarios justify the additional infrastructure.

Score each 1-5.

Below 10? Standard GenAI.

Between 10 and 18? Evaluate whether basic GenAI plus automation gets you 80% of the value at 30% of the cost.

Above 18? Build the agent.

AgentCore vs Quick in 30 Seconds

AgentCore is the developer platform. Modular services (Runtime, Gateway, Memory, Identity, Policy, Observability) you compose into custom architectures.

You write code, pick your framework (LangGraph, CrewAI, Strands), and control everything.

Best for custom agent logic, multi-agent orchestration, VPC-internal integrations, and fine-grained security scoping.

Amazon Quick is the business user platform.

Five pre-built products:

Quick Sight (visualization)
Quick Flows (workflow automation)
Quick Automate (process automation)
Quick Index (enterprise search)
Quick Research (deep analysis).

Best for data analysis, report generation, document search, and SaaS integrations where speed to deployment beats architectural control.

Some patterns in this series use both.

An AgentCore agent handles backend orchestration while Quick provides the analytics layer.

They complement each other.

How to Read the Cards

Every use case follows this structure:

Pattern - Where this agent comes from in your org: net-new capability, upgrade from an existing chatbot, or replacement for an RPA workflow. This describes the migration path, not the audience - all 25 patterns in this edition are customer-facing
Platform - AgentCore, Quick, or both
Complexity - Quick Win (weeks, high confidence), Strategic Bet (months, higher value), or Foundation Build (prerequisites needed first)
Reference Architecture - Points to one of the three diagrams below
What the agent does - The actual workflow, triggers, systems, decisions
AWS services - Specific services involved
You need this if - One signal that this use case applies to you

Reference Architecture A - Single Agent with Tool Access

Platform: AgentCore

When to use: The agent reasons about which tools to call, in what order, based on customer context. One agent handles the full conversation with 3-8 tools.

Covers most customer service, sales, and account management agents.

The Gateway handles tool discovery, authentication, and rate limiting.

The Runtime manages session state.

Bedrock provides the foundation models used for reasoning and generation.

Reference Architecture B - Quick Workspace for Customer Intelligence

Platform: Quick

When to use: Internal teams need AI-powered analysis of customer data, behavior patterns, or support metrics without writing code.

Covers customer analytics, churn prediction dashboards, support quality monitoring, and self-service reporting for customer success teams.

Reference Architecture C - Multi-Agent Customer Workflow

Platform: AgentCore (multi-agent)

When to use: Different customer intents need different tools, knowledge bases, and reasoning patterns.

A single agent with 20+ tools becomes unreliable.

Specialized agents with a router perform better.

The 25 Use Cases

Customer Support and Service

#001 - Intelligent Ticket Resolution Agent

Pattern: Modernization from chatbot

Platform: AgentCore

Complexity: Quick Win

Reference Architecture: A

What the agent does:

Receives incoming support tickets via API or chat widget.
Pulls customer history from CRM, checks recent orders and transactions, searches the knowledge base for relevant solutions, and either resolves the ticket directly or drafts a response for human review.
Handles password resets, order status checks, return initiations, and FAQ-level questions autonomously.
Escalates to human agents when confidence drops below a configured threshold.

AWS services: Bedrock (Claude), AgentCore Runtime, AgentCore Gateway, OpenSearch Serverless (knowledge base), Bedrock Guardrails

You need this if: Your support team spends more than 40% of their time on repetitive tickets that follow predictable resolution patterns.

#002 - Multi-Channel Support Orchestrator

Pattern: New build

Platform: AgentCore (multi-agent)

Complexity: Strategic Bet

Reference Architecture: C

What the agent does:

A router agent receives customer messages from chat, email, voice (transcribed via Amazon Transcribe), and social channels.
It classifies intent, detects sentiment, pulls conversation history from AgentCore Memory, and routes to a specialist agent.
The billing agent handles payment disputes and invoice questions.
The technical agent troubleshoots product issues with access to diagnostic APIs.
The account agent manages subscription changes.
Each specialist has its own tool set and knowledge base, keeping context windows focused and tool selection reliable.

AWS services: Bedrock (Claude), AgentCore Runtime, AgentCore Memory, AgentCore Gateway, Amazon Transcribe, Amazon Connect, EventBridge

You need this if: You support customers across 3+ channels and your agents need different tools for billing, technical, and account questions.

#003 - Proactive Customer Health Monitor

Pattern: New build

Platform: Both (AgentCore backend + Quick analytics)

Complexity: Strategic Bet Reference

Architecture: A + B

What the agent does:

Runs on a schedule (daily or triggered by events).
Analyzes customer usage patterns, support ticket frequency, NPS scores, and billing data.
Identifies accounts showing early churn signals: declining usage, increasing ticket volume, missed payments, or negative sentiment trends.
Generates a risk score and recommended intervention for each flagged account. Customer success managers review the output through Quick Sight dashboards and receive alerts via SNS.

AWS services: Bedrock (Claude), AgentCore Runtime, Amazon Quick Sight, EventBridge (scheduler), Redshift (customer data warehouse), SNS

You need this if: Your customer success team manages 100+ accounts and discovers churn risk reactively, after the customer complains or leaves.

#004 - Returns and Refund Processing Agent

Pattern: Modernization from chatbot

Platform: AgentCore

Complexity: Quick Win

Reference Architecture: A

What the agent does:

Handles return requests end-to-end.
Verifies purchase eligibility against return policy rules, checks inventory status for exchanges, calculates refund amounts (including partial refunds, restocking fees, and promotional adjustments), initiates the refund through the payment gateway API, generates return shipping labels, and sends confirmation to the customer.
For edge cases outside policy parameters, it drafts a recommendation and routes to a human supervisor.

AWS services: Bedrock (Nova), AgentCore Runtime, AgentCore Gateway, AgentCore Policy (action restrictions), payment gateway API, shipping API

You need this if: Returns processing involves manual lookups across 3+ systems and takes your team more than 10 minutes per request on average.

#005 - Warranty Claims Adjudication Agent

Pattern: Migration from RPA

Platform: AgentCore

Complexity: Foundation Build

Reference Architecture: A

What the agent does:

Receives warranty claims with product photos, purchase receipts, and damage descriptions.
Uses multimodal Bedrock models to analyze product images and assess damage.
Cross-references the serial number against the warranty database for coverage verification.
Applies claim rules (coverage period, damage type, prior claims history) and either approves, denies, or flags for manual review.
Approved claims trigger replacement shipment or repair scheduling through the fulfillment system.

AWS services: Bedrock (Claude with vision), AgentCore Runtime, Amazon S3 (document/image storage), DynamoDB (warranty database), Step Functions (fulfillment orchestration)

You need this if: Your warranty claims process involves manual image review, policy lookup across multiple systems, and takes 24-48 hours for straightforward claims.

Sales and Revenue

#006 - Lead Qualification and Routing Agent

Pattern: New build

Platform: AgentCore

Complexity: Quick Win

Reference Architecture: A

What the agent does:

Engages inbound leads from web forms, chat widgets, or landing pages.
Asks qualifying questions conversationally (budget range, timeline, company size, use case).
Enriches lead data by calling company information APIs.
Scores the lead against ICP criteria and routes qualified leads to the appropriate sales rep based on territory, deal size, and product interest.
Unqualified leads receive automated nurture sequences.
All interactions sync back to the CRM.

AWS services: Bedrock (Claude), AgentCore Runtime, AgentCore Gateway, CRM API (Salesforce/HubSpot), company enrichment API (Clearbit/ZoomInfo)

You need this if: Your SDR team spends more than half their day qualifying leads that turn out to be poor fits, and qualified leads wait hours for first response.

#007 - Personalized Product Recommendation Agent

Pattern: Modernization from chatbot

Platform: AgentCore

Complexity: Quick Win

Reference Architecture: A

What the agent does:

Interacts with customers browsing your product catalog.
Asks about preferences, use cases, and constraints through natural conversation.
Queries the product database with semantic search, filters by availability and pricing, and recommends products with specific reasons tied to what the customer described.
Handles comparison requests across multiple products.
Tracks which recommendations led to add-to-cart events and feeds that data into downstream analytics or recommendation systems.

AWS services: Bedrock (Claude), AgentCore Runtime, OpenSearch Serverless (product catalog with vector search), Amazon Personalize, DynamoDB (interaction history)

You need this if: Your product catalog has 500+ SKUs and customers abandon the site because they cannot find what matches their specific needs.

#008 - Quote Generation and Pricing Agent

Pattern: Migration from RPA

Platform: AgentCore

Complexity: Strategic Bet

Reference Architecture: A

What the agent does:

Takes customer requirements (quantity, specifications, timeline, delivery location) and generates formal price quotes.
Pulls current pricing from the ERP system, applies volume discounts, checks contract-specific pricing for existing customers, calculates shipping costs based on logistics APIs, and factors in promotional offers.
Generates a formatted PDF quote and sends it to the customer.
Non-standard requests outside pricing rules route to the sales manager with a recommended price and margin analysis.

AWS services: Bedrock (Claude), AgentCore Runtime, AgentCore Gateway, ERP API (SAP/Oracle), Lambda (PDF generation), SES (email delivery)

You need this if: Generating a custom quote takes your sales team 2+ hours and involves pulling data from 3 or more systems manually.

#009 - Contract Renewal Intelligence Agent

Pattern: New build

Platform: Both

Complexity: Strategic Bet

Reference Architecture: A + B

What the agent does:

Monitors contract expiration dates across the customer base.
Sixty days before renewal, it compiles a customer health profile: usage trends, support ticket history, feature adoption, billing history, and NPS scores.
Generates a renewal risk assessment and recommended pricing strategy (upsell opportunity, standard renewal, at-risk discount needed).
Sales reps review renewal briefs through Quick dashboards.
The agent drafts personalized renewal communications based on each account’s specific usage patterns and value received.

AWS services: Bedrock (Claude), AgentCore Runtime, Amazon Quick (Sight + Research), Redshift, EventBridge (scheduler), SES

You need this if: Your renewal process starts too late, relies on generic outreach, and your team lacks a consolidated view of customer health at renewal time.

#010 - Real-Time Sales Coaching Agent

Pattern: New build

Platform: AgentCore

Complexity: Foundation Build

Reference Architecture: A

What the agent does:

Listens to live sales calls via Amazon Connect integration and Amazon Transcribe streaming.
Analyzes the conversation in real-time and provides the sales rep with contextual prompts: competitor objection responses, relevant case studies, pricing flexibility guidelines, and technical specifications.
After the call, generates a summary, identifies follow-up actions, updates the CRM, and scores the call against your sales methodology framework.

AWS services: Bedrock (Claude), AgentCore Runtime, Amazon Connect, Amazon Transcribe (streaming), Bedrock Knowledge Bases, CRM API

You need this if: Your sales team handles complex technical sales where having the right information during the call directly impacts close rates.

Onboarding and Activation

#011 - Customer Onboarding Orchestrator

Pattern: Modernization from chatbot

Platform: AgentCore

Complexity: Quick Win

Reference Architecture: A

What the agent does:

Guides new customers through product setup step by step.
Adapts the onboarding flow based on the customer’s plan tier, industry, and stated goals.
Configures initial settings, imports data from previous tools via API, creates sample content or workflows, and schedules check-in milestones.
Tracks completion of onboarding tasks and sends reminders for incomplete steps.
Escalates to a customer success manager when the customer gets stuck or expresses frustration.

AWS services: Bedrock (Claude), AgentCore Runtime, AgentCore Memory (onboarding state), product API, SES/SNS (notifications)

You need this if: Your time-to-value exceeds 14 days for new customers and onboarding completion rate sits below 70%.

#012 - Document Collection and Verification Agent

Pattern: Migration from RPA

Platform: AgentCore

Complexity: Strategic Bet

Reference Architecture: A

What the agent does:

Manages the document collection process for customer applications (financial services, insurance, healthcare enrollment).
Sends document requests, receives uploads, uses Amazon Textract to extract information, validates extracted data against application requirements, flags discrepancies, and requests corrections or additional documents.
Maintains a real-time status dashboard showing which documents are complete, pending, or require resubmission.

AWS services: Bedrock (Claude), AgentCore Runtime, Amazon Textract, Amazon S3, DynamoDB (application state), SES (communications)

You need this if: Your customer application process requires 5+ documents, average collection time exceeds 2 weeks, and your team spends hours chasing missing or incorrect paperwork.

#013 - KYC and Identity Verification Agent

Pattern: Migration from RPA

Platform: AgentCore

Complexity: Foundation Build

Reference Architecture: A

What the agent does:

Conducts Know Your Customer verification for financial services onboarding.
Collects identity documents, extracts data with Textract, verifies against government databases and sanctions lists via API, performs facial comparison between ID photos and selfies using Amazon Rekognition, runs PEP (Politically Exposed Persons) screening, and generates a risk assessment score.
Clean cases auto-approve.
Flagged cases route to compliance analysts with a detailed findings summary.

AWS services: Bedrock (Claude), AgentCore Runtime, AgentCore Policy (compliance rules), Amazon Textract, Amazon Rekognition, third-party verification APIs, DynamoDB

You need this if: Your KYC process takes 3+ business days for standard applications and your compliance team manually reviews documents that could be auto-verified.

#014 - Insurance Quoting and Binding Agent

Pattern: New build

Platform: AgentCore

Complexity: Strategic Bet

Reference Architecture: A

What the agent does:

Walks prospective customers through an insurance application conversationally.
Collects required information (property details, driving history, health information depending on product line) through natural dialogue instead of static forms.
Calls underwriting APIs to generate real-time premium quotes.
Explains coverage options, deductible trade-offs, and exclusions in plain language.
Standard-risk profiles complete binding and get policy documents.
Non-standard risks route to an underwriter with the completed application and preliminary risk assessment.

AWS services: Bedrock (Claude), AgentCore Runtime, Bedrock Guardrails (PII handling), underwriting API, document generation API, payment gateway

You need this if: Your online quote-to-bind conversion rate is below 15% and customers abandon applications because the process is too long or confusing.

#015 - Patient Intake and Pre-Visit Agent

Pattern: New build

Platform: AgentCore

Complexity: Foundation Build

Reference Architecture: A

What the agent does:

Contacts patients before scheduled appointments to collect intake information.
Gathers medical history updates, current medications, symptoms, and insurance details through a conversational interface.
Verifies insurance eligibility in real-time via payer APIs.
Pre-populates the EHR with collected information so the provider has context before the visit.
Sends appointment reminders and preparation instructions (fasting requirements, documents to bring).
Handles rescheduling requests by checking provider availability.

AWS services: Bedrock (Claude), AgentCore Runtime, AgentCore Policy (HIPAA compliance), Bedrock Guardrails (PHI protection), EHR API (Epic/Cerner), insurance verification API

You need this if: Your front desk staff spends 15+ minutes per patient on intake paperwork and your no-show rate exceeds 10%.

Self-Service and Account Management

#016 - Account Configuration Agent

Pattern: Modernization from chatbot

Platform: AgentCore

Complexity: Quick Win

Reference Architecture: A

What the agent does:

Handles account management requests that currently require support tickets or phone calls.
Changes billing information, updates contact details, modifies subscription plans, adds or removes users, adjusts notification preferences, and manages API keys.
Validates changes against account policies before executing.
Requires additional verification (MFA challenge) for sensitive changes like payment method updates or admin role assignments.

AWS services: Bedrock (Nova), AgentCore Runtime, AgentCore Identity (user verification), AgentCore Policy (change authorization), account management API

You need this if: More than 30% of your support tickets are account modification requests that follow standard procedures and require no human judgment.

#017 - Billing Dispute Resolution Agent

Pattern: Modernization from chatbot

Platform: AgentCore

Complexity: Quick Win

Reference Architecture: A

What the agent does:

Investigates billing disputes by pulling invoice details, payment history, usage records, and contract terms.
Identifies the root cause: duplicate charge, incorrect rate, usage miscalculation, or failed payment.
For clear-cut errors, applies the credit automatically and confirms with the customer.
For ambiguous disputes, presents findings with supporting data and offers resolution options.
Complex disputes involving contract interpretation route to the billing team with a complete investigation summary.

AWS services: Bedrock (Claude), AgentCore Runtime, billing system API, payment processor API, DynamoDB (dispute tracking)

You need this if: Billing disputes take your team an average of 45+ minutes to investigate because the relevant data lives in 4 or more systems.

#018 - Subscription Optimization Advisor

Pattern: New build

Platform: AgentCore

Complexity: Quick Win

Reference Architecture: A

What the agent does:

Analyzes a customer’s actual usage patterns against their current subscription tier.
Identifies underused features the customer is paying for, features they need but lack access to, and usage trends suggesting a plan change would save money or deliver better value.
Proactively reaches out (or responds when asked) with a specific recommendation backed by the customer’s own data.
Handles plan changes directly when the customer agrees.

AWS services: Bedrock (Claude), AgentCore Runtime, usage analytics API, billing API, SES (proactive outreach)

You need this if: Customers churn because they feel they are overpaying, or they hit plan limits and leave instead of upgrading because no one showed them the value of the next tier.

Scheduling and Coordination

#019 - Appointment Scheduling Agent

Pattern: Modernization from chatbot

Platform: AgentCore

Complexity: Quick Win

Reference Architecture: A

What the agent does:

Manages appointment booking across multiple providers, locations, and service types.
Understands natural language requests (”I need to see Dr. Martinez next Tuesday afternoon”), checks real-time availability, considers travel time between locations for the customer, handles rescheduling and cancellations, sends confirmations and reminders.
Manages waitlists and automatically offers cancellation slots to waiting customers.

AWS services: Bedrock (Nova), AgentCore Runtime, scheduling system API, SNS/SES (notifications), DynamoDB (waitlist management)

You need this if: Your scheduling staff handles 100+ calls per day for appointment booking and phone wait times exceed 5 minutes during peak hours.

#020 - Service Dispatch and Coordination Agent

Pattern: Migration from RPA

Platform: AgentCore

Complexity: Strategic Bet

Reference Architecture: A

What the agent does:

Coordinates field service appointments for installation, repair, or maintenance visits.
Collects service request details from the customer, determines required skills and equipment, checks technician availability and location, proposes appointment windows, and confirms bookings.
On the day of service, provides the customer with technician ETA updates.
If a technician runs late or a job takes longer than expected, automatically reschedules downstream appointments and notifies affected customers.

AWS services: Bedrock (Claude), AgentCore Runtime, workforce management API, Amazon Location Service, SNS (real-time notifications), EventBridge (event-driven rescheduling)

You need this if: Your dispatch team manually coordinates 50+ service appointments per day and customers complain about missed windows or lack of status updates.

Communication and Engagement

#021 - Personalized Outreach Campaign Agent

Pattern: New build

Platform: AgentCore

Complexity: Strategic Bet

Reference Architecture: A

What the agent does:

Generates personalized outreach messages at scale for marketing campaigns, re-engagement sequences, and lifecycle communications.
For each customer segment, it pulls behavioral data (purchase history, browsing patterns, feature usage, support interactions), generates message variants tailored to individual contexts, and A/B tests subject lines and content.
Feeds performance data (open rates, click-throughs, conversions) into analytics workflows that inform future campaign generation.
Operates within brand guidelines and approved messaging frameworks stored in the knowledge base.

AWS services: Bedrock (Claude), AgentCore Runtime, Bedrock Knowledge Bases (brand guidelines), Amazon Pinpoint, Redshift (customer data), S3 (campaign assets)

You need this if: Your marketing team sends the same campaign to entire segments and personalization is limited to inserting the customer’s first name.

#022 - Review Response and Reputation Agent

Pattern: New build

Platform: AgentCore

Complexity: Quick Win

Reference Architecture: A

What the agent does:

Monitors customer reviews across platforms (Google, Yelp, Trustpilot, app stores, social media).
Analyzes sentiment and topic.
For positive reviews, drafts personalized thank-you responses.
For negative reviews, investigates the customer’s account to understand context, drafts empathetic responses that address specific complaints, and creates internal tickets for service recovery.
Aggregates review trends into weekly reports highlighting recurring issues and sentiment shifts.

AWS services: Bedrock (Claude), AgentCore Runtime, review platform APIs, CRM API, Amazon Comprehend (sentiment analysis), SNS (alerts for critical reviews)

You need this if: Your team responds to reviews manually, response times exceed 24 hours, and you lack a systematic way to track sentiment trends across platforms.

#023 - Multilingual Customer Communication Agent

Pattern: New build

Platform: AgentCore

Complexity: Quick Win

Reference Architecture: A

What the agent does:

Handles customer interactions in 20+ languages without requiring multilingual staff.
Detects the customer’s language from their first message, conducts the entire conversation in that language, and translates internal knowledge base content on the fly.
Maintains cultural context and idiomatic accuracy beyond literal translation.
For regulated communications (financial disclosures, healthcare instructions), uses pre-approved translations from the knowledge base instead of real-time generation.

AWS services: Bedrock (Claude, which handles multilingual natively), AgentCore Runtime, Bedrock Knowledge Bases (approved translations), Amazon Translate (fallback for less common languages)

You need this if: You serve customers in 3+ languages and currently either hire language-specific support staff or use basic translation tools that miss nuance.

Specialized Industry Agents

#024 - Real Estate Property Matching Agent

Pattern: New build

Platform: AgentCore

Complexity: Strategic Bet

Reference Architecture: A

What the agent does:

Works with home buyers through a conversational interface to understand requirements beyond basic filters.
Captures lifestyle preferences (commute tolerance, school district priorities, neighborhood vibe, proximity to amenities) alongside traditional criteria (bedrooms, budget, location).
Searches MLS listings with semantic matching, scores properties against the buyer’s full preference profile, and presents curated shortlists with specific reasons each property matches.
Schedules viewings, provides neighborhood data, and adapts recommendations based on feedback after each showing.

AWS services: Bedrock (Claude), AgentCore Runtime, AgentCore Memory (buyer preference evolution), MLS API, Amazon Location Service, OpenSearch Serverless (semantic property search)

You need this if: Your agents show 15+ properties before a buyer makes an offer and clients say “you’re not understanding what I want” after the third showing.

#025 - Travel Itinerary Planning Agent

Pattern: New build

Platform: AgentCore

Complexity: Strategic Bet

Reference Architecture: C

What the agent does:

Builds complete travel itineraries through multi-turn conversation.
A planner agent understands preferences and constraints (dates, budget, interests, mobility needs, dietary restrictions).
A booking agent searches flights, hotels, and activities through GDS and supplier APIs.
A logistics agent optimizes the sequence of activities based on geography, operating hours, and travel time.
The planner presents the consolidated itinerary, handles modifications, and manages booking confirmations.
Post-booking, it monitors for flight changes, weather disruptions, and sends trip updates.

AWS services: Bedrock (Claude), AgentCore Runtime (multi-agent), AgentCore Memory (trip state), GDS/supplier APIs via AgentCore Gateway, Amazon Location Service, EventBridge (monitoring triggers)

You need this if: Your customers spend 3+ hours planning trips across multiple booking sites and your conversion rate from search to booking sits below 5%.

What These 25 Patterns Reveal

A few things stand out across all 25.

Most customer-facing agents start as chatbot upgrades.

The jump from “answers questions from a knowledge base” to “takes actions on behalf of the customer” is where real value appears.

If you already have a chatbot, you have the knowledge base and the channel.

Adding tool access through AgentCore Gateway converts that chatbot into an agent.

Quick Wins cluster around support and account management.

These use cases have well-defined rules, predictable workflows, and clear success metrics.

They make good first agents because the scope is contained and ROI is measurable within weeks.

Multi-agent architectures show up only when necessary.

Only a couple of the 25 patterns require multiple coordinated agents.

Most customer-facing work is handled well by a single agent with the right tools.

Build multi-agent systems because a single agent’s context window or tool set has become unreliable, not because the architecture sounds impressive.

The foundation model matters less than the integration layer.

Swapping Claude for Nova changes your cost profile but rarely changes the architecture.

The APIs, knowledge bases, and policy rules are where the real engineering happens.

What Comes Next

This series continues with four more editions, same format, different domains:

Edition 2 - Internal knowledge and productivity agents (employee-facing)
Edition 3 - Workflow automation and process agents (internal operations, no direct customer interaction)
Edition 4 - Data and analytics agents (self-service BI)
Edition 5 - Compliance, security, and governance agents (high-stakes environments)

Bookmark this. When someone on your team says “we should use AI for X,” pull up the relevant card and walk into the architecture discussion with a starting point instead of a blank whiteboard.

I publish every week at buildwithaws.substack.com. Subscribe. It's free.

Anthropic accidentally leaked Claude Code's source code. Here's what that means.

Marcelo Acosta Cavalero — Tue, 31 Mar 2026 21:51:47 +0000

Last week, someone noticed that version 2.1.88 of the Claude Code npm package was 60MB heavier than it should have been.
Inside: reconstructable source code for Claude Code's CLI. Around 512,000 lines of TypeScript across nearly 2,000 files. Significant portions of the agent codebase that Anthropic had kept private, exposed by a single build mistake.

How does a mistake like this even happen?

When developers ship software, they often minify the code first. That means compressing it into an unreadable blob of abbreviated variable names and stripped formatting. The goal is smaller files, faster downloads, and some protection from competitors reading your work.
To debug that minified code, teams use source maps: files that translate the ugly compressed version back into the original readable code.
These are internal tools.
They should never ship to users.
This one did.

What was actually inside?

Reported findings include:

How Claude Code's agent loop works
Multi-agent coordination logic
Around 44 feature flags for unshipped functionality
System prompts Claude Code uses internally
How persistent memory is implemented

What was confirmed not inside: model weights, training data, backend infrastructure, or safety pipelines.
The AI is fine.
This was the client-side scaffolding around it.

Wasn't Claude Code already open source?

Anthropic has a public GitHub repo for Claude Code and a Claude Agent SDK that developers can use to build their own tools. So there's always been some public surface area.
But the actual application has always shipped as an obfuscated bundle.
You could install it and run it.
You could not read how it worked.

So what should you actually pay attention to?

The feature flags are the most interesting part. Hidden functionality sitting behind conditionals tells you a lot about what Anthropic is building next. People are already mapping those out.
Anthropic confirmed this was human error, not a security breach, and no customer data was exposed. If you're building on Claude Code or evaluating agentic AI tools, this is a rare look at how a production-grade AI agent is actually architected. The code is already mirrored across GitHub.
It's not going anywhere.

A Serverless Blueprint for Multimodal Video Search on AWS

Marcelo Acosta Cavalero — Thu, 26 Mar 2026 12:00:00 +0000

Originally published on Build With AWS. Subscribe for weekly AWS builds.

This design was inspired by Miguel Otero Pedrido and Alex Razvant’s “Kubrick” course, but rebuilt using native AWS primitives instead of custom frameworks.

Video is impossible to search.

You can scrub through it manually, or rely on YouTube’s auto-generated captions that only match exact keywords.

But what if you want to find “the outdoor mountain scene” or “where they discuss AI ethics”?

Traditional video platforms fail here because they treat video as a single data type.

This system treats video as three parallel search problems.

Speech gets transcribed with word-level timestamps and indexed for semantic search.
Every frame generates a semantic description through Claude Vision and goes into a separate index.
Those same frames become 1,024-dimensional vectors for visual similarity search.

Users ask questions in natural language, and an intelligent agent figures out which index to query. Results come back with exact timestamps.

The architecture runs entirely on serverless AWS: AgentCore Gateway for tool orchestration, Bedrock Knowledge Bases for RAG, S3 Vectors for image search, and Lambda tying everything together.

Processing cost is front-loaded (heavy on first upload), but once videos are indexed, the system runs for roughly $3 per month per 100 videos. Query latency stays under 2 seconds.

The Three-Index Architecture

Most video systems treat search as a single problem: match keywords in titles or auto-generated captions. That works if users know exactly what they’re looking for and can describe it with the exact words spoken in the video.

It breaks down when someone asks “show me outdoor mountain scenes” or wants to find visually similar shots.

The solution is to treat video as three separate, parallel search problems.

-First, transcribe the audio track completely and index every spoken word with word-level timestamps.

This handles “what was said” queries.

-Second, extract frames throughout the video, generate semantic descriptions using Claude Vision, and index those descriptions.

This handles “what was shown” queries.

-Third, create vector embeddings of those same frames using Titan Multimodal and store them in S3 Vectors for visual similarity search.

Each index serves a different user intent.

The speech index answers “find where they discuss machine learning.”

The caption index answers “show me celebration scenes.”

The image index answers “find shots that look like this” when users upload a reference image.

Users don’t need to know which index exists. An intelligent agent analyzes their query, determines which tool to invoke, executes the search, and returns results with exact timestamps.

System Architecture

The frontend is a single-page app hosted on S3 and delivered via CloudFront. Users upload videos through a presigned URL directly to S3, which triggers the processing pipeline. Searches go through API Gateway to the agent Lambda, which either invokes tools directly (Manual Mode) or asks Claude Sonnet to analyze intent and select the right tool (Auto Mode). Tools are exposed via AgentCore Gateway using the Model Context Protocol.

Video Processing Pipeline

When a user uploads a video, the orchestrator Lambda kicks off two parallel tracks: frame extraction and transcription.

The frame track extracts frames using FFmpeg, sends them to Claude Vision for semantic descriptions, and creates vector embeddings for similarity search.

The transcription track uses AWS Transcribe to generate word-level timestamps, then chunks and indexes the transcript for semantic search.

Both complete in roughly 5-6 minutes for a 2-minute video.

Frame extraction doesn’t use a fixed frame rate like 6fps or 1fps. Instead, it extracts a fixed number of frames evenly distributed across the video duration. A 30-second clip gets 45-120 frames. A 10-minute video also gets 45-120 frames. This matters because caption generation costs scale with frame count, not video length.

Timestamps are calculated using (frame_number - 1) × duration / (total_frames - 1) to ensure frames are spread evenly from start to finish, with the first frame at 0 seconds and the last frame at the video’s end.

FFmpeg runs inside a Lambda function with 2GB of memory and a 10-minute timeout. For videos longer than 10 minutes, the system would need Fargate or Step Functions to handle the extended processing time. But the processing logic stays the same, just a different execution environment.

Transcription happens in parallel via AWS Transcribe. The service processes the audio track asynchronously and typically finishes in about 1/4 of the video duration. A 10-minute video transcribes in roughly 2.5 minutes. A polling Lambda checks the job status with a 5-second delay between attempts (up to 60 attempts max, allowing roughly 5 minutes of polling).

Transcribe returns word-level timestamps in JSON format: each word gets a start time, end time, and confidence score. Punctuation appears as separate items without timing. This granularity is critical because when Bedrock Knowledge Base returns a text snippet later, we need to map that snippet back to exact timestamps in the original video.

The chunk_transcript Lambda processes the Transcribe output into 10-second audio chunks, each preserving the original word-level timestamps. Each chunk becomes a separate JSON file (chunk_0001.json, chunk_0002.json, etc.) containing the chunk text, precise start_time_sec and end_time_sec boundaries, and metadata.

This pre-chunking ensures that search results can be mapped back to exact video positions while maintaining semantic coherence within each searchable segment.

Documents are stored at {video_id}/speech_index/ and {video_id}/caption_index/ within the processed bucket. Caption data follows a similar pattern, with one JSON file per frame containing the Claude Vision-generated description, frame number, and timestamp.

Bedrock Knowledge Base has a limitation: it doesn’t support wildcards in S3 inclusion prefixes. You cannot configure it to scan */speech_index/ across multiple video folders. The deployed Bedrock Knowledge Bases are configured to work with the current bucket structure. The chunk_transcript and embed_captions Lambdas trigger KB ingestion jobs after uploading new documents, ensuring search indexes stay synchronized with processed content. Bedrock KB generates embeddings for each document, enabling semantic search while preserving the timestamp metadata attached to each chunk.

The current implementation prioritizes organizing all video-related data under a single video_id prefix for easier management and deletion. An alternative architecture would place the index type at the top level (speech_index/{video_id}/...) allowing a single KB inclusion prefix to scan all videos, but would sacrifice per-video organizational simplicity.

Caption generation is where processing costs concentrate. Each frame goes to Claude 3.5 Sonnet via Bedrock with a prompt that asks for 2-3 sentence descriptions focusing on subjects, actions, setting, and atmosphere. Claude returns natural language like “A chef in a white uniform demonstrates knife skills in a modern kitchen, dicing vegetables while explaining technique to a camera.” Each caption saves as a JSON file with the description, frame metadata, and timestamp.

At roughly $0.005-0.008 per frame, a video with 100 frames costs $0.50-0.80 to caption. That’s 5-8x more expensive than Amazon Rekognition, which would return structured labels like “Person” (93% confidence), “Kitchen” (89%), “Knife” (85%). The cost premium buys search quality. When users ask “show me cooking demonstrations,” Claude’s semantic descriptions match the intent. Rekognition’s labels don’t connect to natural language queries the same way. For a system built around conversational search, Claude’s cost is justified.

The same frames that get captions also become vector embeddings. Titan Multimodal Embeddings generates 1,024-dimensional vectors at $0.00006 per frame, essentially free compared to caption costs. These vectors go into S3 Vectors, a serverless vector store that handles indexing and similarity search without infrastructure management. Each vector record includes the embedding wrapped in a float32 format plus metadata for video ID, frame number, and timestamp. This enables “find similar shots” queries where users upload a reference image and get back visually similar frames.

Search and Retrieval

The three indexes sit behind Bedrock Knowledge Bases (for speech and captions) and S3 Vectors (for images). AgentCore Gateway exposes six tools via the Model Context Protocol: search_by_speech, search_by_caption, search_by_image, list_videos, get_video_metadata, and get_full_transcript. The agent Lambda invokes these tools either directly when users pick Manual Mode, or through Claude’s analysis in Auto Mode.

The Knowledge Base has a timestamp problem. When it returns a text snippet from a transcript, it doesn’t include the original timestamps from the Transcribe JSON.

The snippet is just text. But users need “go to 2:34 in the video,” not “this text appears somewhere in there.”

The solution is having Claude match the snippet back to the word-level timeline. The agent downloads the Transcribe JSON, extracts all words with their start and end times, and asks Claude to find which words semantically match the returned snippet. Claude returns {”start_time”: 154.2, “end_time”: 157.8}. This adds about 500ms to query latency, but the precision is worth it.

The Knowledge Base might paraphrase “we’re exploring artificial intelligence” while the original transcript says “we are exploring AI,” and Claude maps them correctly anyway.

Intelligent Routing

The agent Lambda receives user queries and decides which tool to invoke. In Manual Mode, users explicitly pick speech, caption, or image search, and the agent calls that tool directly. In Auto Mode, users just type natural language, and Claude Sonnet 4 figures out the intent.

A query like “find where they discuss machine learning” goes to speech search. “Show me outdoor mountain scenes” goes to caption search. “Find similar shots” triggers image search.

Claude gets a system prompt explaining what each tool does: search_by_speech queries transcripts, search_by_caption queries frame descriptions, search_by_image handles visual similarity. Claude analyzes the user’s question and returns structured JSON with the tool name, parameters, and reasoning. The agent then invokes that tool via AgentCore Gateway using SigV4-signed requests. Results come back with video IDs, timestamps, matched text, and confidence scores, all formatted for the frontend to display.

This design skips Bedrock Agents entirely. Bedrock Agents handle orchestration automatically, but that comes with limited control over error handling, no support for custom timestamp extraction logic, and extra cost for features this system doesn’t need. Building the agent from scratch using Claude’s tool-use API gives full control over the routing logic, parallel tool execution, and response formatting.

AgentCore Gateway sits between the agent and the tools, hosting an MCP (Model Context Protocol) server that exposes the six search and utility tools. Each tool backs to a Lambda function, and the Gateway handles SigV4 authentication, tool discovery, and request routing. When the agent invokes search_by_speech, the Gateway routes that to the speech search Lambda, waits for results, and returns them. Adding new tools means registering them in the Gateway configuration. No agent code changes required.

Design Trade-Offs

The three-index architecture trades infrastructure complexity for search quality.

A single Knowledge Base containing transcripts, captions, and image data would be simpler to manage. But speech needs dense text with context windows. Captions need short, precise matching. Images need vector similarity, not text search. Separate indexes let each modality optimize independently, and the search quality difference is measurable. Users asking “show me outdoor scenes” get relevant results from the caption index that a combined index would miss.

Claude Vision costs 5-8x more than Rekognition per frame. For 100 frames, that’s $0.50-0.80 versus $0.10. The cost premium comes from Claude generating full semantic descriptions while Rekognition returns structured labels with confidence scores. When users search with natural language like “cooking demonstrations,” Claude’s narrative captions match their intent. Rekognition’s labels (”Person”, “Kitchen”, “Utensil”) don’t connect to conversational queries the same way.

The system prioritizes search experience over processing cost because users abandon systems that don’t find what they’re looking for.

S3 Vectors handles vector storage without managing clusters or configuring indexes. Query latency runs 200-300ms, which is acceptable for this use case.

OpenSearch Serverless would deliver sub-100ms queries and support hybrid keyword+vector search, but it adds complexity and cost that the system doesn’t need yet. The switch point is around 10k videos or when query latency becomes the primary user complaint. Below that threshold, S3 Vectors is simpler and cheaper.

Lambda handles all processing because video workflows are bursty. A system might process 10 videos in an hour, then sit idle for three hours. Fargate would cost roughly $30 per month per service even when doing nothing. Lambda costs $0 when idle.

The breaking point is continuous processing at 100+ videos per hour, where Fargate’s flat rate becomes cheaper than Lambda’s per-execution pricing. Most video systems never hit that threshold.

Frame extraction uses a fixed frame count (45-120 frames) evenly distributed across video duration rather than a fixed frame rate. This decision controls caption costs: 100 frames to caption regardless of whether the video runs 30 seconds or 10 minutes. A 6fps approach would generate 1800 frames for a 5-minute video and 600 frames for a 100-second video, wildly different costs. Fixed frame count makes processing costs predictable and avoids redundant captions when adjacent frames look nearly identical.

Cost Analysis

Processing 100 two-minute videos costs roughly $58 up front, then $3 per month to keep running. With a fixed frame count of 80 frames per video (middle of the 45-120 range), the math is straightforward: 100 videos × 80 frames = 8,000 frames total. Claude Vision at $0.006 per frame comes to $48. AWS Transcribe adds $4.80 for speech transcription (200 minutes at $0.024 per minute). Titan image embeddings cost $0.48 for those same 8,000 frames. Lambda invocations are negligible at $0.10.

Storage runs about $0.40 per month. Frames take up roughly 5-8GB (80 frames × 50-100KB average per JPEG × 100 videos) at $0.12-0.18. S3 Vectors hold 120-150MB of embeddings (8,000 vectors × 15-20KB each including metadata) for $0.003-0.004. Transcripts take about 20MB at $0.0005. Bedrock Knowledge Base vectors store in S3, already counted in the frame storage cost. The dominant cost is always frame storage.

Queries cost $0.27 per thousand. Bedrock Knowledge Base retrieval is $0.10, Claude Sonnet 4 for routing is $0.15, and S3 Vectors queries are $0.02. API Gateway and Lambda execution costs are minimal enough to ignore at this scale. A system running 10,000 queries per month pays $2.70 in query costs.

The cost structure is front-loaded. Month 1 with 100 new videos: $61 (processing + storage + queries). Month 2 with no new uploads: $3.10 (storage + queries). Month 3: $3.10. The system essentially costs $3 per month to operate once videos are processed, with spikes when new content arrives.

Frame count directly controls caption costs. Using 120 frames per video instead of 80 increases caption costs from $48 to $72 per 100 videos. Using 45 frames drops it to $27. Bedrock Batch Inference offers a 50% discount on Claude pricing but delays results by 24 hours, acceptable for async workflows. Combining lower frame counts (45-60) with batch inference brings processing costs down to $15-20 per 100 videos.

Performance and Scaling

A 2-minute video takes 5-6 minutes to become fully searchable. Frame extraction completes in 10-15 seconds. Transcription runs asynchronously and finishes in about 30 seconds. Caption generation is the bottleneck at 3-5 minutes, processing 100 frames at 2-3 seconds each. Image embedding adds another 20-30 seconds. Batch inference trades processing speed for cost savings: results take 24 hours instead of 5 minutes, but cut costs in half.

Query latency stays under 2 seconds for speech and caption search. Speech queries run 800-1200ms: Bedrock Knowledge Base retrieves matching snippets in 400-600ms, then Claude extracts precise timestamps from the Transcribe JSON in another 400-500ms.

Caption queries run faster at 600-900ms since frame timestamps come directly from metadata. Image similarity search is fastest at 300-500ms, just a vector query against S3 Vectors. The agent routing overhead (Claude analyzing intent and selecting tools) adds 400-600ms in Auto Mode.

Production Considerations

When Lambda crashes during caption generation, AWS automatically retries async invocations twice by default. The generate_captions Lambda catches individual frame failures and continues processing remaining frames rather than halting the entire batch.

The process_video and extract_frames Lambdas update DynamoDB status to ‘error’ on failure, but caption generation failures are logged to CloudWatch without explicit DynamoDB status tracking.

Partial results persist in S3 - frames remain available even if caption generation crashes afterward. There’s no automatic recovery mechanism, so resuming a failed step requires manually re-invoking the specific Lambda function with the video_id parameter, which reprocesses that entire step rather than resuming from the failure point.

Search query failures depend on a cascade of timeouts. The agent_api Lambda has a 60-second timeout, though internal Bedrock requests use a 30-second timeout. API Gateway enforces a 29-second maximum integration timeout, which would typically trigger first and return a timeout error to the user.

Query performance depends heavily on result count and metadata filtering - requesting 50 results from a large Knowledge Base performs worse than requesting 5 with specific video_id filters.

Knowledge Base synchronization happens programmatically, not on a schedule. After uploading transcripts or captions to S3, the chunk_transcript and embed_captions Lambdas explicitly trigger ingestion jobs via bedrock_agent.start_ingestion_job(). This ensures new content becomes searchable without waiting for automatic syncs.

The code logs indicate ingestion typically completes in around 2 minutes, though actual time varies with document count and KB size.

The architecture scales from 100 to 1,000 videos without structural changes. Storage costs scale linearly with video count - 10x the videos means 10x the S3 storage costs. Query latency depends more on index size and query complexity than sheer video count, since Bedrock KB and S3 Vectors both use vector indexes that grow with content volume. Lambda concurrency rarely becomes an issue because video processing happens asynchronously over time rather than simultaneously.

At 10,000+ videos, you’d monitor specific bottlenecks as they emerge.

Bedrock Knowledge Base query latency could increase as vector indexes grow larger. S3 Vectors performance might degrade with hundreds of thousands or millions of frame vectors.

The list_videos DynamoDB scan would slow down, requiring pagination and potentially a Global Secondary Index on upload_timestamp for efficient retrieval.

These are optimization problems, not architectural redesigns - the core processing logic stays the same while execution environments might shift from Lambda to Fargate for longer videos, or from S3 Vectors to OpenSearch Serverless for consistently sub-100ms vector queries at scale.

Deployment and Production Readiness

The infrastructure deploys through AWS CDK with a single command: cdk deploy --all. This creates two stacks - InfrastructureStack with 19 Lambda functions, 2 Bedrock Knowledge Bases, 4 S3 buckets, a DynamoDB table, and API Gateway, plus FrontendStack with CloudFront distribution and frontend bucket. The Bedrock Knowledge Bases and AgentCore Gateway are pre-configured in AWS rather than created by the CDK deployment. The entire stack is version-controlled and reproducible across environments.

All Lambda functions log to CloudWatch with 731 days (2 years) of retention. The deployment includes no CloudWatch alarms, SNS topics, or automated monitoring by default - production deployments would need to add metric filters for processing duration, query latency, and failure rates. The CloudWatch logs capture every Lambda invocation but require manual querying or external tooling for insights beyond basic log inspection.

Native AWS services handle complex multimodal AI workloads without custom frameworks or infrastructure. AgentCore Gateway provides MCP standardization for tool orchestration. Bedrock Knowledge Bases manage retrieval-augmented generation across speech and caption indexes. S3 Vectors store image embeddings. Lambda processes videos and routes queries. The system runs at a low monthly cost after initial video processing, with predictable scaling characteristics up to 10,000 videos.

The three-index architecture is a practical solution to a real problem. Users can’t find specific moments in video content using traditional keyword search. This system lets them ask natural language questions and get back exact timestamps, whether they’re searching for spoken content, visual scenes, or similar-looking shots.

The design prioritizes search quality over processing cost because users abandon systems that don’t find what they’re looking for.

The architecture scales from prototype to production without rewrites.

Start with 100 videos on Lambda and S3 Vectors. Grow to 1,000 videos without changes. Push to 10,000 videos with monitoring and metadata filters.

Beyond that, swap Lambda for Fargate, S3 Vectors for OpenSearch, and add ElastiCache. The core logic stays the same.

What’s next?

Challenge the Blueprint: Share your advanced use case or propose an upgrade in the comments.

You can find a detailed account of how each part is built, the criteria for the options chosen, and other details in the project’s repo. Feel free to contribute or open any issues you find.

I publish every week at buildwithaws.substack.com. Subscribe. It's free.

Agent Memory Strategies: Building Believable AI with Bedrock AgentCore

Marcelo Acosta Cavalero — Wed, 25 Mar 2026 10:42:30 +0000

Originally published on Build With AWS. Subscribe for weekly AWS builds.

Your agent answers a question about project deadlines by retrieving every meeting from the past six months.

The response is technically accurate but completely useless, burying the critical deadline mentioned yesterday beneath dozens of irrelevant status updates from March.

You see this in a lot of agents unless you design retrieval on purpose.

The agent remembered everything but understood nothing about what actually mattered in that moment.

The Stanford research team that created “Generative Agents” encountered this exact problem while building 25 simulated characters for a virtual town environment.

Their agents could store thousands of observations, but when asked what to do next, they retrieved memories randomly based on simple keyword matching.

This produced bizarre behavior loops where agents repeated the same action multiple times in a row because their memory system couldn’t distinguish “I just did this five minutes ago” from “I generally do this around lunchtime.”

Smarter memory retrieval based on three scoring dimensions solved this problem: recency (when did this happen), importance (how much did this matter), and relevance (does this relate to my current situation).

Amazon Bedrock AgentCore now provides the infrastructure to implement these memory strategies at enterprise scale.

But understanding why these mechanisms matter and how to configure them effectively requires examining the research that proved their necessity.

The Memory Retrieval Problem: Why Raw Storage Fails

Language models can process vast context windows, but that capability creates a dangerous illusion.

Organizations assume that giving agents access to complete conversation history and knowledge bases will produce intelligent behavior. In practice, it doesn’t work that way.

Consider an agent helping with customer support.

The customer mentions a billing issue from three months ago, asks about a current feature request, and wants to schedule a call.

The agent’s memory contains thousands of interactions with this customer across multiple categories: billing problems, feature requests, scheduling conflicts, casual chitchat about industry events.

Without retrieval scoring, the agent treats all memories as equally relevant.

The context window fills with whatever was stored most recently or whatever matches basic keyword searches.

The agent might retrieve detailed notes about the customer’s preferences for coffee (mentioned casually last week) while missing the critical billing escalation pattern that requires immediate attention.

The Stanford Generative Agents research demonstrated this failure mode systematically.

When Klaus Mueller, one of their simulated characters, was asked to recommend someone to spend time with, the version without proper memory retrieval chose Wolfgang simply because Wolfgang’s name appeared frequently in recent observations. The character had never had a meaningful conversation with Wolfgang.

They just lived in the same dorm and passed each other constantly.

With memory retrieval scoring, Klaus chose Maria Lopez, someone he’d actually collaborated with on research projects.

The memories of those substantive interactions scored higher across multiple dimensions despite being less frequent than the Wolfgang encounters.

This distinction matters enormously for enterprise agents. The difference between retrieving memories based on recency alone versus scoring across multiple dimensions determines whether agents exhibit genuine understanding or just pattern match on whatever happened most recently.

Recency Scoring: Time-Aware Memory Access

Recency scoring implements a simple but crucial insight: recent experiences should influence behavior more than distant ones, but the decay shouldn’t be linear.

An interaction from 10 minutes ago remains highly relevant. An interaction from 10 months ago might still matter for specific contexts but shouldn’t dominate general decision-making.

The Stanford team implemented recency through exponential decay functions.

Each memory receives a recency score that decreases over time at a rate determined by the decay factor.

In their implementation, they used a decay factor of 0.995 per time unit (their simulation used hourly intervals), creating a smooth gradient where very recent memories score highest but older memories remain accessible when other factors (importance, relevance) elevate them.

This approach elegantly solves the “everything is equally important” problem without requiring manual categorization.

When an agent plans an event, memories of yesterday’s specific preparations score significantly higher than memories of general operations from last week.

Both memories exist, but recency scoring ensures the contextually appropriate one influences current planning.

For enterprise agents, recency scoring prevents a common failure mode: over-reliance on initial training or setup information that’s no longer current.

A customer service agent needs to prioritize the customer’s statement from 30 seconds ago over background information from the knowledge base, unless other factors indicate the background information carries unusual importance.

Implementation requires three technical decisions.

First, selecting the decay function shape.

Exponential decay works well for most agent applications because it creates gentle transitions rather than harsh cutoffs.

Second, choosing the decay rate.

Faster decay means stronger recency bias, slower decay preserves long-term context.

Third, defining time units relevant to your agent’s operation.

Hours work for customer service, days for project management, seconds for real-time monitoring.

Amazon Bedrock AgentCore handles recency implicitly through its extraction and consolidation strategies rather than exposing explicit decay functions.
New information is incorporated into long-term memory through consolidation, while older or superseded information becomes less likely to surface during retrieval.

This behavior creates the appearance of recency, but AgentCore does not model time as a scoring factor. Recent information dominates only because it remains in the active session, not because it is weighted higher during retrieval.

Importance Scoring: Distinguishing Mundane from Critical

Not all experiences carry equal significance.

An agent that treats “scheduled regular status meeting” and “critical security incident reported” as equivalent memories will make catastrophic decisions.

Importance scoring solves this by assigning weights that reflect the significance of each experience.

The Stanford research revealed an elegant solution to importance assessment: simply ask the language model.

Rather than building complex heuristic systems, they prompted the model with a straightforward question: “On a scale of 1 to 10, where 1 is purely mundane (e.g., brushing teeth, making bed) and 10 is extremely poignant (e.g., a break up, college acceptance), rate the likely poignancy of the following piece of memory.”

This approach works remarkably well because language models have learned implicit importance hierarchies from their training data.

“Cleaning up the room” consistently scores around 2.

“Asking your crush out on a date” scores around 8.

The model doesn’t need explicit rules about importance. It already understands the relative significance of human experiences.

For enterprise agents, importance scoring prevents memory streams from becoming cluttered with routine operational noise.

Consider an agent monitoring infrastructure health.

The system generates thousands of observations per hour: service health checks passing, routine log rotations, scheduled backups completing.

These observations need to exist for completeness, but they shouldn’t dominate memory retrieval when the agent needs to explain why it escalated a particular issue.

An anomaly in error rates, however, should score significantly higher in importance.

When the agent later retrieves memories to explain its decision to wake up the on-call engineer at 2 AM, it should prioritize the error rate anomaly over the 500 successful health checks that happened around the same time.

Implementing importance scoring requires addressing a subtle challenge: importance is somewhat subjective and context-dependent.

What’s important for customer service agents differs from what’s important for financial analysis agents.

The Stanford team used a general-purpose importance prompt, but enterprise applications benefit from domain-specific calibration.

Bedrock AgentCore’s built-in memory strategies implicitly capture importance through LLM-driven extraction and consolidation, rather than exposing an explicit importance scoring mechanism.

When using the built-in strategies with customization, you can guide what the system considers important by adding domain-specific instructions via the appendToPrompt configuration field.

For example, you might append “Focus on precedent-setting cases and landmark decisions” for a legal research agent, or “Prioritize executive contacts and decision-maker interactions” for a sales agent.

The key architectural decision is when to calculate importance scores.

The Stanford approach computed importance at memory creation time, which works well for most applications.

The alternative (computing importance dynamically based on current context) offers more flexibility but increases computational overhead.

For enterprise agents handling high-volume interactions, calculating importance once at storage time provides better cost/performance characteristics.

Relevance Scoring: Context-Aware Memory Matching

Recency tells us when something happened.

Importance tells us how much it mattered.

Relevance tells us whether it matters right now for the current situation.

Without relevance scoring, agents retrieve memories that are recent and important but completely unrelated to the current task.

The Stanford team implemented relevance through embedding similarity.

Each memory gets encoded as a vector representation capturing its semantic content.

When the agent needs to retrieve memories, it generates an embedding for the current query and calculates cosine similarity with all stored memories.

Memories semantically related to the current context score higher regardless of how recently they occurred or their absolute importance.

This approach enabled emergent behavior that felt genuinely intelligent.

When agents engaged in domain-specific conversations (like political discussions), they retrieved memories about previous related conversations and relevant domain knowledge, not just whatever they’d been thinking about recently.

The relevance scoring ensured contextually appropriate memories surfaced even if they weren’t the most recent or most important in absolute terms.

For enterprise applications, relevance scoring transforms agents from mechanical responders to context-aware assistants.
A project management agent asked about budget status needs to retrieve financial memories, not schedule memories, even if scheduling happens more frequently or involves more important stakeholders.

The query context (”budget status”) should drive retrieval, not just temporal proximity or general importance.

Implementation requires solving the embedding problem: how do you generate semantic representations that accurately capture the meaning of agent experiences?

The Stanford team leveraged language model embeddings, which provide reasonable semantic similarity out of the box.

Enterprise applications have three main options.

First, use general-purpose embeddings from foundation models like those available through Bedrock.

These work well for most agent interactions but may miss domain-specific semantic relationships.

Second, fine-tune embeddings on your specific domain to capture industry jargon and specialized concepts.

This improves relevance scoring accuracy but requires investment in training data and model development.

Third, use hybrid approaches that combine general embeddings with domain-specific metadata to enhance relevance without full fine-tuning.

Bedrock AgentCore Memory uses semantic search with vector embeddings automatically. The built-in strategies handle embedding generation and similarity calculation without requiring manual configuration.

When using built-in strategies with customization, you can select a different foundation model via the modelId configuration field if your domain benefits from a model with specialized training.

For complete control over embedding strategies, you can implement self-managed memory strategies with custom embedding models.

One critical implementation detail: relevance scoring requires formulating the right query.

When an agent searches its memory, what query should generate the relevance embeddings?

The Stanford approach used the agent’s current situation or question as the query.

For enterprise agents, you might construct queries from multiple sources: the user’s current message, the agent’s current task, recent conversation context, or even the agent’s own reflection on what information it needs.

Combining Scores: The Retrieval Function Architecture

Individual scoring dimensions solve specific problems, but agent behavior emerges from how scores combine.

The Stanford team’s retrieval function weighted three dimensions equally: retrieval_score = recency + importance + relevance, with each dimension normalized to [0,1] range using min-max scaling.

This equally-weighted approach works surprisingly well as a starting point because each dimension captures fundamentally different information.

Recency prevents over-reliance on old context.

Importance prevents mundane noise from dominating.

Relevance ensures contextual appropriateness.

Together, they create a retrieval function that balances multiple concerns without requiring manual tuning.

However, enterprise applications often benefit from adjusted weighting based on agent type and use case.

A real-time monitoring agent might weight recency more heavily.

What happened in the last five minutes matters more than what happened yesterday, regardless of importance or relevance.

A research agent might weight relevance more heavily. Finding semantically related information matters more than when it was discovered or how important it seemed at the time.

The math is the easy part:

retrieval_score = w_recency × recency_score + w_importance × importance_score + w_relevance × relevance_score, where the weights sum to 1.0.

The challenge lies in determining appropriate weights for your specific application.

Different agent types benefit from different weight profiles.

Conversational agents heavily favor recent context since conversation flow depends on immediate history.

Knowledge agents strongly favor relevance since finding the right information matters more than when it was learned.

Alert agents heavily favor recency and importance since recent critical events drive alerting decisions.

AgentCore’s built-in strategies handle these tradeoffs automatically through their consolidation algorithms rather than exposing explicit weight parameters.

If you need fine-grained control over how recency, importance, and relevance combine in retrieval scoring, you would implement self-managed memory strategies with custom retrieval logic.

Reflection: Synthesizing Memory Into Understanding

Raw observations form the foundation of agent memory, but believable behavior requires higher-level understanding.

The Stanford team introduced “reflection” as a mechanism for agents to periodically synthesize observations into broader insights about themselves, others, and their environment.

Reflection generates a second type of memory that coexists with observations in the memory stream.

These reflective memories don’t capture specific events.

Instead, they capture patterns, relationships, and understanding derived from multiple events.

When an agent reflects on observations about spending significant time on research activities and interactions with other researchers, it might generate the insight:

“This agent is highly dedicated to research work.”

This reflection itself becomes a memory that can be retrieved alongside observations.

The power of reflection emerges when agents need to make decisions requiring synthesis.

Without reflection, an agent’s decision about who to collaborate with depends on raw observation frequency.

A colleague appears in more memories simply due to physical proximity (shared office space, common areas).

With reflection, the agent retrieves synthesized understanding about shared professional interests, even though substantive interactions with that person appear less frequently than casual proximity encounters.

For enterprise agents, reflection prevents a common failure mode: drowning in detail while missing the big picture.

A customer service agent might observe 50 interactions with a particular customer across various issues: billing questions, technical problems, feature requests.

Without reflection, the agent treats each interaction as independent.

With reflection, the agent synthesizes: “This customer experiences recurring billing confusion despite multiple explanations, suggesting the billing interface itself may be unclear.”

The Stanford implementation triggered reflection periodically based on experience accumulation.

When the sum of importance scores for recent observations exceeded a threshold, the agent reflected.

This approach ensures reflection happens when agents have sufficient new experiences to warrant synthesis while avoiding constant reflection on minor observations.

The threshold value determines reflection frequency: lower thresholds mean more frequent reflection (which can generate noise), higher thresholds mean agents accumulate more experiences before synthesizing (which requires sufficient important events to cross the threshold).

Reflection generation involves three steps.

First, identify salient questions based on recent experiences.

The agent prompts itself: “Given these recent observations, what are the most important questions I can answer about myself or my environment?”

Second, retrieve relevant memories for each question.

Third, synthesize insights that answer those questions, citing specific observations as supporting evidence.

Bedrock AgentCore implements reflection through its Episodic Memory Strategy.

Episodic memory operates on a per-session basis, with reflections synthesized from episodes within the same interaction context rather than across arbitrary sessions.

This strategy captures interactions as structured episodes with intents, actions, and outcomes, then generates cross-episode reflections that synthesize broader insights.

The episodic strategy uses namespaces to organize both individual episodes and the reflections derived from them.

When using built-in strategies with customization, you can guide reflection behavior through the appendToPrompt configuration field to focus synthesis on patterns relevant to your domain.

For example, you might append instructions like “When reflecting, focus on recurring customer pain points and opportunities for process improvement.”

The built-in episodic strategy handles reflection timing automatically based on accumulated experiences.

For complete control over reflection triggers, frequency, and synthesis logic, you would implement a self-managed memory strategy with custom algorithms.

Reflection also enables recursion: agents can reflect on their own reflections.

An agent might observe multiple experiences around a specific work pattern, reflect on that pattern, then later reflect on multiple patterns together to synthesize higher-level understanding.

This hierarchical reflection creates increasingly abstract understanding that guides high-level decision-making.

How AgentCore Actually Implements Memory

Amazon Bedrock AgentCore takes a different architectural approach than the Stanford research paper.

Rather than manually scoring memories across recency-importance-relevance dimensions, AgentCore provides two complementary memory types that automate much of this complexity:

AgentCore’s Two-Tier Memory System

Short-term memory stores raw interactions within a single session as events. Each event captures conversational exchanges, instructions, or structured information such as product details or order status.
Events persist for a configurable retention period and can be retrieved later within the same actor and session scope, enabling controlled continuation of context without merging unrelated sessions.

You can attach metadata to events for quick filtering without scanning full session history.

Long-term memory automatically extracts and stores structured insights from interactions.

After events are created, AgentCore asynchronously processes them to extract facts, preferences, knowledge, and session summaries.

These consolidated insights persist across multiple sessions and enable personalization without requiring customers to repeat information.

Semantic Search vs. Retrieval Scoring

AgentCore’s RetrieveMemoryRecords operation performs semantic search to find memories most relevant to the current query.

This differs from the Stanford approach where you explicitly configure recency, importance, and relevance weights.

AgentCore handles relevance through embeddings automatically, while recency and importance are implicit in how it processes and consolidates long-term memories.

Episodic Memory for Learning

AgentCore Memory includes an episodic memory strategy, enabling agents to learn and adapt from experiences over time.

This builds knowledge that makes interactions more humanlike, similar to the reflection mechanisms described in the Stanford research.

Configuring Memory Strategies in AgentCore

AgentCore provides built-in memory strategies that handle extraction, consolidation, and retrieval automatically.

Understanding how to configure these strategies helps you build agents with effective memory behavior without implementing Stanford-style scoring from scratch.

Built-in Memory Strategies

AgentCore provides four built-in strategies that automatically extract and organize different types of information from agent interactions:

User Preference Strategy: Automatically identifies and extracts user preferences, choices, and styles. Useful for e-commerce agents that need to remember customer preferences like favorite brands, sizes, or shopping habits.

Semantic Memory Strategy: Extracts key factual information and contextual knowledge using vector embeddings for similarity-based retrieval. Prevents agents from repeatedly asking for information users already provided.

Summary Memory Strategy: Creates condensed summaries of conversations within a session, reducing the need to process entire conversation histories for context.

Episodic Memory Strategy: Captures interactions as structured episodes with intents, actions, and outcomes. Includes cross-episode reflection capabilities that synthesize broader insights across multiple interactions.

Customizing Built-in Strategies

AgentCore allows two levels of customization for built-in strategies:

Prompt Customization: Use the appendToPrompt configuration field to add domain-specific instructions that guide what the strategy extracts and how it prioritizes information. For example, a legal research agent might add instructions to focus on precedent-setting cases and landmark decisions, while prioritizing regulatory changes and compliance requirements.

Model Selection: Choose a different foundation model via the modelId field if your domain benefits from specialized model capabilities.

Memory Retrieval and Filtering

When retrieving memories, AgentCore uses semantic search with vector embeddings to find the most relevant information. You can control retrieval behavior through several parameters:

Namespace filtering: Organize memories hierarchically using namespace patterns like /users/{actorId}/preferences or /support_cases/{sessionId}/facts, then filter retrieval to specific namespaces.

Top-k limiting: Specify how many memory records to retrieve (balancing context richness against processing costs).

Event retention: Configure how long raw conversation events persist (up to 365 days) before automatic expiration.

Implementing Stanford-Style Explicit Scoring

If you need explicit control over recency-importance-relevance weighting like the Stanford approach, you can implement self-managed memory strategies.

Self-managed strategies give you complete control over:

Custom extraction and consolidation algorithms
Manual scoring across any dimensions you define
Integration with external memory systems
Custom retrieval logic with explicit weight configuration

Self-managed strategies require infrastructure setup (S3 buckets for payloads, SNS topics for notifications, IAM roles for access) and ongoing maintenance of the memory processing pipeline. This approach makes sense when your memory requirements differ significantly from what the built-in strategies provide.

Measuring Memory Strategy Effectiveness

Implementing memory strategies is only valuable if they improve agent behavior.

The Stanford research evaluated memory effectiveness through believability ratings and behavioral coherence. Enterprise applications require measurable metrics tied to business outcomes and concrete measurement procedures.

Retrieval Quality Metrics

Retrieval relevance measures whether retrieved memories actually contribute to response quality.

Implementation requires weekly sampling of 50-100 agent interactions where you examine the retrieved memories and the agent’s response.

For each interaction, have domain experts rate each retrieved memory as relevant (contributed to response), partially relevant (provided context but not directly used), or irrelevant (unrelated to query).

Calculate the percentage of relevant memories in the top-10 retrieved results.

Target >80% relevance.

Log retrieval inputs/outputs (query, retrieved record IDs/namespaces, and the final response) to S3.

Score distribution reveals whether agents balance retrieval dimensions appropriately. Use CloudWatch Logs Insights to calculate mean scores across dimensions for all retrievals in a time period.

For agents implementing explicit retrieval scoring (for example, with self-managed memory strategies), balanced systems tend to show similar mean values across recency, importance, and relevance after normalization.

Agents over-relying on one dimension show skewed distributions.

For example, if average recency scores are 0.85 while importance and relevance average 0.15 and 0.20, the agent depends too heavily on recency.

Citation usage tracks whether agents incorporate retrieved memories into responses or fall back on generic knowledge.

Implement by parsing agent responses for memory citations or references to past events.

If your agent implementation tracks which memories influenced each response, calculate what percentage of retrieved memories actually get cited.

Target >60% citation rate, which indicates retrieval is surfacing useful context rather than noise.

Behavioral Coherence Metrics

Self-contradiction rate requires comparing agent statements against stored memories to detect logical inconsistencies.

Implement through periodic automated checks that use language models to detect contradictions.

For a sample of agent responses (start with 10%), retrieve similar memories and prompt a language model to identify whether the current statement contradicts any previous statements.

Track contradictions per 100 interactions with a target of less than 2% contradiction rate.

Context awareness measures whether agents incorporate relevant historical context without explicit prompting. Implement through test scenarios where context should influence responses.

Create test cases with historical context stored in memory, then issue queries that should trigger context usage.

Use language model evaluation to assess whether agent responses appropriately incorporate the historical context.

Target >90% context awareness across your test scenarios.

Decision consistency tracks whether agents make similar decisions in similar situations. Implement by identifying repeated scenario types (like billing disputes with similar characteristics) and comparing agent actions.

Group scenarios by similarity using embedding-based clustering, then calculate what percentage of similar scenarios receive consistent decisions.

Target >85% consistency for equivalent situations.

Business Impact Metrics

Task completion rate compares before/after memory strategy implementation by tracking multi-step task success.

Use CloudWatch Logs Insights to analyze task outcomes, filtering for completed versus failed or abandoned tasks.

Compare completion rates between different memory strategy versions, along with average time to completion and number of memory retrievals required.

This reveals whether improved memory strategies help agents complete tasks more effectively.

User satisfaction correlation with retrieval quality requires instrumenting feedback collection and linking to retrieval performance.

For interactions where users provide satisfaction ratings, calculate retrieval quality metrics (average retrieval score, citation rate, memory count) and analyze the correlation with satisfaction scores.

High correlation (>0.6) between retrieval quality and satisfaction indicates that memory strategy improvements translate to better user experience.

Efficiency gains measure whether better memory reduces interaction time or redundant questions.

Track average interaction duration, conversation turns, and redundant questions (asking for information already provided in the session) across different memory strategy versions.

Target a >50% reduction in redundant questions with proper memory retrieval, which demonstrates that agents effectively use stored context instead of repeatedly requesting the same information.

Start with manual sampling for retrieval relevance and context awareness to establish baselines, then automate contradiction detection and decision consistency tracking as you scale.

Patterns From Production: Memory Strategy Lessons

Organizations implementing sophisticated memory strategies with AgentCore have discovered patterns that extend beyond the Stanford research findings:

Domain-Specific Importance Calibration

Generic importance scoring works reasonably well, but domain-specific calibration significantly improves retrieval quality.

Implementation approach: Create a set of 20-50 representative memories spanning the importance spectrum for your domain.

Use these as few-shot examples in the importance scoring prompt.

Periodically review whether importance scores align with domain expert judgment and refine examples accordingly.

Temporal Context Matters

The Stanford research used hourly intervals as time units because their simulation tracked agents through daily routines with clear temporal structure.

Enterprise agents operate across varying temporal scales that affect optimal recency decay rates.

Real-time monitoring agents need aggressive decay (half-life measured in minutes) because events from an hour ago rarely remain relevant.

Customer support agents need moderate decay (half-life measured in hours) because conversations span multiple interactions but complete within a day.

Account management agents need gentle decay (half-life measured in weeks) because relationships and context accumulate over months.

Implementation approach: Start with medium decay rates (half-life of 8 hours for session memory, 30 days for long-term memory), then adjust based on observed retrieval patterns.

If agents over-rely on old context, increase decay rate. If they miss relevant historical context, decrease decay rate.

Reflection Quality Over Frequency

Early AgentCore implementations often triggered reflection too frequently, generating noisy low-quality insights.

High-quality reflection requires sufficient accumulated experience to identify genuine patterns rather than noise.

Frequent reflection on sparse data produces observations dressed as insights (”The customer uses our product” rather than “The customer consistently struggles with feature X despite multiple explanations”).

Implementation approach: Set reflection thresholds high enough that agents accumulate 20-30 meaningful observations before reflecting.

Monitor reflection content quality manually.

Good reflections synthesize patterns across multiple observations and provide actionable insights.

Poor reflections restate individual observations or make unsupported generalizations.

Hybrid Memory Architecture

Pure episodic memory (observations and reflections) works well for Stanford’s simulation, but enterprise agents benefit from hybrid architectures combining episodic memory with semantic knowledge bases and procedural knowledge.

A healthcare agent combines episodic memory (patient interaction history) with semantic memory (medical knowledge base) and procedural memory (clinical protocols).

Retrieval strategies differ across memory types: episodic memory uses recency-importance-relevance scoring, semantic memory uses pure relevance scoring, procedural memory uses task-specific rule matching.

Implementation approach: Use AgentCore session and long-term memory for episodic storage with full retrieval scoring. Integrate knowledge bases through retrieval-augmented generation (RAG) with relevance-only scoring.

Implement procedural knowledge through explicit skill definitions that bypass memory retrieval entirely for deterministic tasks.

Building Agents That Learn From Experience

The Stanford Generative Agents research proved that sophisticated memory strategies transform language model behavior from reactive to genuinely autonomous. Agents with proper memory retrieval, importance scoring, and reflection capabilities develop coherent personalities, form relationships, and exhibit emergent behaviors that feel believable rather than mechanical.

Amazon Bedrock AgentCore provides production-ready memory infrastructure through its two-tier system: short-term memory for session context and long-term memory for automatic insight extraction.

While AgentCore’s semantic search approach differs from the Stanford paper’s explicit recency-importance-relevance scoring, both architectures solve the same fundamental problem: helping agents retrieve the right context at the right time.

Organizations implementing sophisticated memory strategies report measurably better agent performance: higher task completion rates, improved user satisfaction, reduced interaction time, and fewer behavioral inconsistencies.

More importantly, they report agents that feel less like chatbots and more like assistants that genuinely understand context and learn from experience.

Whether you adopt AgentCore’s automatic extraction and semantic search or implement explicit retrieval scoring based on the Stanford research, the core principle remains the same: believable agents need memory systems that distinguish important from mundane, recent from historical, and relevant from tangential.

These capabilities are available today for organizations ready to move beyond stateless chat interfaces toward agents that remember, reflect, and improve.

I publish every week at buildwithaws.substack.com. Subscribe. It's free.

Route Claude Code Through AWS Bedrock for CloudTrail Auditing and IAM Control

Marcelo Acosta Cavalero — Tue, 24 Mar 2026 12:02:00 +0000

Originally published on Build With AWS. Subscribe for weekly AWS builds.

Over the past few weeks, Claude Code has gained a lot of attention as a developer tool in the AI space.

With rapid improvements in its capabilities, better context handling, and an increasingly robust feature set, developers are flocking to this powerful CLI tool that brings Claude’s intelligence directly into their terminal workflow.

Whether you’re debugging complex codebases, refactoring legacy systems, or building new features, Claude Code has proven itself as an indispensable coding companion. But with great power comes great responsibility, and potentially significant API costs.

Why Route Claude Code Through AWS Bedrock?

If you’re already using Claude Code, you might be consuming the Anthropic API directly.

While this works perfectly fine, there are compelling reasons to route your Claude Code traffic through AWS Bedrock instead:

1. Cost Control and Transparency

AWS Bedrock provides granular billing through AWS Cost Explorer.

You can track AI spending alongside your other AWS services, set up billing alerts and budgets, and analyze usage patterns with detailed metrics.

This visibility enables better cost management compared to direct API billing.

AWS enterprise customers can also take advantage of committed use pricing and volume discounts that apply across their entire AWS footprint, potentially reducing AI infrastructure costs significantly.

2. Security and Compliance

For enterprises and security-conscious teams, Bedrock offers substantial advantages. Requests are made to Bedrock under your AWS account with IAM governance, CloudTrail auditing, and optional PrivateLink connectivity.

This provides complete visibility into who invoked which models and when, helping meet compliance requirements that mandate audit trails and access controls.

Every API call gets logged through CloudTrail, and you can leverage AWS IAM for fine-grained access control.

Organizations can also use AWS PrivateLink to keep API traffic off the public internet, simplifying governance and network security posture.

3. Observability

Bedrock integration provides comprehensive observability through CloudWatch metrics that track invocation counts, latency, and errors.

CloudTrail logs capture complete audit trails of every model invocation.

You can integrate these logs with your existing AWS monitoring stack, whether that’s CloudWatch dashboards, third-party tools, or custom alerting systems.

This allows you to set up alerts on usage patterns, detect anomalies, and troubleshoot issues using the same tools you already use for your AWS infrastructure.

4. Unified Cloud Strategy

Organizations already running infrastructure on AWS gain additional benefits from using Bedrock. Centralized billing consolidates AI costs with compute, storage, and other services, simplifying cost allocation and budgeting.

You get a single pane of glass for all cloud services rather than managing multiple vendor relationships.

This simplifies vendor management and allows you to leverage existing AWS support contracts and enterprise agreements for your AI infrastructure as well.

The Configuration Process

The good news?

Configuring Claude Code to use Bedrock is remarkably straightforward.

The changes are global, affecting all your projects and sessions once configured.

Prerequisites

Before you begin, ensure you have:

AWS CLI installed and configured with valid credentials
Claude Code CLI installed (recent version recommended)
AWS Bedrock model access enabled in your target region via the Bedrock console (some models require approval depending on region and account type)
Appropriate IAM permissions for bedrock:InvokeModel, bedrock:InvokeModelWithResponseStream, and bedrock:ListInferenceProfiles

Step 1: Verify AWS Credentials

First, confirm your AWS CLI is properly configured:

aws sts get-caller-identity
You should see output like:

{
    "UserId": "AIDAXXXXXXXXXXXXXXXXX",
    "Account": "123456789012",
    "Arn": "arn:aws:iam::123456789012:user/your-username"
}

Step 2: Set Environment Variables

The configuration happens through environment variables.

Add these to your shell configuration file (~/.zshrc, ~/.bashrc, or ~/.bash_profile):

# Enable Bedrock for Claude Code
export CLAUDE_CODE_USE_BEDROCK=1

# Set your preferred AWS region (REQUIRED - Claude Code does not read from ~/.aws/config)
export AWS_REGION=us-east-1

After adding these lines, reload your shell configuration:

source ~/.zshrc # or ~/.bashrc

Step 3: Verify the Configuration

Check that the environment variables are set:

env | grep -E "CLAUDE_CODE_USE_BEDROCK|AWS_REGION"
Expected output:

CLAUDE_CODE_USE_BEDROCK=1
AWS_REGION=us-east-1

That’s it! No per-project configuration needed.

These environment variables tell Claude Code to route all LLM requests through AWS Bedrock’s API instead of directly to Anthropic.

Understanding the Scope

Important: This configuration is global and session-based, not project-specific.

✅ Affects all Claude Code sessions started after setting the variables
✅ Works across all directories and projects
✅ No need to configure individual projects
⚠️ Only applies to new terminal sessions (existing sessions need to be restarted)
⚠️ If you unset the variables, Claude Code reverts to direct Anthropic API usage

You do not need to:

Add configuration files to each project
Modify any project-specific settings
Change your Claude Code commands or workflow
Update your .claude.json file

The environment variables are detected automatically when Claude Code initializes, and all API traffic is transparently routed through Bedrock.

Verification Methods: Proving It Works

Now comes the crucial part: verifying that your configuration is actually working and that you’re being charged through AWS Bedrock instead of the Anthropic API.

Method 1: Environment Variable Check (Quick Verification)

While Claude Code is running, verify the environment:

env | grep -E "CLAUDE_CODE_USE_BEDROCK|AWS_REGION"
You should see:

CLAUDE_CODE_USE_BEDROCK=1
AWS_REGION=us-east-1

These are the only two variables that enable Bedrock routing.

You still need valid AWS credentials (default or via AWS_PROFILE/SSO).

For definitive verification, use CloudTrail logs (Method 2 below).

Method 2: CloudTrail Audit Logs (Definitive Proof)

This is the most reliable verification method. CloudTrail logs every Bedrock API call:

# Check for Bedrock API calls from your user in the last hour
# Note: For Linux, replace "date -u -v-1H" with "date -u -d '1 hour ago'"
aws cloudtrail lookup-events \
  --region us-east-1 \
  --lookup-attributes AttributeKey=Username,AttributeValue=your-iam-username \
  --start-time "$(date -u -v-1H '+%Y-%m-%dT%H:%M:%S')" \
  --query 'Events[?contains(EventSource, `bedrock`)].[EventTime,EventName,EventSource]' \
  --output table

Note: If you use assumed roles or AWS SSO, the Username filter may not work.

In that case, filter by EventSource only:

aws cloudtrail lookup-events \
  --region us-east-1 \
  --start-time "$(date -u -v-1H '+%Y-%m-%dT%H:%M:%S')" \
  --query 'Events[?contains(EventSource, `bedrock`)].[EventTime,EventName,EventSource]' \
  --output table

If Claude Code is using Bedrock, you’ll see InvokeModel or InvokeModelWithResponseStream events (streaming sessions typically use the latter):

|  2026-01-14T11:05:48|InvokeModelWithResponseStream|bedrock.aws...  |
|  2026-01-14T11:04:23|InvokeModelWithResponseStream|bedrock.aws...  |
|  2026-01-14T11:04:21|InvokeModelWithResponseStream|bedrock.aws...  |

To extract the specific models being invoked:

# Note: For Linux, replace "date -u -v-1H" with "date -u -d '1 hour ago'"
aws cloudtrail lookup-events \
  --region us-east-1 \
  --lookup-attributes AttributeKey=Username,AttributeValue=your-iam-username \
  --start-time "$(date -u -v-1H '+%Y-%m-%dT%H:%M:%S')" \
  --query 'Events[?contains(EventName, `InvokeModel`)] | [0:3]' \
  --output json | \
  python3 -c "
import sys, json
events = json.load(sys.stdin)
for e in events:
    details = json.loads(e['CloudTrailEvent'])
    model = details.get('requestParameters', {}).get('modelId', 'N/A')
    print(f\"Time: {e['EventTime']}\")
    print(f\"Model: {model}\")
    print('---')
"

Note: Depending on the event shape, the model identifier may appear under requestParameters.modelId or a related field.

Expected output showing Claude models:

Time: 2026-01-14T11:05:48-03:00
Model: us.anthropic.claude-sonnet-4-5-20250929-v1:0
---
Time: 2026-01-14T11:04:23-03:00
Model: us.anthropic.claude-haiku-4-5-20251001-v1:0
---

Note: Model IDs may vary depending on your configuration.

The default primary model is global.anthropic.claude-sonnet-4-5-20250929-v1:0, but regional inference profiles (like us.anthropic...) may also appear based on your setup. Both indicate Bedrock usage.

Method 3: Count API Calls

Get a quick count of how many Bedrock calls you’ve made:

# Note: For Linux, replace "date -u -v-1H" with "date -u -d '1 hour ago'"
aws cloudtrail lookup-events \
  --region us-east-1 \
  --lookup-attributes AttributeKey=Username,AttributeValue=your-iam-username \
  --start-time "$(date -u -v-1H '+%Y-%m-%dT%H:%M:%S')" \
  --query 'Events[?contains(EventName, `InvokeModel`)]' \
  --output json | \
  python3 -c "import sys, json; print(f'Total Bedrock API calls: {len(json.load(sys.stdin))}')"

Method 4: CloudWatch Metrics

Check aggregated metrics for specific models:

# Note: For Linux, replace "date -u -v-1d" with "date -u -d '1 day ago'"
aws cloudwatch get-metric-statistics \
  --namespace AWS/Bedrock \
  --metric-name Invocations \
  --dimensions Name=ModelId,Value=us.anthropic.claude-sonnet-4-5-20250929-v1:0 \
  --start-time "$(date -u -v-1d '+%Y-%m-%dT%H:%M:%S')" \
  --end-time "$(date -u '+%Y-%m-%dT%H:%M:%S')" \
  --period 3600 \
  --statistics Sum \
  --region us-east-1

Output shows invocation counts:

{
    "Label": "Invocations",
    "Datapoints": [
        {
            "Timestamp": "2026-01-14T13:07:00+00:00",
            "Sum": 19.0,
            "Unit": "Count"
        }
    ]
}

Method 5: AWS Cost Explorer (Delayed, but Comprehensive)

Check your Bedrock costs through Cost Explorer. Note that costs typically appear with a 24-48 hour delay:

# Note: For Linux, replace "date -v-2d" with "date -d '2 days ago'"
aws ce get-cost-and-usage \
  --time-period Start=$(date -v-2d +%Y-%m-%d),End=$(date +%Y-%m-%d) \
  --granularity DAILY \
  --metrics UnblendedCost \
  --group-by Type=DIMENSION,Key=SERVICE \
  --filter '{"Dimensions": {"Key": "SERVICE", "Values": ["Amazon Bedrock"]}}'

Method 6: Check Anthropic Console (Negative Verification)

As a final check, log into your Anthropic console at

https://console.anthropic.com

and check your API usage dashboard. If you see no recent API calls corresponding to your Claude Code sessions, it confirms traffic is going through Bedrock instead.

Troubleshooting

If verification shows no Bedrock traffic:

Check environment variables in the active session:

echo $CLAUDE_CODE_USE_BEDROCK
echo $AWS_REGION

Restart your terminal after setting environment variables

Verify AWS credentials are valid:

aws sts get-caller-identity

Check IAM permissions for bedrock:InvokeModel and bedrock:InvokeModelWithResponseStream actions
Ensure Bedrock model access is enabled in AWS Console (us-east-1 → Bedrock → Model Access)
Review CloudTrail for AccessDenied events that might indicate permission issues

Cost Implications

Bedrock pricing for Anthropic models has two distinct tiers depending on model generation.

Legacy models (Public Extended Access)

Claude 3.5 Sonnet moved to Public Extended Access pricing as of December 2025, increasing from $3/$15 to $6/$30 per million tokens. If you are still running workloads on these older models, migrating to Claude Sonnet 4.5 gives you better performance at a lower price point.

Claude 3.5 Sonnet v2 (also under Public Extended Access) is priced the same at $6.00 input / $30.00 output per million tokens on-demand, with batch at $3.00 / $15.00. It additionally supports prompt caching: $7.50 per million for cache writes and $0.60 per million for cache reads.

Current generation models

Claude Sonnet 4.5 on Bedrock is priced at $3.00 per million input tokens and $15.00 per million output tokens in us-east-1. This is significantly cheaper than the legacy Sonnet 3.5 extended access pricing for equivalent capability.

Starting with Claude Sonnet 4.5 and Haiku 4.5, AWS Bedrock offers two endpoint types: global endpoints for dynamic routing across regions, and regional endpoints with a 10% premium for data residency requirements.

For exact Haiku 4.5 and Opus 4.5 pricing, check the AWS Bedrock console directly as rates can vary by region and are updated more frequently than third-party guides.

Pricing modes that affect your bill

All current Claude models support batch inference at a 50% discount, useful for asynchronous workloads like document processing or data enrichment where real-time responses are not required.

Prompt caching can reduce costs substantially for workloads that reuse the same context repeatedly. The 1-hour TTL option for prompt caching launched in January 2026 for Claude Sonnet 4.5, Haiku 4.5, and Opus 4.5.

Intelligent Prompt Routing can automatically route requests between models in the same family based on prompt complexity, reducing costs by up to 30% without compromising accuracy. This works well for customer service workloads where simple queries can be handled by a smaller model and complex ones escalated automatically.

Always verify current rates at aws.amazon.com/bedrock/pricing before budgeting, as prices vary by region and are updated periodically.

Taking Control of Your AI Infrastructure

Routing Claude Code through AWS Bedrock provides tangible benefits in cost control, security, and observability without adding complexity to your workflow.

The configuration is global, simple, and transparent to your development process.

The verification methods outlined above give you confident confirmation that your AI traffic flows through Bedrock, allowing you to take advantage of AWS’s robust cloud infrastructure for your AI workloads.

CloudTrail audit logs provide irrefutable proof of where your API calls are going.

As Claude Code continues to evolve and become more central to development workflows, having this level of control and visibility over your AI infrastructure becomes increasingly valuable.

The ability to audit, monitor, and manage AI costs through the same tools you use for the rest of your infrastructure creates operational efficiency that compounds over time.

Have you configured Claude Code with Bedrock? What benefits have you seen? Share your experience in the comments below.

I publish every week at buildwithaws.substack.com. Subscribe. It's free.

What a Multimodal WhatsApp Agent Looks Like on AWS

Marcelo Acosta Cavalero — Mon, 23 Mar 2026 12:00:00 +0000

Originally published on Build With AWS. Subscribe for weekly AWS builds.

I watched Miguel Otero Pedrido and Jesus Copado’s brilliant Ava the WhatsApp Agent series and tried building something similar. They built a multimodal WhatsApp bot using LangGraph and Google Cloud Run. The agent could hold conversations, analyze images, generate art, and process voice messages.

After going through the series, I had one question: what would this look like built 100% on AWS?

I started sketching out the architecture and quickly realized there were too many ways to build it. Pure Lambda orchestration? Bedrock Agents? Bedrock AgentCore? LangChain on Lambda? Step Functions? Each approach had tradeoffs I couldn’t ignore.

That’s when I decided to build a hybrid system. Not because hybrid is always better, but because building both patterns side by side would force me to understand when each approach makes sense.

The result is a production-ready WhatsApp bot on a manageable budget that demonstrates two distinct architectural patterns in the same codebase. You can find the complete code and deployment scripts at github.com/marceloacosta/multimodal-whatsapp-bot-aws to try it yourself.

What You’ll Build

By the end of this guide, you’ll understand how to build a WhatsApp bot with:

Natural conversations powered by Claude 3.5 Sonnet
Image analysis using Claude Vision
AI image generation with Stable Diffusion XL (or Amazon Titan)
Voice message transcription with AWS Transcribe
Text-to-speech responses using Amazon Polly
A serverless architecture that scales automatically

More importantly, you’ll understand when to use direct Lambda processing versus Bedrock Agent frameworks.

Why Hybrid Architecture?

Most tutorials show you one approach and call it done. I’m showing you both because the “best” architecture depends on what you’re building.

Here’s the reality: simple operations don’t need the complexity of agent frameworks. Complex operations benefit from them. I learned this the hard way after rebuilding parts of this system three times.

The project uses direct Lambda functions for straightforward tasks like image analysis, text-to-speech, and transcription. These are deterministic operations that don’t need natural language understanding or multi-turn conversations.

For image generation, I use Bedrock Agents. Why? Because turning “create a sunset over mountains” into an optimized prompt for an image model requires natural language understanding and prompt engineering. An agent handles this better than hardcoded logic.

This approach saves money where agents would be overkill, and uses them where they add real value.

The Cost Reality Check

Before we dive deeper, here’s what running this bot actually costs:

For 1,000 messages per day:

Lambda execution: $5-10
Bedrock models: $20-30
S3 storage: $1-2
API Gateway: $1
Other services: $3-5

Total: $30-50 per month.

Image generation adds extra cost per image. Titan costs $0.01 per image, Stable Diffusion XL costs $0.04. These costs scale with usage, but you have full control over which model you use.

Paying only for what you use across AWS services often beats being locked into third-party platforms with mandatory monthly fees.

Architecture Overview

The system consists of 8 Lambda functions working together:

Entry and orchestration:

inbound-webhook: Receives WhatsApp messages via API Gateway
wa-process: Main orchestrator that routes requests
wa-send: Sends messages back to WhatsApp

Feature handlers:

wa-image-analyze: Analyzes images using Claude Vision
wa-image-generate: Generates images using Titan or Stable Diffusion
wa-tts: Converts text to speech with Amazon Polly
wa-audio-transcribe: Starts transcription jobs using AWS Transcribe
wa-transcribe-finish: Handles transcription callbacks

Supporting services:

AWS Bedrock: Supervisor Agent + ImageCreator Sub-Agent
Amazon Polly: Text-to-speech synthesis
AWS Transcribe: Audio transcription
S3 buckets: Media storage and generated images
Secrets Manager: WhatsApp API credentials

The architecture diagram shows the complete flow, but I’ll walk you through how each piece works and why I made specific decisions.

Decision Framework: Lambda vs Agents

Here’s how I decided which approach to use for each feature.

Use direct Lambda when:

The operation is deterministic (TTS always works the same way)
You’re calling an AWS service directly (Transcribe, Polly)
The input-output relationship is simple
You want lower latency and cost

Use Bedrock Agents when:

You need natural language understanding
The task requires reasoning or optimization
Multi-turn conversations matter
Context needs to persist across interactions

Image analysis went to Lambda. The operation is simple: take an image, send it to Claude Vision, return the description. No complex prompt engineering needed.

Image generation went to Agents. User requests like “sunset” need to become detailed prompts like “a photorealistic sunset over mountain peaks with golden hour lighting, highly detailed, 8k resolution.” The agent handles this transformation.

The goal isn’t to pick a winner, but to match each method to what it does best.

Building the Foundation

Let’s start with the basics. You’ll need:

AWS account with Bedrock access
Python 3.9 or higher
AWS CLI configured
WhatsApp Business API account from Meta for Developers

You also need to enable model access in Bedrock for:

Claude 3.5 Sonnet v2
Claude 3.5 Haiku
Titan Image Generator v2

Model access is free to enable. You only pay when you use them.

Setting Up WhatsApp Business API

Getting WhatsApp access is straightforward but takes a few steps:

Go to Meta for Developers and create an app
Add the WhatsApp product to your app
Get your Phone Number ID and Access Token
Generate a verify token (any random string you choose)

Store the long-lived access token in AWS Secrets Manager. This is important because this token needs rotation over time.

Create a secret with this structure:

{ “token”: “your_long_lived_access_token” }
The Phone Number ID and Verify Token go in Lambda environment variables. Only the access token needs to be in Secrets Manager because it’s the credential that requires rotation and is security-sensitive.

The Configuration Strategy

Lambda functions don’t use .env files. Each function has its own environment variables set directly in AWS Console or via CLI.

The env.example file in the repo is just a reference document showing what variables exist and where they’re used. Different Lambda functions need different configurations. The orchestrator needs agent IDs. The image generator needs model IDs and bucket names. The sender only needs to know where to find the access token in Secrets Manager.

This keeps each function’s configuration minimal and explicit

Building the Entry Point

Every WhatsApp message hits inbound-webhook first. This Lambda handles two responsibilities: webhook verification and receiving messages.

The verification flow is straightforward. WhatsApp sends a GET request with a challenge token. The Lambda verifies the token matches what you configured, then returns the challenge back. This proves you control the endpoint.

After verification passes, WhatsApp starts sending POST requests with message data. When media arrives (images, audio), the webhook downloads it to S3 for processing. Then it invokes wa-process asynchronously.

The async pattern is critical. WhatsApp expects a 200 response within seconds. Your bot might take 10-20 seconds to generate a response. Async invocation lets you acknowledge receipt immediately while processing happens in the background.

Building the Orchestrator

The wa-process Lambda is the brain of the system. It receives a message and decides what to do with it.

The logic follows a simple flow: identify message type (text, image, audio), check for special intents like voice responses, route to the appropriate handler, and send the response back.

For text messages, the function invokes the Bedrock Supervisor Agent and sends the response directly. For images with questions, it prepares context that includes the S3 URI and user’s question, then invokes the agent. For audio, it triggers the transcription Lambda and waits for the callback.

The hybrid architecture shows its value here. The orchestrator doesn’t care whether a feature uses direct Lambda calls or agent frameworks. Text and image analysis go through the agent. Audio transcription calls a Lambda directly. Image generation gets delegated to a sub-agent. The orchestrator just routes requests to the right place.

The orchestrator also handles voice response requests. When a user asks for a voice message, it sets a flag and invokes the agent to generate text. Once the agent responds, it calls wa-tts to convert that text to audio. This separation keeps the agent focused on content generation while the orchestrator manages output formats.

Direct Lambda Pattern: Image Analysis

Image analysis shows the direct Lambda approach clearly. The operation is simple: download an image from S3, send it to Claude Vision via the Bedrock Converse API, and return the description.

The Lambda downloads the image bytes from S3 rather than passing an S3 reference. This makes the code more resilient to API changes. The image bytes and the user’s question get sent to Claude 3.5 Sonnet Vision, which returns a description.

This direct approach gives you full control. No agent orchestration, no prompt optimization, just a straightforward API call. The entire Lambda executes in under 3 seconds.

The cost is predictable: $0.008 per image analyzed. At 1,000 images per month, that’s $8. The agent framework would add orchestration overhead without adding value for this use case.

When would you add an agent layer? When the image analysis needs to trigger other actions, maintain conversation context across multiple images, or integrate with knowledge bases. For straightforward “analyze this image” requests, direct Lambda is the better choice.

Direct Lambda Pattern: Voice and Audio

Text-to-speech and audio transcription follow the same direct Lambda pattern.

For TTS, the wa-tts Lambda receives text from the orchestrator and calls Amazon Polly to synthesize speech. Polly returns an MP3 audio stream, which gets uploaded to S3. The Lambda generates a presigned URL for the audio file and returns it to the orchestrator. The orchestrator then calls wa-send with that audio URL to deliver it to WhatsApp. The entire operation costs about $0.016 per request (Polly charges $16 per 1 million characters).

Audio transcription is more complex because AWS Transcribe is asynchronous. You can’t just call an API and get the result immediately.

The wa-audio-transcribe Lambda starts a transcription job. It tells Transcribe where to find the audio file in S3 (uploaded earlier by the webhook), what format it’s in (usually OGG for WhatsApp voice notes), and where to store the results. Then it returns immediately.

AWS Transcribe processes the audio in the background. When finished, it writes the transcript JSON to S3. This triggers an S3 ObjectCreated event that invokes the wa-transcribe-finish Lambda. This Lambda reads the transcript from S3, extracts the text, and sends it back to the orchestrator as if it were a new text message. The orchestrator then sends it to the agent for processing.

This async pattern is crucial for long-running operations. WhatsApp users expect quick responses, but transcription can take 30-60 seconds depending on audio length. The callback pattern lets the user know their message was received while processing happens in the background.

Agent Framework Pattern: Conversations

Now let’s look at the agent side. The Supervisor Agent handles all text conversations.

The agent instructions require quite a bit of thought. You need to balance several competing concerns: natural conversation flow, WhatsApp’s messaging constraints, multi-language support, and managing different output formats.

The instructions need to handle language detection and matching. Users might write in Spanish, English, or Portuguese. The agent needs to detect this and respond appropriately. This is straightforward for text but becomes tricky when you add voice responses.

For voice responses, there’s a subtle problem. If a user asks for an audio message and the agent says “I’ll send you an audio message about quantum physics,” the TTS system converts that entire sentence to audio. The user hears “I’ll send you an audio message about quantum physics” instead of just hearing about quantum physics. The solution is explicit instructions: never mention the output format, just generate the content. The backend handles format conversion.

The instructions also need to consider WhatsApp’s messaging patterns. Long paragraphs work poorly in chat. The agent needs to keep responses concise while still being helpful. This means being explicit about brevity without sacrificing accuracy.

Benefits of this approach: the agent focuses on content generation, not infrastructure concerns. You can add new output formats (video captions, PDFs) without changing agent instructions. The separation between content and delivery is clean.

Drawbacks: the instructions become longer and more specific. More specific instructions mean less flexibility for the agent to adapt to edge cases. You also need to test thoroughly because the agent won’t tell you when it’s confused about format handling.

The agent connects to Lambda functions via action groups. For image analysis, the action group defines a function with parameters for the S3 URI, optional question, and optional language code. When a user sends an image with a question, the orchestrator formats it as a structured context block with these parameters. The agent parses this, calls the analyzeImage action, and returns the result.

This separation is powerful. You can change how image analysis works (switch models, add caching, implement fallbacks) without touching the orchestrator or agent instructions. The interface stays stable while the implementation evolves.

Agent Framework Pattern: Image Generation

Image generation shows why agents matter for complex tasks. When a user says “create a sunset,” that vague request needs to become a detailed prompt like “a photorealistic sunset over mountain peaks with golden hour lighting, vibrant orange and purple clouds, highly detailed, 8k resolution.” This transformation requires natural language understanding and prompt engineering, which agents handle well.

The architecture uses a sub-agent pattern. The Supervisor Agent detects image generation requests and delegates to an ImageCreator sub-agent. This keeps responsibility focused: the supervisor handles routing decisions, the sub-agent handles prompt optimization, and the Lambda handles the actual image generation.

The ImageCreator sub-agent analyzes the user’s natural language request and creates an optimized prompt for the image model. It considers style preferences, adds quality modifiers, and constructs negative prompts to avoid common issues. Then it calls the wa-image-generate Lambda through an action group.

The Lambda receives the optimized prompt and calls the configured Bedrock image model (Stable Diffusion XL or Titan). The generated image gets uploaded to S3, a presigned URL is created, and the Lambda uses Claude Haiku to generate a natural caption in the user’s language. Finally, it invokes wa-send to deliver the image to WhatsApp with the caption.

The sub-agent responds with a simple success indicator to the supervisor, which passes it back to the orchestrator. The orchestrator knows the image was already sent directly by the Lambda, so it doesn’t send anything else.

This multi-layer delegation (orchestrator → supervisor → sub-agent → Lambda) seems complex, but each layer has a clear purpose. The orchestrator routes by message type. The supervisor manages conversation context. The sub-agent optimizes prompts. The Lambda generates images. Each component does one thing well.

The Configuration Pattern

Earlier I mentioned environment variables are set per-Lambda. Here’s the complete pattern:

Secrets Manager (long-lived token only):

WhatsApp access token (needs rotation, security-sensitive)

Lambda environment variables (function-specific):

wa-process: Agent IDs, region, function names
wa-image-generate: Model IDs, bucket names
inbound-webhook: Bucket names, verify token, downstream functions
wa-send: Phone number ID, secret name

This approach scales better than shared configuration. Each function only knows what it needs. Changes to one function don’t affect others.

Setting these via CLI looks like:

`aws lambda update-function-configuration \

--function-name wa-process \

--environment Variables=’{

“BEDROCK_AGENT_ID”:”AGENTXXX”,

“BEDROCK_AGENT_ALIAS_ID”:”ALIASXXX”,

“BEDROCK_REGION”:”us-east-1”,

“MEDIA_BUCKET”:”my-media-bucket”

}’`
Or use the AWS Console for easier management. Both approaches work.

Deployment Strategy

The repo includes automated deployment scripts that handle the entire setup. Understanding what happens during deployment helps when debugging issues later.

Lambda deployment involves several steps: packaging the code, creating the function with the right runtime and memory settings, configuring environment variables, and setting up triggers. Each function needs different timeout and memory configurations. The webhook and orchestrator need quick response times. Image generation needs more time and memory. Audio transcription is somewhere in between.

The deployment scripts handle creating IAM roles with appropriate permissions. Each Lambda gets least-privilege access: only the specific AWS services it needs. The image analyzer reads from S3 but doesn’t write. The image generator writes to S3 but doesn’t read user data. The orchestrator invokes other Lambdas but doesn’t access S3 directly.

Triggers need configuration too. API Gateway triggers the webhook Lambda on HTTP requests. S3 ObjectCreated events trigger the transcription finish Lambda. Other Lambdas get invoked directly by other functions, so they don’t need external triggers.

The critical piece many people miss: Bedrock Agents need explicit permission to invoke Lambda functions. AWS doesn’t automatically grant this. You must add a resource-based policy to each Lambda that allows the bedrock.amazonaws.com service principal to invoke it, scoped to your specific agent ARN. Without this permission, the agent fails silently with generic error messages like “I cannot help with that.”

The automated scripts handle all these details, but knowing what they do helps when something goes wrong. If an agent can’t invoke a Lambda, check the resource policy. If a Lambda times out, check the timeout setting. If environment variables are missing, check the function configuration.

Setting Up Bedrock Agents

Creating agents through the AWS Console is straightforward but has specific steps.

For the Supervisor Agent:

- Go to Bedrock Console → Agents → Create Agent
- Name it descriptively (I use whatsapp-supervisor-agent)
- Choose Claude 3.5 Sonnet v2 as the foundation model
- Copy instructions from supervisor-agent-instructions.txt
- Add action group for image analysis
- Prepare the agent (this compiles everything)
- Create an alias pointing to the prepared version That last step trips people up. Changes to an agent don’t take effect until you:
- Prepare the agent (creates a new version)
- Update the alias to point to the new version

If you change instructions and skip these steps, your bot still uses the old version.

For the ImageCreator sub-agent:

- Create another agent with a focused name
- Use simpler instructions (it has one job)
- Add action group with the OpenAPI schema from lambdas/wa-image-generate/openapi-schema.json
- Prepare and create alias

Then link them:

- Edit the Supervisor Agent
- Add ImageCreator as a collaborator
- Specify when to delegate (image generation requests)
- Prepare the supervisor again
- Update its alias

The supervisor now knows to call the sub-agent for image requests.

Image Generation Models

The system supports two image generation models through a single Lambda function. You choose which model to use by setting the IMAGE_MODEL_ID environment variable.

VISION_MODEL_ID = os.environ.get(”VISION_MODEL_ID”, “us.anthropic.claude-3-5-sonnet-20241022-v2:0”)

Stable Diffusion XL is the default. It offers more creative control with style presets and costs about $0.04 per image. Amazon Titan Image Generator v1 is the alternative, optimized for photorealistic output at about $0.01 per image.

The Lambda detects which model is configured and uses the appropriate API format. Each model has different input parameters and response structures, but the Lambda abstracts these differences. From the agent’s perspective, image generation works the same way regardless of which model you choose.

To switch models, you update the Lambda’s environment variable in AWS Console or via CLI. The benefit of this design is that only the one Lambda changes. The orchestrator, agents, and other Lambdas continue working without modification. The abstraction layer handles the model-specific differences.

Performance Optimization

Lambda cold starts matter for user experience. When a function hasn’t run recently, AWS needs to initialize it. This adds 1-3 seconds of latency.

This demo doesn’t use provisioned concurrency to keep costs minimal. For production deployments with consistent traffic, consider provisioned concurrency for the webhook and orchestrator functions. These are in the critical path for response time. Other functions can tolerate cold starts because they’re not user-facing or run asynchronously.

Agent response time varies based on complexity. Simple text responses take 2-4 seconds. Image generation requests take 10-15 seconds total (agent reasoning + generation + upload).

For audio transcription, the system can send an immediate acknowledgment, then delivers the actual transcription when ready. This manages user expectations for the longer processing time.

Security Considerations

The system has several security layers.

Webhook verification ensures only WhatsApp can send messages. Without the correct verify token, requests are rejected.

IAM roles follow least privilege. Each Lambda only has permissions for the specific AWS services it needs. The image analyzer can read from S3 but not write. The image generator can write but not read others’ images.

Secrets Manager handles credential rotation. The WhatsApp access token can be rotated without code changes. Lambda functions fetch the current token at runtime.

S3 buckets are private by default. Images are shared via presigned URLs that expire after 7 days. No public bucket access.

What’s missing? Content moderation. The current implementation doesn’t filter generated images or user prompts. For production use, add:

Bedrock Guardrails to filter inappropriate prompts
Image scanning before sending to users
Rate limiting per user
Cost monitoring and alerts

These additions depend on your specific requirements and risk tolerance.

Lessons learned

I rebuilt parts of this system three times. Here’s what I learned:

Agent instructions require precision. Vague instructions lead to unpredictable behavior. The voice response handling needed explicit rules about never mentioning the output format. Language detection needed clear fallback behavior. Each edge case required specific handling in the instructions.

Hybrid architecture balances trade-offs. Pure agent systems cost more and respond slower for simple operations. Pure Lambda systems require writing all the conversational logic yourself. The hybrid approach uses agents where their natural language capabilities add value and direct Lambdas where they don’t.

Async patterns matter for user experience. WhatsApp users expect quick acknowledgments. Transcription takes 30-60 seconds. Image generation takes 10-15 seconds. The async callback patterns let the system respond immediately while work happens in the background.

Component isolation simplifies debugging. Each Lambda has a single responsibility. When something breaks, you can test that Lambda independently. Clear interfaces between components mean changes don’t cascade unexpectedly.

Permission issues cause silent failures. Bedrock Agents fail with generic error messages when they can’t invoke Lambdas. IAM permission debugging takes time. Checking permissions early when something doesn’t work saves troubleshooting time later.

Alternative Approaches

This hybrid architecture is one way to build this system. Here are alternatives and when to use them.

Pure Lambda orchestration: Remove Bedrock Agents entirely. The orchestrator directly calls all functions based on deterministic logic. Simpler and cheaper, but you write all the prompt engineering logic yourself.

Pure Agent architecture: Make everything an agent action group. Image analysis, TTS, transcription all go through the agent. Unified conversational interface with better context management, but higher cost and latency for simple tasks.

Bedrock AgentCore: Use AWS Bedrock AgentCore with your choice of agent framework (LangGraph, CrewAI, LlamaIndex). More infrastructure services like 8-hour runtimes and built-in observability, but requires more architectural decisions upfront.

Agent framework (LangChain, CrewAI): Replace Bedrock Agents with an open-source framework hosted in Lambda. Full control and portability, but you handle state management and dependencies yourself.

Step Functions orchestration: Use AWS Step Functions for workflow management instead of Lambda orchestration. Visual workflows with built-in retry logic, but more services to manage.

The right choice depends on your requirements. The hybrid approach teaches you both patterns so you can decide what works for your use case.

For a detailed comparison with pros, cons, and migration paths, see the ARCHITECTURE_DECISIONS.md document in the repo.

Getting Started

The repo includes automated deployment scripts that handle the Lambda setup. You can deploy everything at once or go function by function to understand each piece. After the Lambda deployment, you’ll create the Bedrock agents through the AWS Console and link them together.

The documentation walks you through both approaches. If you want to understand every component, deploy and test each Lambda individually. If you want to get running quickly, use the automated scripts and dive into specific parts later.

Setting up the agents requires more manual steps. You’ll create the supervisor agent with its conversation instructions, add the action group for image analysis, then create the image creator sub-agent and link it as a collaborator. The agent setup guide includes the exact instructions and parameters for each step.

The code is designed to be adaptable. The hybrid architecture isn’t prescriptive. Want to remove agents and handle everything with Lambda logic? The orchestrator is easy to modify. Want to add new capabilities? Create a Lambda, add it to the orchestrator’s routing logic, and decide whether to call it directly or through an agent action group.

The repo documentation covers deployment details, agent configuration, troubleshooting, and architectural alternatives. Start with what interests you most.

What This Enables

This isn’t just about building a WhatsApp bot. The patterns here apply to many AI applications.

The hybrid architecture shows how to balance simplicity with capability. The agent collaboration pattern shows how to break complex tasks into focused components. The async processing pattern shows how to maintain good user experience with slow operations.

You can adapt these patterns to build:

Telegram or Discord bots with the same backend
Slack integrations with multimodal capabilities
API services that use agents for complex requests
Customer service automation with image support

The serverless foundation means it scales automatically. The AWS services handle infrastructure so you focus on functionality.

Where to Go From Here

If you build something with this architecture, I’d like to hear about it. What worked? What didn’t? What did you change?

The complete code, documentation, and deployment scripts are at github.com/marceloacosta/multimodal-whatsapp-bot-aws. The repo is actively maintained. Issues and pull requests are welcome.

Start with the README for an overview, then dive into the architecture decisions document to understand the tradeoffs. The code includes comments explaining why specific approaches were chosen.

For questions or discussion, you can find me here or on Linkedin. I regularly share updates about AI systems and AWS architecture patterns.

Build something interesting with this. Then share what you learned.

I publish every week on buildwithaws.substack.com. If this was useful, subscribe. It's free.