<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Nitesh Reddy Challa</title>
    <description>The latest articles on DEV Community by Nitesh Reddy Challa (@nitesh_reddychalla_d5515).</description>
    <link>https://dev.to/nitesh_reddychalla_d5515</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3908044%2F1395f330-a337-4b57-b2fe-67e9a87e055f.png</url>
      <title>DEV Community: Nitesh Reddy Challa</title>
      <link>https://dev.to/nitesh_reddychalla_d5515</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/nitesh_reddychalla_d5515"/>
    <language>en</language>
    <item>
      <title>How I Deployed Hermes Agent on AWS</title>
      <dc:creator>Nitesh Reddy Challa</dc:creator>
      <pubDate>Wed, 24 Jun 2026 22:46:01 +0000</pubDate>
      <link>https://dev.to/nitesh_reddychalla_d5515/how-i-deployed-hermes-agent-on-aws-371c</link>
      <guid>https://dev.to/nitesh_reddychalla_d5515/how-i-deployed-hermes-agent-on-aws-371c</guid>
      <description>&lt;p&gt;My EC2 instance has a public IP address. It has zero inbound firewall rules. And yet I can reach my AI agent from my phone on Telegram, pull up a full web workspace in my browser, and run shell commands on it — all without opening a single port, without a VPN, and without SSH.&lt;/p&gt;

&lt;p&gt;The latest version also splits storage deliberately: persistent agent data stays on EFS, while the Hermes install and Python venv moved to the root EBS volume. That change keeps &lt;code&gt;pip install&lt;/code&gt; / &lt;code&gt;hermes update&lt;/code&gt; I/O off EFS and brings always-on infra to a highly predictable &lt;strong&gt;~$35/mo&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That's the setup this post is about.&lt;/p&gt;




&lt;h2&gt;
  
  
  What is Hermes Agent?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://hermes-agent.nousresearch.com/docs" rel="noopener noreferrer"&gt;Hermes Agent&lt;/a&gt; is an open-source AI agent from Nous Research. It's not a chatbot wrapper. It has persistent memory, skills, a file system, a sandboxed terminal backend, and a full web workspace UI. You point it at a model provider and it runs as a daemon — &lt;code&gt;hermes-gateway&lt;/code&gt; — serving an OpenAI-compatible API.&lt;/p&gt;

&lt;p&gt;The web workspace looks like a proper IDE: chat panel, file browser, terminal, job queue. The Telegram integration is a long-polling bot that connects to the same gateway — no extra server, no webhook, no public URL.&lt;/p&gt;

&lt;p&gt;I wanted this running on AWS, backed by Amazon Bedrock (no API keys to rotate, IAM role handles auth), with my agent's memory surviving instance replacements.&lt;/p&gt;




&lt;h2&gt;
  
  
  Architecture
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Your phone (Telegram)
  └─► Telegram servers ──► hermes-gateway long-poll (outbound HTTPS only)

Your laptop (browser)
  └─► aws ssm start-session ──► SSM port-forward :3000
                                   └─► hermes-workspace (loopback only)

EC2 m7g.medium · public subnet · ZERO inbound SG · dynamic public IP
  │
  ├─ hermes-gateway   :8642  (127.0.0.1 only)
  │     ├─ Bedrock inference via IAM role (no API keys)
  │     ├─ Telegram long-poll (outbound HTTPS)
  │     └─ OpenAI-compatible API
  │
  ├─ hermes-dashboard :9119  (127.0.0.1 only)
  └─ hermes-workspace :3000  (127.0.0.1 only)
  │
  ├── EFS /mnt/efs/hermes  (RETAIN · encrypted · uid=10000 access point)
  │     .env · config.yaml · sessions · skills · SOUL.md · logs · state DBs
  │     ↑ persistent agent data — survives instance replacement
  │
  ├── EBS root volume
  │     /opt/hermes-agent      ← hermes venv (pip I/O stays off EFS)
  │     /opt/hermes-workspace  ← workspace UI
  │
  └── Secrets Manager (hermes/runtime)
        API_SERVER_KEY · TELEGRAM_BOT_TOKEN · TELEGRAM_ALLOWED_USERS

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three CDK stacks, deployed in order:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Stack&lt;/th&gt;
&lt;th&gt;What it provisions&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;HermesNetworkStack&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;VPC (1 AZ), public subnet, IGW, S3 gateway endpoint, security groups&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;HermesStorageStack&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;EFS (RETAIN, encrypted, uid=10000 access point), Secrets Manager&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;HermesComputeStack&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;EC2 (m7g.medium), IAM (Bedrock-scoped), bootstrap user-data, systemd units&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  The Security Trick: Zero Inbound Rules
&lt;/h2&gt;

&lt;p&gt;The instinct when deploying anything on AWS is to reach for a private subnet, a NAT Gateway, and VPC interface endpoints. That's the enterprise posture. It's also ~$88/mo in endpoint costs alone before your instance even starts.&lt;/p&gt;

&lt;p&gt;For a personal deployment the actual security boundary is not the subnet type — it's what's listening on the instance.&lt;/p&gt;

&lt;p&gt;All three services bind to &lt;code&gt;127.0.0.1&lt;/code&gt; only, and the Security Group has zero inbound rules. Unsolicited connections to the instance's public IP are dropped at the Security Group, and even if one got through, nothing is listening on a routable interface.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# network_stack.py — the entire inbound surface of the instance
&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;instance_security_group&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ec2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;SecurityGroup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;InstanceSg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;vpc&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;vpc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hermes EC2 - zero inbound; egress via IGW. Admin via SSM.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;allow_all_outbound&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# No add_ingress_rule calls. Ever.
&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Admin access is via AWS Systems Manager Session Manager — outbound HTTPS to the SSM service endpoint, no inbound port required. SSM also handles port-forwarding, which is how the workspace reaches your browser.&lt;/p&gt;

&lt;p&gt;Telegram uses long-polling. The gateway sends an outbound HTTPS request to Telegram's servers, which hold it open until a message arrives or the poll times out; the gateway then polls again. Again: zero inbound.&lt;/p&gt;
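&lt;p&gt;To make the long-polling flow concrete, here is a minimal sketch of such a loop against the Telegram Bot API's &lt;code&gt;getUpdates&lt;/code&gt; method. It is not Hermes' actual implementation; the function names &lt;code&gt;next_offset&lt;/code&gt;, &lt;code&gt;poll_once&lt;/code&gt;, and &lt;code&gt;run&lt;/code&gt; are made up for illustration.&lt;/p&gt;

```python
import json
import urllib.parse
import urllib.request

API_ROOT = "https://api.telegram.org"  # Telegram Bot API base URL


def next_offset(updates, current=0):
    """Return the offset that acknowledges every update seen so far."""
    ids = [u["update_id"] for u in updates]
    return max(ids) + 1 if ids else current


def poll_once(token, offset, timeout=30):
    """One long-poll: an outbound request that waits for updates or a timeout."""
    query = urllib.parse.urlencode({"offset": offset, "timeout": timeout})
    url = f"{API_ROOT}/bot{token}/getUpdates?{query}"
    with urllib.request.urlopen(url, timeout=timeout + 10) as resp:
        return json.load(resp)["result"]


def run(token, handle):
    """Loop forever: poll, hand each update to `handle`, advance the offset."""
    offset = 0
    while True:
        updates = poll_once(token, offset)
        for update in updates:
            handle(update)
        offset = next_offset(updates, offset)
```

&lt;p&gt;Every call in the loop is initiated from the instance, which is why no inbound rule or webhook URL is needed.&lt;/p&gt;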

&lt;p&gt;The result: there is no attack surface on the public IP. Shodan can scan it all day.&lt;/p&gt;
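&lt;p&gt;The loopback-only claim is easy to check mechanically. The sketch below is an illustration rather than part of the deployed stack: it takes the three listener addresses from the architecture diagram and flags any that are bound to a non-loopback address. The helper name &lt;code&gt;exposed_listeners&lt;/code&gt; is hypothetical.&lt;/p&gt;

```python
import ipaddress

# Listener addresses as described in the architecture diagram.
LISTENERS = {
    "hermes-gateway": ("127.0.0.1", 8642),
    "hermes-dashboard": ("127.0.0.1", 9119),
    "hermes-workspace": ("127.0.0.1", 3000),
}


def exposed_listeners(listeners):
    """Return the names of services bound to anything other than loopback."""
    return sorted(
        name
        for name, (host, _port) in listeners.items()
        if not ipaddress.ip_address(host).is_loopback
    )


print(exposed_listeners(LISTENERS))  # prints []
```

&lt;p&gt;A service accidentally bound to &lt;code&gt;0.0.0.0&lt;/code&gt; would show up in the returned list, although the empty Security Group would still block it from the internet.&lt;/p&gt;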




&lt;h2&gt;
  
  
  The Memory Trick: EFS for Data, EBS for Code
&lt;/h2&gt;

&lt;p&gt;Persistent &lt;strong&gt;agent data&lt;/strong&gt; — &lt;code&gt;SOUL.md&lt;/code&gt;, skills, session history, state DBs, the &lt;code&gt;.env&lt;/code&gt; with all secrets, the &lt;code&gt;config.yaml&lt;/code&gt; — lives on an EFS volume mounted at &lt;code&gt;/mnt/efs/hermes&lt;/code&gt;. The hermes &lt;strong&gt;binary and venv&lt;/strong&gt; live on the root EBS volume at &lt;code&gt;/opt/hermes-agent&lt;/code&gt; instead.&lt;/p&gt;

&lt;p&gt;Why split? EFS Elastic Throughput bills per GB of data read or written. Moving the venv to EBS takes the install/update path off EFS, so steady-state EFS I/O stays at about &lt;strong&gt;$1/mo&lt;/strong&gt; instead of spiking with every dependency update. See &lt;code&gt;docs/STORAGE.md&lt;/code&gt; for the full reference.&lt;/p&gt;

&lt;p&gt;The EFS has &lt;code&gt;RemovalPolicy.RETAIN&lt;/code&gt;. The access point locks the path to UID 10000. Automatic backups are on with a 35-day window.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# storage_stack.py — the persistence layer
&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;file_system&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;efs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;FileSystem&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;HermesEfs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;vpc&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;vpc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;encrypted&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;removal_policy&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;RemovalPolicy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RETAIN&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;       &lt;span class="c1"&gt;# survives cdk destroy
&lt;/span&gt;    &lt;span class="n"&gt;lifecycle_policy&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;efs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;LifecyclePolicy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AFTER_30_DAYS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;throughput_mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;efs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ThroughputMode&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ELASTIC&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;enable_automatic_backups&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;access_point&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;file_system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_access_point&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;HermesAccessPointUid10000&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/hermes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;create_acl&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;efs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Acl&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;owner_uid&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;10000&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;owner_gid&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;10000&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;permissions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;0750&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;posix_user&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;efs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;PosixUser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;uid&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;10000&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;gid&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;10000&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What this means in practice: if the EC2 instance develops a problem, you run &lt;code&gt;cdk deploy&lt;/code&gt; and get a fresh one. The new instance mounts the same EFS, reads the same &lt;code&gt;.env&lt;/code&gt;, reinstalls the venv to EBS via user-data, and all three systemd services start with the agent's full memory intact. No manual data migration, no re-configuration.&lt;/p&gt;

&lt;p&gt;The EC2 root EBS is flagged &lt;code&gt;delete_on_termination=True&lt;/code&gt;. Agent &lt;em&gt;data&lt;/em&gt; is on EFS (RETAIN); install artifacts on EBS are recreated automatically on each deploy.&lt;/p&gt;




&lt;h2&gt;
  
  
  Bedrock: No API Keys, IAM Role Does the Work
&lt;/h2&gt;

&lt;p&gt;Hermes connects to Bedrock via the &lt;a href="https://hermes-agent.nousresearch.com/docs/guides/aws-bedrock" rel="noopener noreferrer"&gt;Hermes Bedrock guide&lt;/a&gt;. The EC2 instance has an IAM role scoped to &lt;code&gt;bedrock:InvokeModel&lt;/code&gt;, &lt;code&gt;bedrock:Converse&lt;/code&gt;, and the streaming variants — on specific inference-profile and foundation-model ARNs only.&lt;/p&gt;

&lt;p&gt;No model-provider API keys anywhere, and none to rotate. If the instance is compromised, the Bedrock blast radius is bounded to inference on two specific models. The role cannot touch S3, DynamoDB, other accounts, or anything else.&lt;/p&gt;

&lt;p&gt;Two models run in this stack:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;us.anthropic.claude-sonnet-4-6&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Primary (all main agent tasks)&lt;/td&gt;
&lt;td&gt;Best reasoning for the price on Bedrock&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;us.amazon.nova-lite-v1:0&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Auxiliary (5 background slots)&lt;/td&gt;
&lt;td&gt;Roughly 50 to 60× cheaper than Sonnet per token at the rates listed under Bedrock tokens, for web extraction, vision, summarisation&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;us.&lt;/code&gt; prefix is the cross-region inference profile — Bedrock routes to &lt;code&gt;us-east-1&lt;/code&gt;, &lt;code&gt;us-east-2&lt;/code&gt;, or &lt;code&gt;us-west-2&lt;/code&gt; automatically for throughput. You enable both models once in the &lt;a href="https://us-east-1.console.aws.amazon.com/bedrock/home#/modelaccess" rel="noopener noreferrer"&gt;Bedrock Model Access console&lt;/a&gt; and never touch it again.&lt;/p&gt;
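&lt;p&gt;As a rough sketch of what a policy scoped this way might look like, the snippet below builds an IAM policy document for the two models. It is not the policy from the repository. The account ID is a placeholder, the ARN layout is my assumption about how inference-profile and foundation-model resources are addressed, and only the two invoke actions I am sure of are listed.&lt;/p&gt;

```python
import json

ACCOUNT_ID = "123456789012"  # placeholder account ID
MODEL_IDS = ["anthropic.claude-sonnet-4-6", "amazon.nova-lite-v1:0"]
REGIONS = ["us-east-1", "us-east-2", "us-west-2"]


def bedrock_policy(account_id, model_ids, regions):
    """Build an IAM policy document limited to inference on the given models."""
    resources = []
    for model_id in model_ids:
        # Cross-region inference profile (the "us." prefix), owned by the account.
        resources.append(
            f"arn:aws:bedrock:us-east-1:{account_id}:inference-profile/us.{model_id}"
        )
        # Underlying foundation model in each region the profile may route to.
        for region in regions:
            resources.append(f"arn:aws:bedrock:{region}::foundation-model/{model_id}")
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["bedrock:InvokeModel", "bedrock:InvokeModelWithResponseStream"],
            "Resource": resources,
        }],
    }


print(json.dumps(bedrock_policy(ACCOUNT_ID, MODEL_IDS, REGIONS), indent=2))
```

&lt;p&gt;Listing explicit ARNs instead of &lt;code&gt;*&lt;/code&gt; is what keeps the blast radius limited to these two models.&lt;/p&gt;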




&lt;h2&gt;
  
  
  Cost Breakdown
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Infra (always-on, us-east-1)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Detail&lt;/th&gt;
&lt;th&gt;≈ Monthly&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;EC2 &lt;code&gt;m7g.medium&lt;/code&gt; (Graviton, 1 vCPU / 4 GiB)&lt;/td&gt;
&lt;td&gt;730 hrs × $0.0404/hr&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~$29.50&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;EBS gp3 root (30 GiB, encrypted)&lt;/td&gt;
&lt;td&gt;venv + workspace on EBS&lt;/td&gt;
&lt;td&gt;$2.40&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;EFS Standard (~64 MB agent data)&lt;/td&gt;
&lt;td&gt;$0.30/GiB-mo storage&lt;/td&gt;
&lt;td&gt;~$0.02&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;EFS Elastic throughput I/O&lt;/td&gt;
&lt;td&gt;venv/deps on EBS; steady-state session/state access only&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~$1/mo&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;EFS automatic backups&lt;/td&gt;
&lt;td&gt;~$0.05/GiB-mo&lt;/td&gt;
&lt;td&gt;~$0.50&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Secrets Manager&lt;/td&gt;
&lt;td&gt;1 secret × $0.40&lt;/td&gt;
&lt;td&gt;$0.40&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CloudWatch Logs + metrics&lt;/td&gt;
&lt;td&gt;ingestion + custom metrics&lt;/td&gt;
&lt;td&gt;~$2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NAT Gateway / VPC endpoints&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;none&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$0&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Infra total (always-on)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;≈ $35/mo&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;No NAT Gateway. No interface VPC endpoints. The EC2 routes outbound directly through the Internet Gateway. That single architectural decision — public subnet, zero-inbound SG instead of private subnet + NAT — is &lt;strong&gt;58% cheaper&lt;/strong&gt; than the equivalent private-subnet setup with six VPC endpoints.&lt;/p&gt;
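&lt;p&gt;For anyone who wants to re-run the arithmetic, the rows of the infra table can be summed in a few lines. The figures are copied from the table; only the dictionary keys are shortened.&lt;/p&gt;

```python
# Monthly USD figures copied from the infra table.
MONTHLY_USD = {
    "EC2 m7g.medium": 730 * 0.0404,  # hours per month times hourly rate
    "EBS gp3 root": 2.40,
    "EFS storage": 0.02,
    "EFS elastic I/O": 1.00,
    "EFS backups": 0.50,
    "Secrets Manager": 0.40,
    "CloudWatch": 2.00,
    "NAT / VPC endpoints": 0.00,
}

total = sum(MONTHLY_USD.values())
print(f"always-on total: ${total:.2f}/mo")  # prints always-on total: $35.81/mo
```

&lt;p&gt;The sum lands a little under $36, in line with the rounded total in the table.&lt;/p&gt;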

&lt;h3&gt;
  
  
  Stop it when you're not using it
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws ec2 stop-instances &lt;span class="nt"&gt;--instance-ids&lt;/span&gt; &amp;lt;InstanceId&amp;gt; &lt;span class="nt"&gt;--region&lt;/span&gt; us-east-1

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;EC2 compute billing stops immediately, and most EFS data-access I/O should stop with the services. EFS storage, EBS, Secrets Manager, and CloudWatch keep billing at ~$8/mo. When you start it again, SSM is ready in ~60 seconds and all three &lt;code&gt;hermes-*&lt;/code&gt; systemd units restart automatically. No re-bootstrapping, no re-configuration, agent memory fully intact.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Floor: ~$8/mo when off. ~$35/mo when always-on.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Bedrock tokens (variable, on top of infra)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Rate&lt;/th&gt;
&lt;th&gt;Typical personal use&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Claude Sonnet 4.x&lt;/td&gt;
&lt;td&gt;~$3/M in · $15/M out&lt;/td&gt;
&lt;td&gt;$10–50/mo&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Nova Lite (aux slots)&lt;/td&gt;
&lt;td&gt;~$0.06/M in · $0.24/M out&lt;/td&gt;
&lt;td&gt;&amp;lt; $2/mo&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
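&lt;p&gt;To turn those per-token rates into a monthly figure, a small estimator helps. The rates below are the ones in the table; the token volumes are a made-up example month, not measurements from my usage.&lt;/p&gt;

```python
# USD per million tokens, copied from the table above.
RATES = {
    "sonnet": {"in": 3.00, "out": 15.00},
    "nova-lite": {"in": 0.06, "out": 0.24},
}


def monthly_cost(model, tokens_in, tokens_out):
    """Estimate spend in USD for a month's input and output token counts."""
    rate = RATES[model]
    return (tokens_in * rate["in"] + tokens_out * rate["out"]) / 1_000_000


# Hypothetical month: 2M input tokens and 0.5M output tokens on each model.
print(round(monthly_cost("sonnet", 2_000_000, 500_000), 2))
print(round(monthly_cost("nova-lite", 2_000_000, 500_000), 2))
```

&lt;p&gt;At that volume Sonnet comes to about $13.50 and Nova Lite to about $0.24, which is why the background slots run on the cheaper model.&lt;/p&gt;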

&lt;h3&gt;
  
  
  vs. the alternative
&lt;/h3&gt;

&lt;p&gt;ChatGPT Plus is $20/mo. You get no persistent agent filesystem, no terminal backend, no Telegram long-polling, and far less control over where memory and logs live.&lt;/p&gt;

&lt;p&gt;The Hermes setup is more infrastructure to own, but that is the point: you own the memory, the skills, the &lt;code&gt;SOUL.md&lt;/code&gt; that shapes the agent's persona, the logs, and the conversation history. Stop the instance today, redeploy in six months, and the agent picks up from the same EFS-backed state.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Setup, Briefly
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Enable Bedrock model access&lt;/strong&gt; — one-time in the console, two models&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;cdk deploy --all&lt;/code&gt;&lt;/strong&gt; — provisions all three stacks; first boot takes 5–8 min (package installs + workspace build)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Create a Telegram bot&lt;/strong&gt; via &lt;a href="https://t.me/BotFather" rel="noopener noreferrer"&gt;@BotFather&lt;/a&gt;, get your user ID via &lt;a href="https://t.me/userinfobot" rel="noopener noreferrer"&gt;@userinfobot&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Add the bot token + your user ID&lt;/strong&gt; to Secrets Manager (&lt;code&gt;hermes/runtime&lt;/code&gt;), sync to EFS, restart gateway&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Port-forward &lt;code&gt;:3000&lt;/code&gt;&lt;/strong&gt; via SSM to reach the web workspace from your laptop
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Access the workspace from your laptop&lt;/span&gt;
aws ssm start-session &lt;span class="nt"&gt;--target&lt;/span&gt; &amp;lt;InstanceId&amp;gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--document-name&lt;/span&gt; AWS-StartPortForwardingSession &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--parameters&lt;/span&gt; &lt;span class="s1"&gt;'{"portNumber":["3000"],"localPortNumber":["3000"]}'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; us-east-1

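&lt;span class="c"&gt;# The session stays in the foreground, so run this in a second terminal.&lt;/span&gt;
&lt;span class="c"&gt;# `open` is the macOS launcher; on Linux use xdg-open.&lt;/span&gt;
open http://localhost:3000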

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After step 4, Telegram just works. Message your bot, get a reply. No additional setup.&lt;/p&gt;
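&lt;p&gt;The "sync to EFS" part of step 4 amounts to rendering the secret as &lt;code&gt;.env&lt;/code&gt; lines. The sketch below assumes the &lt;code&gt;hermes/runtime&lt;/code&gt; secret is stored as a JSON object with the three keys shown in the architecture diagram; the real sync script is not reproduced here, and fetching the secret from Secrets Manager is left out. The function name &lt;code&gt;secret_to_env&lt;/code&gt; is hypothetical.&lt;/p&gt;

```python
import json

EXPECTED_KEYS = ("API_SERVER_KEY", "TELEGRAM_BOT_TOKEN", "TELEGRAM_ALLOWED_USERS")


def secret_to_env(secret_string):
    """Render a JSON secret payload as KEY=value lines for a .env file."""
    data = json.loads(secret_string)
    missing = [key for key in EXPECTED_KEYS if key not in data]
    if missing:
        raise KeyError(f"secret is missing: {', '.join(missing)}")
    return "".join(f"{key}={data[key]}\n" for key in EXPECTED_KEYS)


# Dummy values for illustration only.
example = json.dumps({
    "API_SERVER_KEY": "dummy-key",
    "TELEGRAM_BOT_TOKEN": "dummy-token",
    "TELEGRAM_ALLOWED_USERS": "11111",
})
print(secret_to_env(example), end="")
```

&lt;p&gt;Failing loudly on a missing key is deliberate: a gateway that starts without &lt;code&gt;TELEGRAM_ALLOWED_USERS&lt;/code&gt; is worse than one that does not start.&lt;/p&gt;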




&lt;h2&gt;
  
  
  What Surprised Me
&lt;/h2&gt;

&lt;p&gt;I started with a private subnet, a NAT Gateway, and VPC interface endpoints for SSM, Bedrock, Secrets Manager, EFS, and CloudWatch. It's what every AWS security guide recommends. It's also ~$88/mo in endpoint costs before a single token is processed.&lt;/p&gt;

&lt;p&gt;The insight that unlocked this architecture: &lt;strong&gt;the security boundary for a personal agent isn't the subnet — it's what's reachable on the instance.&lt;/strong&gt; With zero inbound SG rules and all services bound to loopback, the public IP is inert. SSM and Telegram's long-polling handle the two access patterns (admin shell / bot messages) over outbound HTTPS. No VPN, no bastion, no open ports.&lt;/p&gt;

&lt;p&gt;The most secure design for this use case turned out to be the simplest one.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built with &lt;a href="https://hermes-agent.nousresearch.com/docs" rel="noopener noreferrer"&gt;Hermes Agent&lt;/a&gt; · AWS CDK (Python) · Amazon Bedrock · SSM Session Manager&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>ai</category>
      <category>architecture</category>
      <category>security</category>
    </item>
    <item>
      <title>A Production-Shaped Multi-Agent SRE System on Amazon Bedrock AgentCore</title>
      <dc:creator>Nitesh Reddy Challa</dc:creator>
      <pubDate>Fri, 08 May 2026 14:03:23 +0000</pubDate>
      <link>https://dev.to/nitesh_reddychalla_d5515/a-production-shaped-multi-agent-sre-system-on-amazon-bedrock-agentcore-354d</link>
      <guid>https://dev.to/nitesh_reddychalla_d5515/a-production-shaped-multi-agent-sre-system-on-amazon-bedrock-agentcore-354d</guid>
      <description>&lt;p&gt;At 2 AM, your on-call engineer has four browser tabs open: CloudWatch Logs, CloudWatch Metrics, a runbook wiki, and Slack. They are synthesizing evidence manually — and every fragmented minute is MTTR climbing. Building an AI agent to close that gap sounds simple until you realize you are actually wiring a runtime, a JWT-gated API layer, an MCP transport, memory persistence, guardrails, observability, and an evaluation harness. This post walks through a production-shaped &lt;strong&gt;template&lt;/strong&gt; that does that wiring once — so you swap four files and ship your own domain.&lt;/p&gt;

&lt;p&gt;Running the full stack for the 7-day demo cost &lt;strong&gt;$2.11 USD&lt;/strong&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;What this article is:&lt;/strong&gt; A teardown of a fork-and-ship CDK template for multi-agent systems on Bedrock AgentCore. The built-in exemplar is an SRE incident-response system running against &lt;strong&gt;seeded demo fixtures in CloudWatch&lt;/strong&gt; — not real production data. That's intentional: synthetic fixtures prove the pattern works end-to-end so you can swap in your own data sources with confidence.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;To adapt it to your domain:&lt;/strong&gt; 4 file swaps — MCP server, sub-agent, orchestrator prompt, fixtures. Everything else (Runtime, Gateway, Memory, Guardrails, OTEL, eval harness) doesn't move. Jump to Adapting to Your Domain if you want that first.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The Problem: Manual Incident Response Does Not Scale
&lt;/h2&gt;

&lt;p&gt;When an incident fires, three things break down simultaneously:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Responders gather evidence from disconnected windows (logs, metrics, runbooks)&lt;/li&gt;
&lt;li&gt;Operational knowledge lives in heads and wikis, not in the workflow&lt;/li&gt;
&lt;li&gt;Synthesis happens manually under pressure — inconsistent and slow&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The fix is a single orchestration path: specialized agents gather evidence in parallel, synthesize once, and return a structured answer. That is what this template implements.&lt;/p&gt;
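&lt;p&gt;The "gather in parallel, synthesize once" shape can be illustrated with plain threads. This is a toy sketch with stubbed agents, not the template's code: in the template the sub-agents are LLM-backed and the orchestrator decides which to call. The &lt;code&gt;investigate&lt;/code&gt; helper is hypothetical.&lt;/p&gt;

```python
from concurrent.futures import ThreadPoolExecutor


def log_analyst(query):
    return f"logs: no findings for {query!r}"  # stub


def metrics_analyst(query):
    return f"metrics: no findings for {query!r}"  # stub


def runbook_agent(query):
    return f"runbook: no match for {query!r}"  # stub


def investigate(query, agents):
    """Run every agent concurrently, then collect results keyed by agent name."""
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in agents.items()}
        return {name: fut.result() for name, fut in futures.items()}


report = investigate(
    "checkout latency spike",
    {"logs": log_analyst, "metrics": metrics_analyst, "runbook": runbook_agent},
)
for name, finding in report.items():
    print(name, "|", finding)
```

&lt;p&gt;The responder sees one merged report instead of three browser tabs, which is the whole point of the single orchestration path.&lt;/p&gt;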




&lt;h2&gt;
  
  
  Architecture: Strands Agents-as-Tools on AgentCore
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Important distinction:&lt;/strong&gt; This project uses Strands' &lt;em&gt;agents-as-tools&lt;/em&gt; pattern — four sub-agents as in-process &lt;code&gt;@tool&lt;/code&gt; functions inside a single container. This is architecturally different from Amazon Bedrock Agents' managed multi-agent collaboration feature (separate Agent resources wired via &lt;code&gt;AssociateAgentCollaborator&lt;/code&gt;). The trade-off is intentional: agents-as-tools means zero inter-agent network hops, the same call stack, and identical local/deployed behavior. The managed Bedrock Agents approach earns its complexity when you need cross-team ownership or independent release cycles.&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User → Cognito JWT → AgentCore Gateway → AgentCore Runtime (ARM64)
                                                │
                                   Orchestrator (any LLM via Strands)
                          ┌──────────────┬──────────────┬──────────────┐
                     log_analyst  metrics_analyst  runbook_agent  security_auditor
                          │              │              │
                       CW MCP         CW MCP       Lambda MCP
                          └──────────────┴──────────────┘
                                         │
                              CloudWatch Logs + Metrics + DynamoDB
                                         │
                          OTEL → CloudWatch Gen AI Observability
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The orchestrator receives the four sub-agents through its &lt;code&gt;tools&lt;/code&gt; list. The LLM selects which to call based on their docstrings, with no hardcoded dispatch logic:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;build_orchestrator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Strands orchestrator — four sub-agents exposed as @tool functions.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;system_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;__file__&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;parent&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompts&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;orchestrator.md&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;read_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;encoding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;agent_kwargs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;object&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;memory_enabled&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="n"&gt;agent_kwargs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;session_manager&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;build_session_manager&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;strands_bedrock_model&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;          &lt;span class="c1"&gt;# swappable — one env var
&lt;/span&gt;        &lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;log_analyst&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;metrics_analyst&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;runbook_agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;security_auditor_agent&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;agent_kwargs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Adapting to Your Domain: Four File Swaps
&lt;/h2&gt;

&lt;p&gt;Everything outside these four paths is domain-agnostic scaffolding — do not touch it:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Swap&lt;/th&gt;
&lt;th&gt;From&lt;/th&gt;
&lt;th&gt;To&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Custom MCP server&lt;/td&gt;
&lt;td&gt;&lt;code&gt;mcp_custom/runbook_server/&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;mcp_custom/&amp;lt;your_domain&amp;gt;_server/&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sub-agent&lt;/td&gt;
&lt;td&gt;&lt;code&gt;agent/sub_agents/runbook.py&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;agent/sub_agents/&amp;lt;your_domain&amp;gt;.py&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Orchestrator prompt&lt;/td&gt;
&lt;td&gt;&lt;code&gt;agent/prompts/orchestrator.md&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Add one tool entry + one routing rule (additive only)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fixtures + eval cases&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;fixtures/scenarios/&lt;/code&gt; + &lt;code&gt;eval/test_cases.jsonl&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Your 3 canonical queries&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;After the four swaps: &lt;code&gt;make test &amp;amp;&amp;amp; make lint&lt;/code&gt; → &lt;code&gt;make phase1-demo-debug&lt;/code&gt; → &lt;code&gt;DOCKER_BUILDKIT=0 make phase4-deploy&lt;/code&gt;.&lt;/p&gt;
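
&lt;p&gt;Before running the make targets, a quick preflight can confirm that the swapped paths exist. The sketch below is a hypothetical helper, not part of the repo; the paths come from the table above, and &lt;code&gt;jira_triage&lt;/code&gt; is an assumed domain name used only for illustration.&lt;/p&gt;

```python
from pathlib import Path


def swap_paths(domain: str) -> list[str]:
    """Return the repo-relative paths the four swaps are expected to touch."""
    return [
        f"mcp_custom/{domain}_server",
        f"agent/sub_agents/{domain}.py",
        "agent/prompts/orchestrator.md",
        "fixtures/scenarios",
        "eval/test_cases.jsonl",
    ]


def missing_swaps(repo_root: Path, domain: str) -> list[str]:
    """List expected paths that do not exist under repo_root."""
    return [p for p in swap_paths(domain) if not (repo_root / p).exists()]


if __name__ == "__main__":
    # "jira_triage" is a placeholder domain name for this example.
    for path in missing_swaps(Path.cwd(), "jira_triage"):
        print(f"missing: {path}")
```

&lt;p&gt;An empty result only means the files are present; it says nothing about their contents, so the make targets remain the real check.&lt;/p&gt;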

&lt;p&gt;The scaffolding — Runtime, Gateway, Memory, Guardrails, OTEL, eval harness — does not move. See &lt;a href="https://github.com/nitesh-challa/agentcore-multiagent-framework/blob/main/docs/ADAPT.md" rel="noopener noreferrer"&gt;&lt;code&gt;docs/ADAPT.md&lt;/code&gt;&lt;/a&gt; for the step-by-step checklist and a worked Jira triage example.&lt;/p&gt;




&lt;h2&gt;
  
  
  Session &amp;amp; Memory Model
&lt;/h2&gt;

&lt;p&gt;AgentCore provides two distinct persistence layers with different scopes and lifetimes. Do not treat one as a substitute for the other:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Scope&lt;/th&gt;
&lt;th&gt;What it stores&lt;/th&gt;
&lt;th&gt;Lifetime&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Runtime session (microVM)&lt;/td&gt;
&lt;td&gt;Single session&lt;/td&gt;
&lt;td&gt;In-flight context, tool outputs, reasoning trace&lt;/td&gt;
&lt;td&gt;15-min idle / 8-hr max&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AgentCore Memory&lt;/td&gt;
&lt;td&gt;Cross-session&lt;/td&gt;
&lt;td&gt;Conversation history (session-window, sliding-window, or long-term summarization)&lt;/td&gt;
&lt;td&gt;Configurable TTL&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Each session runs in a &lt;strong&gt;dedicated microVM&lt;/strong&gt; with isolated CPU, memory, and filesystem. When the session ends, the microVM is terminated and its memory is sanitized, so no data leaks across sessions, even with non-deterministic AI processes. AgentCore Memory is opt-in (&lt;code&gt;AGENTCORE_MEMORY_ENABLED=true&lt;/code&gt;); the session ID propagates through every OTEL span automatically.&lt;/p&gt;
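
&lt;p&gt;The opt-in switch can be as small as an environment check. This is a minimal sketch of what a &lt;code&gt;memory_enabled()&lt;/code&gt; helper might look like; the accepted truthy spellings and the &lt;code&gt;env&lt;/code&gt; parameter are assumptions for illustration, and the repo's actual helper may differ.&lt;/p&gt;

```python
from __future__ import annotations

import os
from collections.abc import Mapping

# Assumed set of truthy spellings; only "true" appears in the article.
_TRUTHY = {"1", "true", "yes", "on"}


def memory_enabled(env: Mapping[str, str] | None = None) -> bool:
    """Return True only when AGENTCORE_MEMORY_ENABLED is explicitly truthy."""
    source = os.environ if env is None else env
    value = source.get("AGENTCORE_MEMORY_ENABLED", "")
    return value.strip().lower() in _TRUTHY


if __name__ == "__main__":
    print(memory_enabled({"AGENTCORE_MEMORY_ENABLED": "true"}))
    print(memory_enabled({}))
```

&lt;p&gt;Defaulting to off when the variable is missing matches the opt-in behavior described above.&lt;/p&gt;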




&lt;h2&gt;
  
  
  MCP as Transport and Policy Layer
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;log_analyst&lt;/code&gt; and &lt;code&gt;metrics_analyst&lt;/code&gt; share one CloudWatch MCP server process. Specialization happens through per-agent tool filters, so one server exposes two different tool surfaces with no duplication:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;cloudwatch_mcp_client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool_filters&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ToolFilters&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;MCPClient&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Same MCP server, different tool surface per sub-agent.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;MCPClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="k"&gt;lambda&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;stdio_client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="nc"&gt;StdioServerParameters&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;command&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;uvx&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;awslabs.cloudwatch-mcp-server@latest&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="n"&gt;env&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;_mcp_subprocess_env&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;startup_timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;120&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;tool_filters&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tool_filters&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# ← the only difference between sub-agents
&lt;/span&gt;    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
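
&lt;p&gt;To make the shared-server idea concrete, the sketch below applies allow-lists to one tool catalogue using only the standard library. It illustrates the filtering idea and is not the Strands implementation; the tool names and the &lt;code&gt;visible_tools&lt;/code&gt; helper are hypothetical.&lt;/p&gt;

```python
import re

# Hypothetical catalogue exposed by a single shared MCP server.
SERVER_TOOLS = [
    "describe_log_groups",
    "execute_log_insights_query",
    "get_metric_data",
    "get_active_alarms",
]

# Each sub-agent declares the patterns it is allowed to see.
LOG_ANALYST_ALLOWED = [re.compile(r".*log.*")]
METRICS_ANALYST_ALLOWED = [re.compile(r"get_metric_.*"), re.compile(r".*alarm.*")]


def visible_tools(tools: list[str], allowed: list[re.Pattern[str]]) -> list[str]:
    """Keep only the tools whose full name matches at least one allowed pattern."""
    return [t for t in tools if any(p.fullmatch(t) for p in allowed)]


if __name__ == "__main__":
    print(visible_tools(SERVER_TOOLS, LOG_ANALYST_ALLOWED))
    print(visible_tools(SERVER_TOOLS, METRICS_ANALYST_ALLOWED))
```

&lt;p&gt;An allow-list fails closed: a tool added to the server later stays invisible to every sub-agent until a pattern admits it.&lt;/p&gt;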



&lt;p&gt;The runbook server uses a &lt;strong&gt;dual-shape design&lt;/strong&gt; — local &lt;code&gt;stdio&lt;/code&gt; in Phase 1, Gateway-registered Lambda target in Phase 2+. The sub-agent code does not change between modes; only the transport env var changes.&lt;/p&gt;
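
&lt;p&gt;A minimal sketch of that switch, assuming the env var is &lt;code&gt;AGENT_TRANSPORT_MODE&lt;/code&gt; and that the local value is spelled &lt;code&gt;stdio&lt;/code&gt; (only &lt;code&gt;gateway&lt;/code&gt; is named in this article). The &lt;code&gt;resolve_transport&lt;/code&gt; helper and its default are hypothetical:&lt;/p&gt;

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from enum import Enum


class Transport(Enum):
    STDIO = "stdio"      # Phase 1: local subprocess (assumed spelling)
    GATEWAY = "gateway"  # Phase 2+: Gateway-registered Lambda target


def resolve_transport(env: Mapping[str, str] | None = None) -> Transport:
    """Pick the transport from AGENT_TRANSPORT_MODE, defaulting to local stdio."""
    source = os.environ if env is None else env
    raw = source.get("AGENT_TRANSPORT_MODE", Transport.STDIO.value).strip().lower()
    try:
        return Transport(raw)
    except ValueError:
        # Fail loudly on typos instead of silently falling back to stdio.
        raise ValueError(f"unsupported AGENT_TRANSPORT_MODE: {raw!r}") from None


if __name__ == "__main__":
    print(resolve_transport({"AGENT_TRANSPORT_MODE": "gateway"}))
    print(resolve_transport({}))
```

&lt;p&gt;Keeping the decision in one helper is what lets the sub-agent code stay identical across both modes.&lt;/p&gt;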




&lt;h2&gt;
  
  
  Why Not Step Functions at the Core?
&lt;/h2&gt;

&lt;p&gt;AWS Prescriptive Guidance is explicit: Step Functions handles deterministic, rule-based flows. AgentCore handles AI-native orchestration where the LLM &lt;em&gt;is&lt;/em&gt; the workflow engine. Mixing them at the reasoning layer adds latency without benefit.&lt;/p&gt;

&lt;p&gt;In this template, Step Functions belongs at the &lt;strong&gt;edges&lt;/strong&gt; — nightly eval harness, human-in-the-loop approval flows, infra lifecycle — not between the orchestrator and sub-agents.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Pattern&lt;/th&gt;
&lt;th&gt;Right fit&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Single agent, all tools&lt;/td&gt;
&lt;td&gt;Simplest — context pressure grows as tools scale&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Agents-as-tools (this repo)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Single team, one container, LLM routes, local debuggable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;A2A choreography&lt;/td&gt;
&lt;td&gt;Cross-team ownership, independent release cycles&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Step Functions + agents&lt;/td&gt;
&lt;td&gt;Deterministic outer workflow, AI inner reasoning&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Enterprise Security: Three-Layer Least-Privilege Boundary
&lt;/h2&gt;

&lt;p&gt;The Gateway is created with a custom JWT authorizer backed by a Cognito user pool:&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_gateway&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;gateway_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;protocolType&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MCP&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;roleArn&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role_arn&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;authorizerType&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CUSTOM_JWT&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;authorizerConfiguration&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;customJWTAuthorizer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;discoveryUrl&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;_issuer_url&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;region&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_pool_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;allowedClients&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;client_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]],&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
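
&lt;p&gt;The &lt;code&gt;discoveryUrl&lt;/code&gt; should point at the user pool's OpenID Connect discovery document. The repo's &lt;code&gt;_issuer_url&lt;/code&gt; helper is not shown here, so the following is only a sketch of how such a URL is typically derived for Cognito; the function names and the sample pool ID are illustrative.&lt;/p&gt;

```python
def cognito_issuer(region: str, user_pool_id: str) -> str:
    """Issuer URL for a Cognito user pool."""
    return f"https://cognito-idp.{region}.amazonaws.com/{user_pool_id}"


def cognito_discovery_url(region: str, user_pool_id: str) -> str:
    """OIDC discovery document served under the issuer URL."""
    return f"{cognito_issuer(region, user_pool_id)}/.well-known/openid-configuration"


if __name__ == "__main__":
    # Placeholder pool ID, not a real resource.
    print(cognito_discovery_url("us-east-1", "us-east-1_EXAMPLE"))
```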



&lt;p&gt;Three explicit boundaries, each independently enforced:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Mechanism&lt;/th&gt;
&lt;th&gt;What it prevents&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Identity&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Cognito JWT — &lt;code&gt;discoveryUrl&lt;/code&gt; + &lt;code&gt;allowedClients&lt;/code&gt; validated on every request&lt;/td&gt;
&lt;td&gt;Unauthenticated callers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Authorization&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Gateway IAM service role (&lt;code&gt;roleArn&lt;/code&gt;) scoped to registered targets only&lt;/td&gt;
&lt;td&gt;Lateral movement to unregistered services&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Transport enforcement&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;AGENT_TRANSPORT_MODE=gateway&lt;/code&gt; in the runtime container&lt;/td&gt;
&lt;td&gt;Local stdio bypass in production&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Bedrock Guardrails are wired separately at the model layer (&lt;code&gt;agent/guardrails.py&lt;/code&gt;) and provisioned via CDK (&lt;code&gt;infrastructure/stacks/guardrail_stack.py&lt;/code&gt;) — covering input/output filtering independent of the transport layer.&lt;/p&gt;




&lt;h2&gt;
  
  
  Honest Eval: What the Scores Actually Mean
&lt;/h2&gt;

&lt;p&gt;The AgentCore LLM-as-judge eval runs three scenarios against the deployed runtime:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Status&lt;/th&gt;
&lt;th&gt;GoalSuccessRate&lt;/th&gt;
&lt;th&gt;Helpfulness&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;debug_external_dep_01&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;COMPLETED&lt;/td&gt;
&lt;td&gt;0.0&lt;/td&gt;
&lt;td&gt;0.83 — Very Helpful&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;debug_external_dep_02&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;COMPLETED&lt;/td&gt;
&lt;td&gt;0.0&lt;/td&gt;
&lt;td&gt;0.67 — Moderately Helpful&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;debug_external_dep_03&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;COMPLETED&lt;/td&gt;
&lt;td&gt;0.0&lt;/td&gt;
&lt;td&gt;0.67 — Moderately Helpful&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Error count&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;GoalSuccessRate 0.0 is a fixture alignment gap, not a system failure.&lt;/strong&gt; The evaluator matches exact strings (&lt;code&gt;Stripe&lt;/code&gt;, &lt;code&gt;503&lt;/code&gt;, &lt;code&gt;CircuitBreakerOpen&lt;/code&gt;) against agent responses. The agent reasons in natural language ("payment provider", "upstream errors"), so the semantics match but the strings do not. Updating &lt;code&gt;expected_markers&lt;/code&gt; in &lt;code&gt;eval/test_cases.jsonl&lt;/code&gt; to match the agent's vocabulary fixes this without touching the agent.&lt;/p&gt;
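
&lt;p&gt;The mismatch is easy to reproduce. The sketch below assumes the check is a plain case-sensitive substring match over &lt;code&gt;expected_markers&lt;/code&gt;; the &lt;code&gt;marker_hit_rate&lt;/code&gt; helper, the sample response, and the revised markers are hypothetical, not taken from the repo.&lt;/p&gt;

```python
def marker_hit_rate(response: str, expected_markers: list[str]) -> float:
    """Fraction of expected markers that appear verbatim in the response."""
    if not expected_markers:
        return 0.0
    hits = sum(1 for marker in expected_markers if marker in response)
    return hits / len(expected_markers)


# Illustrative agent output phrased in natural language.
RESPONSE = (
    "The payment provider is returning upstream errors, "
    "so the checkout service has stopped sending requests."
)

ORIGINAL_MARKERS = ["Stripe", "503", "CircuitBreakerOpen"]
REVISED_MARKERS = ["payment provider", "upstream errors"]

if __name__ == "__main__":
    print(marker_hit_rate(RESPONSE, ORIGINAL_MARKERS))
    print(marker_hit_rate(RESPONSE, REVISED_MARKERS))
```

&lt;p&gt;Under these assumptions the same response scores nothing against the original markers and fully against the revised ones, which is why the fix belongs in the fixtures and not in the agent.&lt;/p&gt;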

&lt;p&gt;&lt;strong&gt;Helpfulness&lt;/strong&gt; is the meaningful signal: 0.83 on the first scenario and 0.67 on the other two. For the top-scoring run, the LLM judge rated the response as something that would actually help an SRE: the runbook was matched, the mitigation steps were numbered and actionable, and the analysis was coherent.&lt;/p&gt;

&lt;p&gt;Surfacing this gap explicitly rather than hiding it is the point: &lt;strong&gt;partial confidence is a design principle here, not an error state.&lt;/strong&gt; When evidence is unavailable, the system returns &lt;code&gt;[Partial] — data not retrieved&lt;/code&gt; instead of fabricating an answer.&lt;/p&gt;




&lt;h2&gt;
  
  
  Observed Cost: 7-Day Demo Window
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Approx. cost&lt;/th&gt;
&lt;th&gt;Pricing model&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;AgentCore Runtime&lt;/td&gt;
&lt;td&gt;Majority of total&lt;/td&gt;
&lt;td&gt;Consumption-based — billed on active CPU only, not LLM wait time&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AgentCore Gateway&lt;/td&gt;
&lt;td&gt;Small&lt;/td&gt;
&lt;td&gt;Per-request&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AgentCore Memory&lt;/td&gt;
&lt;td&gt;Small&lt;/td&gt;
&lt;td&gt;Storage + retrieval ops&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bedrock Guardrails&lt;/td&gt;
&lt;td&gt;Small&lt;/td&gt;
&lt;td&gt;Per text-unit processed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cognito (Auth)&lt;/td&gt;
&lt;td&gt;Negligible&lt;/td&gt;
&lt;td&gt;MAU-based&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total (7 days)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$2.11 USD&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Full stack including all layers&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The consumption-based Runtime pricing is the key lever: CPU is not billed while the container waits on model responses. For SRE use cases where invocations are event-driven rather than continuous, the economics are favorable.&lt;/p&gt;
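
&lt;p&gt;For a rough sense of scale, the observed total can be extrapolated linearly. This assumes the same invocation volume as the 7-day demo window, which real workloads will not match, so treat the result as an order-of-magnitude estimate and not a forecast.&lt;/p&gt;

```python
OBSERVED_TOTAL_USD = 2.11  # total from the 7-day table above
OBSERVED_DAYS = 7


def projected_cost(days: int) -> float:
    """Linear projection of the observed spend over a different window."""
    return OBSERVED_TOTAL_USD / OBSERVED_DAYS * days


if __name__ == "__main__":
    print(f"per day:  ${projected_cost(1):.2f}")
    print(f"30 days:  ${projected_cost(30):.2f}")
```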




&lt;h2&gt;
  
  
  Why Strands Agents Over LangChain or CrewAI?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/prescriptive-guidance/latest/agentic-ai-frameworks/strands-agents.html" rel="noopener noreferrer"&gt;Strands Agents&lt;/a&gt; is an open-source SDK published by AWS with first-class AgentCore Runtime integration:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;OTEL built-in&lt;/strong&gt; via ADOT auto-instrumentation — no middleware to configure, spans appear in CloudWatch Gen AI Observability automatically&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Typed &lt;code&gt;@tool&lt;/code&gt; contracts&lt;/strong&gt; — sub-agents are plain Python functions; their docstrings become tool descriptions the LLM uses for routing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MCP tool filtering&lt;/strong&gt; via a single &lt;code&gt;tool_filters=&lt;/code&gt; kwarg — one server, scoped tool surface per sub-agent&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model-agnostic&lt;/strong&gt; — swap the model ID in one place (&lt;code&gt;strands_bedrock_model()&lt;/code&gt;); Claude, Nova, and others all work&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;LangChain and CrewAI are valid choices for different constraint sets. Strands fits here because the target is AgentCore Runtime, not a generic cloud environment.&lt;/p&gt;




&lt;h2&gt;
  
  
  Closing
&lt;/h2&gt;

&lt;p&gt;The hard part of building agentic systems on AWS is not writing the agent logic; it is wiring runtime, auth, MCP, memory, guardrails, observability, and eval into a coherent system you can actually ship and trust. Every one of those layers is already wired here: per-session microVM isolation, Cognito JWT gating, OTEL to CloudWatch Gen AI Observability, LLM-as-judge evaluation via AgentCore's on-demand eval API, and CDK IaC for all infrastructure.&lt;/p&gt;

&lt;p&gt;Fork it. Swap &lt;code&gt;mcp_custom/runbook_server/&lt;/code&gt; for your domain's data source. Update the orchestrator prompt. Ship. The other eleven services do not move.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Repo:&lt;/strong&gt; &lt;a href="https://github.com/nitesh-challa/agentcore-multiagent-framework" rel="noopener noreferrer"&gt;agentcore-multiagent-framework&lt;/a&gt; · &lt;strong&gt;Adapt guide:&lt;/strong&gt; &lt;code&gt;docs/ADAPT.md&lt;/code&gt; · &lt;strong&gt;Run it:&lt;/strong&gt; follow the &lt;strong&gt;First-time deployed demo (recommended path)&lt;/strong&gt; section in &lt;code&gt;README.md&lt;/code&gt; (CDK deploy → token/runtime deploy → seed → demo queries) · &lt;strong&gt;Local-only fallback:&lt;/strong&gt; &lt;code&gt;make phase1-demo-debug&lt;/code&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>python</category>
      <category>architecture</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
